The engineer gets a postmortem draft in minutes and edits it instead of writing from scratch. The blameless format is encoded in the prompt.
What it does
What automation does
The Grow2.ai AI agent creates a draft postmortem document for a completed incident. After the incident is closed, the agent collects context from three sources and produces a structured draft ready for editing by an engineer.
Sources the agent reads from
- Slack incident thread — team messages, decisions, screenshots, links to dashboards, participant reactions.
- Observability system — metrics, alerts, trace events, logs within the incident window.
- Issue tracker — related tickets, pull requests, deploy records.
What the agent generates in the draft
The agent generates a postmortem in a standard blameless structure:
- incident summary (2-3 sentences),
- timeline with event timestamps,
- impact (affected users, downtime, business effect),
- root cause hypothesis (preliminary, requires verification),
- contributing factors,
- what worked well in the response,
- lessons learned,
- action items with draft owners.
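As a rough illustration, the sections listed above could be modelled as a small data type that renders to markdown. This is a sketch under assumptions: the names `PostmortemDraft`, `ActionItem`, and `draft_owner` are hypothetical and do not describe the Grow2.ai agent's actual output schema.

```python
from dataclasses import dataclass, field


@dataclass
class ActionItem:
    description: str
    # Hypothetical default: the final owner is set by the engineer in review.
    draft_owner: str = "unassigned"


@dataclass
class PostmortemDraft:
    summary: str
    timeline: list[str] = field(default_factory=list)
    impact: str = ""
    root_cause_hypothesis: str = ""
    contributing_factors: list[str] = field(default_factory=list)
    what_worked_well: list[str] = field(default_factory=list)
    lessons_learned: list[str] = field(default_factory=list)
    action_items: list[ActionItem] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render the eight sections in the order used by the draft."""

        def bullets(items) -> str:
            return "\n".join(f"- {item}" for item in items)

        owners = (
            f"{a.description} (draft owner: {a.draft_owner})"
            for a in self.action_items
        )
        sections = [
            ("Summary", self.summary),
            ("Timeline", bullets(self.timeline)),
            ("Impact", self.impact),
            ("Root cause hypothesis (preliminary)", self.root_cause_hypothesis),
            ("Contributing factors", bullets(self.contributing_factors)),
            ("What worked well", bullets(self.what_worked_well)),
            ("Lessons learned", bullets(self.lessons_learned)),
            ("Action items", bullets(owners)),
        ]
        return "\n\n".join(f"## {title}\n{body}" for title, body in sections)
```

Keeping the hypothesis and the draft owners in explicitly labelled fields makes it visible in the rendered document which parts still need an engineer's decision.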
The blameless format is encoded in the prompt: the agent describes systemic and process factors rather than blaming specific people. Phrasing: "the alert did not fire due to a threshold error", not "the engineer did not configure the alert".
The draft is a starting point. The engineer corrects the facts, deepens the root cause analysis, and refines owners and action item dates. The agent handles the mechanical work: collecting artifacts, building the timeline, and providing an initial description of events.
What automation does NOT do
The agent does not conduct root cause analysis independently — it only formulates a hypothesis based on explicit signals from logs and messages. True RCA remains with the engineer: code analysis, problem reproduction, and hypothesis validation require engineering judgment, not text extraction. The agent does not decide on action item priority, does not assign final owners, and does not close incidents in compliance systems. It prepares a draft that a human reviews before publication.
The agent also does not calculate the financial impact of an incident or determine SLA/SLO violations with the accuracy required for external client reports. It can flag a threshold breach in the draft, but validation and attribution remain with the relevant role.
Typical configuration options
Solo team / startup, 1-5 people. One prompt template, connected to Slack and the team's single observability tool. The draft is written to the documentation system chosen by the team, and the engineer edits it before distribution. The focus is setup speed and minimal configuration. The agent is triggered manually via a Slack command after the incident is closed. The first runs are made against past incidents to check draft quality. Suitable for teams that previously wrote no postmortems at all for lack of time: the result is a first habit of documenting incidents, even if imperfectly.
SMB SaaS 6-30 people. Two to three templates: separate ones for P1/P2 incidents and security incidents. Integration with the issue tracker, deploy history, and main monitoring stack. The agent is triggered automatically when an incident is closed. The draft goes to the documentation system and simultaneously to the team's Slack channel for review. Role-based access: who can trigger it, who is required to review. Setup — approximately one week. Suitable for teams with frequent incidents and postmortem discipline requirements.
Enterprise 30+ engineers. Multi-agent setup: one agent collects the timeline, a second performs preliminary root cause analysis, a third generates action items with owners from the team directory. Integration with internal SSO, audit logs, and compliance systems. The draft goes through a review chain: SRE lead → Engineering Manager → Incident Commander. The history of all postmortems is indexed for searching similar incidents. Setup takes longer than the base configuration — accounting for security review and multi-agent architecture. Suitable for companies with a formal incident response process.
How it works
The automation is built on an agentic architecture: one or more Grow2.ai AI agents read data sources, apply a prompt template with blameless rules, and produce structured markdown. Below is the sequence of steps from incident closure to a ready draft, and how the agent handles different types of incoming data.
Step-by-step process
- Trigger. An engineer closes the incident in the incident management system or manually marks the Slack thread with a special command. The trigger is configured to fit the team's process — automatic on closure, semi-automatic with confirmation, or fully manual.
- Context collection. The agent reads the entire Slack thread: messages, timestamps, reactions, links, forwarded messages. From the observability system it pulls metrics and alerts for the incident window, from the first signal to the end of incident response. From the issue tracker it pulls related tickets, pull requests, deploy records, and previously discussed issues.
- Normalization. The agent builds a timeline from multiple sources: the alert fired at 14:23, the team responded at 14:27, the deploy was rolled back at 14:35. Events are arranged into a single chronology with the source of each fact indicated — so the engineer understands during review where the data came from.
- Applying the prompt template. Blameless rules and the postmortem structure are embedded in the system prompt. The agent generates a draft following this structure, filling it with facts from the collected context. The prompt includes rules about what NOT to write — names in accusatory phrasing, unproven causes, emotional descriptions.
- Saving the draft. The result is saved to the team's documentation system. The link is posted to the Slack channel to notify those who need to do a review.
- Review and editing. The engineer opens the draft, corrects the root cause, refines action items, adds owners and dates. Finalizes the document and publishes it to the team channel or to external stakeholders.
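The normalization step above can be sketched as a merge of per-source event lists into one chronology that keeps the source of each fact. This is a minimal sketch, not the agent's implementation: the `Event` type, the source labels, and the calendar date in the example are placeholders chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    at: datetime
    source: str  # e.g. "observability", "slack", "issue-tracker"
    text: str


def build_timeline(*streams: list[Event]) -> list[Event]:
    """Merge event lists from several sources into one chronology.

    Exact duplicates (same time, source, and text) are dropped.
    """
    merged = {event for stream in streams for event in stream}
    return sorted(merged, key=lambda e: (e.at, e.source))


def render(timeline: list[Event]) -> str:
    """One line per event, with the source shown for the reviewer."""
    return "\n".join(f"{e.at:%H:%M} [{e.source}] {e.text}" for e in timeline)


# Example using the times from the normalization step (the date is arbitrary).
slack = [Event(datetime(2024, 1, 1, 14, 27), "slack", "team responded")]
alerts = [Event(datetime(2024, 1, 1, 14, 23), "observability", "alert fired")]
tracker = [Event(datetime(2024, 1, 1, 14, 35), "issue-tracker", "deploy rolled back")]
timeline_text = render(build_timeline(slack, alerts, tracker))
# 14:23 [observability] alert fired
# 14:27 [slack] team responded
# 14:35 [issue-tracker] deploy rolled back
```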
How the agent handles different types of data
Slack messages are a conversational stream with jokes, off-topic content, and links. The agent extracts only factual events: "deploy rolled back", "error in the log aggregator", "latency alert". Off-topic content is ignored. Team context — who did what, at what moment — goes into the timeline; casual remarks do not. Message reactions are used as an importance signal: a message with ten "+1" reactions is more likely to describe a key decision.
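The reaction signal described above could be approximated by ranking messages by their total reaction count. This is only a sketch: the agent's extraction is described as semantic, so the code covers just the importance heuristic. The `SlackMessage` shape is a simplified assumption rather than the Slack API payload, and the `min_reactions` cutoff is an arbitrary illustrative value.

```python
from dataclasses import dataclass, field


@dataclass
class SlackMessage:
    ts: str
    text: str
    # Simplified, assumed shape: emoji name -> number of reactions.
    reactions: dict[str, int] = field(default_factory=dict)


def reaction_weight(message: SlackMessage) -> int:
    """Total number of reactions on a message, across all emoji."""
    return sum(message.reactions.values())


def rank_by_reactions(
    messages: list[SlackMessage], min_reactions: int = 3
) -> list[SlackMessage]:
    """Return messages with enough reactions, most reacted first."""
    candidates = [m for m in messages if reaction_weight(m) >= min_reactions]
    return sorted(candidates, key=reaction_weight, reverse=True)
```

A ranking like this would only suggest which messages deserve a closer look; whether a message actually describes a key decision still depends on its content.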
Observability data is structured. The agent reads metric names, their values, alert thresholds, and trace events. It forms phrases like "p99 latency exceeded the threshold at 14:15, returned to normal at 14:38". Charts and dashboards are not included in the draft — only conclusions about metric behavior. This keeps the document readable and does not overload it with technical details.
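A phrase like the one above can be derived from raw samples by finding the intervals in which a metric stays above its threshold. The sketch below assumes the samples arrive as time-ordered `(timestamp, value)` pairs; the function names and the sample values in the usage note are hypothetical.

```python
from datetime import datetime
from typing import Optional

Sample = tuple[datetime, float]
Window = tuple[datetime, Optional[datetime]]


def breach_windows(samples: list[Sample], threshold: float) -> list[Window]:
    """Find intervals where the value is above the threshold.

    Each window runs from the first breaching sample to the first sample
    back at or below the threshold. The end is None if the metric never
    recovered within the samples.
    """
    windows: list[Window] = []
    start: Optional[datetime] = None
    for at, value in samples:
        if value > threshold and start is None:
            start = at
        elif value <= threshold and start is not None:
            windows.append((start, at))
            start = None
    if start is not None:
        windows.append((start, None))
    return windows


def describe(metric: str, windows: list[Window]) -> list[str]:
    """Turn breach windows into draft-ready sentences."""
    phrases = []
    for start, end in windows:
        if end is None:
            phrases.append(
                f"{metric} exceeded the threshold at {start:%H:%M} "
                "and had not recovered by the end of the window"
            )
        else:
            phrases.append(
                f"{metric} exceeded the threshold at {start:%H:%M}, "
                f"returned to normal at {end:%H:%M}"
            )
    return phrases
```

With samples of 300 ms at 14:10, 900 ms at 14:15, 950 ms at 14:30, and 280 ms at 14:38 against a 500 ms threshold, `describe("p99 latency", ...)` yields the sentence quoted in the paragraph above.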
The issue tracker is semi-structured. The agent links tickets by timestamp and mentioned services. If there was a deploy via a specific pull request during the incident period — the agent adds it to the timeline with a link to the ticket and commits. Related bugs and previously discussed issues go into the contributing factors section.
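Linking by timestamp and mentioned services can be sketched as a filter over deploy records. The `Deploy` fields and the `lookback` parameter are assumptions introduced for illustration: the lookback widens the window so that a deploy made shortly before the first signal is still considered.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Deploy:
    at: datetime
    service: str
    pull_request: str  # hypothetical identifier, e.g. a PR link


def deploys_near_incident(
    deploys: list[Deploy],
    incident_start: datetime,
    incident_end: datetime,
    affected_services: set[str],
    lookback: timedelta = timedelta(minutes=60),
) -> list[Deploy]:
    """Return deploys to affected services inside the widened incident window."""
    window_start = incident_start - lookback
    matches = [
        d
        for d in deploys
        if window_start <= d.at <= incident_end and d.service in affected_services
    ]
    return sorted(matches, key=lambda d: d.at)
```

A match here is only a candidate for the timeline. Whether the deploy actually contributed to the incident is for the engineer to verify.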
Alternative approaches
Below is a qualitative comparison of three approaches to writing a postmortem.
| Criterion | Manual approach | No-code workflow | Grow2.ai AI agent |
|---|---|---|---|
| Time to draft | Hours | Tens of minutes | Minutes |
| Timeline completeness | Depends on memory | Formal template | Automatically from sources |
| Extraction from Slack | Manual copy-paste | Template export | Semantic event extraction |
| Blameless phrasing | Depends on culture | Template prompts | Encoded in prompt |
| Structure flexibility | Full | Limited by template | Configurable in prompt |
| Team training | Required | Required | Minimal |
| Maintenance | Not required | Template configuration | Updating prompt and integrations |
| Risk of inaccurate facts | Depends on the engineer | Low | Medium (review required) |
The manual approach delivers maximum quality if the engineer has time and good memory for the details of the incident. In practice, after a nighttime incident the draft is pushed to tomorrow, then to Monday, then never written at all. Knowledge stays in the team's heads.
A no-code workflow via Zapier or a workflow engine fits tightly structured processes: a form is filled in, data is mapped to a template. But a postmortem is not a form. A live Slack thread with context, logs, decisions, and emotions does not fit into fields without loss of meaning.
The AI agent bridges the gap between "manual, but rarely done" and "templated, but shallow". The agent reads unstructured data semantically, not by keys, and produces a draft that the engineer edits in minutes instead of spending hours gathering material and writing prose. The mechanical part of the work is delegated to automation; the analytical part stays with the human.
Security and compliance
Incident data is sensitive: links to internal services, customer names, vulnerability details, infrastructure technical parameters. The Grow2.ai agent framework supports on-premises deployment or a self-hosted LLM for teams with compliance requirements. For cloud deployment, data is processed in an isolated context, is not used for model training, and is stored according to the team's data retention policy.
Role-based access separates permissions: who can run the agent, who sees the draft, who has the right to publish the final document. The audit log records what data the agent read, which prompt was applied, and who edited what in the result. For security incidents, a separate prompt template is recommended with minimization of sensitive details in the draft — usernames, exploit details, and internal identifiers are replaced with placeholders.
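Placeholder substitution for security incidents could look roughly like the sketch below. The patterns are illustrative assumptions (an `@handle` form for usernames, IPv4 addresses, and hostnames under a hypothetical `.internal` domain); a real deployment would need patterns matched to the team's own identifiers.

```python
import re

# Assumed, illustrative patterns; each maps to the placeholder that replaces it.
PATTERNS: list[tuple[str, re.Pattern[str]]] = [
    ("<user>", re.compile(r"@[A-Za-z0-9._-]+")),
    ("<ip>", re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")),
    ("<host>", re.compile(r"\b[a-z0-9-]+(?:\.[a-z0-9-]+)*\.internal\b")),
]


def redact(text: str) -> str:
    """Replace sensitive identifiers with placeholders before drafting."""
    for placeholder, pattern in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


redacted = redact("@alice saw errors from 10.0.3.7 on db-1.prod.internal")
# "<user> saw errors from <ip> on <host>"
```

Running redaction before the text reaches the model, rather than on the finished draft, also limits what a cloud LLM ever sees.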
Prerequisites
What you need before implementation
For automation to work, the team must already have basic practices and tools in place. The absence of one or two elements does not block the launch, but makes the draft less complete.
Required minimum
- A centralized incident channel in Slack (or equivalent). If incidents are discussed across various private chats and DMs, the agent has nothing to read. A practice of "incident → dedicated thread or channel" is needed.
- Observability tool with an API. Any monitoring system with access to metrics and alerts via API. Without observability, the agent will not be able to compile an event timeline.
- Issue tracker. A system where bugs, tasks, and deploys are logged. Provides context for related tickets.
- A place to store postmortems. Notion, an internal wiki, or another documentation system. Where the agent will write the draft.
- A basic blameless-postmortem culture. If the team historically looks for someone to blame, automation will not fix the culture. The agent amplifies existing practice, rather than creating it from scratch.
Desirable
A formal incident response process with severity levels (P1/P2/P3), an escalation procedure, and an Incident Commander role. This simplifies agent configuration and makes the draft consistent across incidents.
Deploy tracking: the agent uses release history to establish the link "incident occurred X minutes after deploy Y". Without it, the link rests on timestamps alone, which reduces attribution accuracy.
A "reviewer engineer" role on rotation: a person who checks the draft before final publication. Not necessarily dedicated — can be a rotation among senior engineers.
Possible pitfalls
- A scattered Slack thread. If the team discusses an incident in three places simultaneously — the agent will collect only one stream. Solution: an agreement of "one incident — one thread", plus a practice of cross-linking between discussion locations.
- Noise in observability. Hundreds of alerts from flapping metrics turn the timeline into a mess. Filtering is needed: the agent reads only severity-critical signals and those related to affected services. Filters are configured in the prompt.
- Expecting a full RCA from the agent. A draft is a raw factual outline, not a ready-made root cause analysis. Teams that publish the draft without an engineering review get shallow postmortems and lose trust in the document.
- Neglecting prompt-tuning. The default template works, but not perfectly. Teams that do not adapt the prompt to their context (their services, their severity format, their postmortem audience) get a generic draft instead of a relevant one.
- Absence of a review process. If the draft is published immediately without review — agent errors (incorrect attribution, wrong timestamp, fabricated detail) end up in the document. A rule is needed: draft ≠ final postmortem until edited by an engineer.
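The alert filtering mentioned in the pitfalls above is configured in the prompt, according to this page. The same rule can also be expressed as a pre-processing step, which is an assumption about where filtering might happen and not a description of the product. The `Alert` fields are hypothetical, and collapsing repeated firings of a flapping alert is an added illustration.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    name: str
    severity: str
    service: str


def relevant_alerts(
    alerts: list[Alert],
    affected_services: set[str],
    severities: frozenset[str] = frozenset({"critical"}),
) -> list[Alert]:
    """Keep critical alerts on affected services, one per (name, service)."""
    seen: set[tuple[str, str]] = set()
    kept: list[Alert] = []
    for alert in alerts:
        if alert.severity not in severities:
            continue
        if alert.service not in affected_services:
            continue
        key = (alert.name, alert.service)
        if key in seen:
            # A flapping alert fires repeatedly; keep only its first occurrence.
            continue
        seen.add(key)
        kept.append(alert)
    return kept
```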
Pain points
- Loss of meeting information
- Time spent on manual reports
- Knowledge in heads, not in documents
FAQ
How long does implementation take?
Basic setup takes about a week: connecting Slack, an observability tool, and the issue tracker, configuring the prompt template, and testing on past incidents. For an SMB SaaS team with a typical stack, this fits in a one-week sprint. An enterprise scenario with security review, SSO, and multi-agent architecture takes longer. Timelines vary if the observability stack is non-standard or the team wants a custom postmortem structure.
What if we don't have an observability system?
Without observability, the agent will collect an incomplete timeline — only what was written in Slack. This is a working minimum for early-stage startups. The draft will be less detailed: no metrics, alerts, or latency graphs. The solution is to connect at least basic monitoring. You can run the agent in parallel and gradually expand data sources as observability is introduced.
What are the risks and what can go wrong?
Three typical risks. The first is hallucination: the agent may fabricate a fact if sources are empty. The safeguard is a mandatory engineer review before publication. The second is sensitive data leakage to a cloud LLM. The safeguard is a self-hosted LLM or data masking. The third is quality degradation when the format of Slack messages or the observability schema changes. The safeguard is a regular test run of the agent on recent incidents.
Is it suitable for our industry?
Automation is aimed at SaaS, tech, and product teams with an incident response process. It works in fintech, e-commerce, healthtech — anywhere there are production incidents and an observability stack. For non-tech industries, automation applies if there is a digital service with monitoring. The core requirement is not the industry, but having Slack or an equivalent, an observability tool, and a practice of documenting incidents.
Can we use our own prompt template?
Yes. The prompt template is the agent configuration; it can be adapted to the company's format: section structure, tone of voice, severity classifier, list of required fields. Grow2.ai provides a base blameless template as a starting point, and the team refines it to their context. Updating the prompt does not require rewriting code — it is an edit in the configuration.
What about incident data privacy?
Incident data is processed in an isolated context and is not used for model training. For teams with compliance requirements, on-premises deployment or a self-hosted LLM is available. The audit log records all agent requests and the applied prompt. For security incidents, a separate template is used that minimizes sensitive details in the draft.
Is a dedicated ML engineer needed for maintenance?
No. After setup, the agent works autonomously: new incident → draft → review. Maintenance means updating the prompt when the postmortem format changes, adding new data sources, and adapting to new team tools. Minor adjustments take a few hours per month.
What happens if the agent did not find incident data?
If the sources contain no data (for example, the Slack thread is empty or the observability window does not match), the agent returns a draft with explicit 'data missing' markers instead of inferring or fabricating facts. The engineer sees the gaps and fills them in manually. This is better than hallucination: false facts in a postmortem are more dangerous than missing ones.
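The marker behaviour can be illustrated with a section renderer that writes an explicit marker when it receives no facts. This is a minimal sketch: the function name and the bracketed marker format are assumptions, not the agent's actual output.

```python
MISSING_MARKER = "[data missing]"  # assumed format of the explicit marker


def render_section(title: str, facts: list[str]) -> str:
    """Render a draft section, marking it explicitly when no facts were found."""
    if facts:
        body = "\n".join(f"- {fact}" for fact in facts)
    else:
        body = MISSING_MARKER
    return f"## {title}\n{body}"
```

An empty section rendered this way is hard to overlook in review, which is the point of the marker.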