Blameless Postmortem Operations: The Courage to Write the Trigger, Separation from Performance Reviews, and Tracking Action Items
This article was generated by AI. The accuracy of the content is not guaranteed, and we accept no responsibility for any damages resulting from use of this article. By continuing to read, you agree to the Terms of Use.
- Intended readers: Incident-response and operations leads, SREs and EMs who want to institutionalize postmortems, and executives who want to break out of ritualized practice
- Assumed background: STEP 1-A from “Implementation Guide for Organizational Context Supply Capability”
- Reading time: Full read about 18 minutes / skim about 6 minutes
Overview
Blameless postmortems were proposed by John Allspaw at Etsy in 2012 [1] and codified by Google SRE [2], and they have become an industry-wide common language. In practice, though, they often degrade into ritual: "we'll be more careful next time" lines pile up, the postmortem document gets used as evidence in performance reviews, and more than half the action items are abandoned. Many organizations end up with hollow postmortems.
Rather than stopping at a template, this article tackles the four implementation issues that trip teams up most often: the courage to write the trigger, separation from the evaluation system, action-item tracking, and staged broadening of disclosure. The operational details build on the original Allspaw [1] and Google SRE [2] sources, Sidney Dekker's human-factors work [4], and James Reason's Swiss Cheese Model [3].
Postmortem template (recommended agenda)
```markdown
# Postmortem: <incident name>

## Status
- Owner / Reviewers / Status (draft / review / done)

## Summary
- One-line incident summary
- Detection time, resolution time, duration

## Impact
- Users affected / revenue / SLO violation
- Internal cost (response effort, opportunity cost)

## Timeline (UTC / local)
- Facts only. "I thought," "I felt" go in a separate field
- For each event: who / what they observed / what they did

## Detection
- What first detected the anomaly (alert / user report / chance)
- Lag from detection to recognition

## Response
- Who was on call, how they moved
- First-response options and the reason for the choice

## Contributing factors
- Process: weaknesses in review / deploy / approval
- Organizational: weaknesses in info sharing / role definition
- Technical: weaknesses in dependencies / monitoring / fallback
- Environmental: external factors / unexpected load

## What went well
- Detection / collaboration / recovery steps that worked

## Lessons learned
- No personal attribution. "Be more careful" is forbidden
- Only structural lessons go here

## Action items
| ID | Action | Owner | Due | Type | Status |
|----|--------|-------|-----|------|--------|
| 1 | xxx | @yyy | MM/DD | prevent / mitigate / detect | open |

## Trigger (optional)
- The direct trigger that set off this incident
```
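Because the Action items section of the template is a pipe table, it can be pulled out mechanically for the tracking described later. Below is a minimal sketch of such a parser; the parsing rules and field names are assumptions for illustration, not part of any standard tooling.

```python
def parse_action_items(markdown: str) -> list[dict]:
    """Extract rows from the '## Action items' pipe table of a postmortem.

    Assumes the six-column layout shown in the template above.
    """
    items = []
    in_section = False
    for line in markdown.splitlines():
        if line.startswith("## "):
            # Track whether we are inside the Action items section
            in_section = line.strip() == "## Action items"
            continue
        if in_section and line.startswith("|"):
            cells = [c.strip() for c in line.strip("|").split("|")]
            # Skip the header row and the |----| separator row
            if cells[0] == "ID" or set(cells[0]) <= {"-"}:
                continue
            items.append(dict(zip(
                ["id", "action", "owner", "due", "type", "status"], cells)))
    return items

sample = """\
## Action items
| ID | Action | Owner | Due | Type | Status |
|----|--------|-------|-----|------|--------|
| 1 | Add canary deploy | @alice | 07/15 | prevent | open |
| 2 | Page on error-rate spike | @bob | 07/01 | detect | done |
"""
print(parse_action_items(sample))
```

Keeping the table machine-readable is what makes the monthly completion-rate tracking in Issue 3 cheap to automate.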
Issue 1: the courage to write the trigger
Whether you write the "Trigger" field decides the quality of the postmortem: skip it and the prevention measures stay abstract; write it carelessly and you risk sliding into personal attack.
Why a trigger is necessary
Abstract prevention measures alone ("strengthen review," "improve monitoring") cannot design a system in which the same person, in the same conditions, would not do the same thing again. Without a concrete trigger (who, in what situation, set off by what), you cannot identify which holes in your defensive layers lined up (Reason's Swiss Cheese Model [3]).
Operating the trigger field without it becoming a personal attack
Keep everyone convinced that the trigger is written to design a system that prevents recurrence under the same conditions, not to assign blame. Specifically:
- Make the grammatical subject of the trigger sentence a process or situation, not a person: not “X-san made a mistake” but “the procedure for Y allowed the verification step to be skipped”
- Before writing the trigger, re-affirm in Lessons learned that “we look at structure, not at people”
- Limit disclosure scope in the early stage (next subsection) to protect psychological safety
- Apply Sidney Dekker's "second story" framing [4]: always look for the structural factors beneath the surface "human error"
Allspaw’s principle: extract knowledge
The core of Allspaw's Etsy post [1]: punish people for mistakes and the next mistake gets hidden; don't punish them and the knowledge stays inside the organization. A postmortem is a knowledge-extraction process, not a venue for evaluation judgments.
Issue 2: separation from the evaluation system
Using the postmortem document as “evidence of a mistake” in performance reviews instantly restores the incentive to hide. This is the single biggest cause of postmortem ritualization.
Explicit rules
- Codify in writing that postmortem documents are not used as input to performance reviews
- What is fair game for review is not the content of the postmortem but participation in the process: did the person engage cooperatively and contribute structural lessons?
- Executives and senior managers should read postmortems but visibly not use them in evaluation judgments
- Exception: when legal or regulatory requirements force evaluation use, declare the scope in advance
The Just Culture lineage
This sits in the lineage of Sidney Dekker's "Just Culture" [4], a long-running discussion in aviation and medicine:
- Neither 100% blame-free (no accountability) nor 100% blame-full (relentless attribution)
- Actions that follow the guidelines are not punished
- Distinguish structurally between intentional violations / gross negligence and the rest
One indicator of whether an organization is embodying Just Culture: is the number of postmortems going up or down? Healthy organizations see postmortem counts rise (because hiding decreases).
Issue 3: tracking action items
“We wrote action items, end of story” is the operational reality at many organizations. Postmortems that don’t track completion are the on-ramp to ritualization.
Type labels for strategic balance
Classify action items into three types:
- prevent: remove the cause itself (don’t recur)
- mitigate: shrink the impact when it does happen
- detect: notice it sooner when it happens
Per postmortem, watch the balance among the three. An all-prevent list is operationally hard to deliver, so it ritualizes; an all-detect list leaves root causes untouched.
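The balance check above is easy to automate. A minimal sketch, assuming each action item carries a `type` field as in the template (the function name and the warning wording are illustrative, not standard tooling):

```python
from collections import Counter

def type_balance(items: list[dict]) -> dict:
    """Tally prevent/mitigate/detect labels and flag a one-sided mix.

    `items` is a list of action items, each with a 'type' field.
    """
    counts = Counter(item["type"] for item in items)
    total = sum(counts.values())
    warnings = []
    for label in ("prevent", "mitigate", "detect"):
        # A multi-item list made of a single type is the smell to surface
        if total > 1 and counts.get(label, 0) == total:
            warnings.append(f"all items are '{label}' -- consider rebalancing")
    return {"counts": dict(counts), "warnings": warnings}

items = [{"type": "prevent"}, {"type": "prevent"}, {"type": "prevent"}]
result = type_balance(items)
print(result["warnings"])  # flags the all-prevent mix
```

Running this over each postmortem's action-item table turns the "watch the balance" advice into a check a reviewer can't forget.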
Monthly tracking of completion rates
- Visualize action-item completion at 30 / 60 / 90 days, monthly
- In organizations where over half are abandoned, the postmortem itself is ritualizing
- If the abandonment reason is “deprioritized,” explicitly close it (with a stated reason). If it’s “we forgot,” the tracking process is broken
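The 30/60/90-day visualization can be computed directly from open/close dates. A minimal sketch, assuming a hypothetical item schema of `{'opened': date, 'closed': date or None}`:

```python
from datetime import date, timedelta

def completion_rates(items: list[dict], today: date) -> dict:
    """Share of action items closed within 30/60/90 days of being opened.

    Only items old enough to have had the full window count toward
    that window's rate; windows with no eligible items return None.
    """
    rates = {}
    for window in (30, 60, 90):
        eligible = [i for i in items
                    if today - i["opened"] >= timedelta(days=window)]
        if not eligible:
            rates[window] = None
            continue
        done = sum(
            1 for i in eligible
            if i["closed"] is not None
            and i["closed"] - i["opened"] <= timedelta(days=window))
        rates[window] = done / len(eligible)
    return rates

today = date(2024, 7, 1)
items = [
    {"opened": date(2024, 3, 1), "closed": date(2024, 3, 20)},  # 19 days
    {"opened": date(2024, 3, 1), "closed": None},               # abandoned
    {"opened": date(2024, 3, 1), "closed": date(2024, 4, 25)},  # 55 days
]
print(completion_rates(items, today))
```

A monthly job that emits these three numbers per team is usually enough to spot the "over half abandoned" pattern before the postmortem practice hollows out.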
Executive review
Once a quarter, hold an executive meeting that reviews action-item completion and the reasons for non-completion. Without it, action items don’t generate ROI.
Issue 4: staged broadening of disclosure
“Full-company publication = transparency” is too simple. Going company-wide from day one chills people, and postmortem counts decrease (i.e., hiding rises).
Recommended staging
```
Phase 1: team-internal only (first 3-6 months)
  ↓ psychological safety established
Phase 2: extend to adjacent teams
  ↓ blameless culture takes hold
Phase 3: company-wide (summary version)
  ↓ industry sharing
Phase 4: external publication (Etsy / GitHub / Cloudflare style)
```
The condition for advancing to the next phase is not elapsed time but whether blameless is actually functioning at the current phase. Premature company-wide publication is counterproductive.
Exception: regulated industries
In finance, healthcare, and government-adjacent industries where regulation requires external disclosure, separate internal-only documents from external-summary documents from the start. See the sister article “Discovery Risk vs. Knowledge Accumulation” for details.
Anti-patterns
| Pattern | What happens | Countermeasure |
|---|---|---|
| “We’ll be more careful next time” lines pile up | Zero structural lessons | Require prevent/mitigate/detect labels on action items |
| Trigger field is omitted | Prevention measures abstract away | Discipline of writing the trigger with a process/situation subject |
| Postmortem document used in evaluation | Hiding is back | Explicit written separation from performance review |
| Action items abandoned | Learning doesn’t convert into outcomes | Monthly visualization of 30-day completion rate |
| Company-wide publication from day one | Chilling effect; postmortem count drops | Staged broadening |
| Executives don’t read postmortems | Structural improvement doesn’t connect to executive decisions | Quarterly executive review |
| Number of incidents becomes a KPI | Hiding incentive | Track postmortem count (transparency); incident count is internal-only reference |
Summary
- Allspaw / Google SRE blameless postmortems are the post-2010 industry standard, but ritualization is common
- Courage to write the trigger: not “to assign blame” but “to design a system that prevents the same thing under the same conditions”
- Separation from the evaluation system: codify in writing that postmortem documents are not “evidence of mistake”
- Action-item tracking: prevent/mitigate/detect type labels, monthly completion rates, quarterly executive review
- Staged broadening of disclosure: team → adjacent teams → company-wide → industry
- Making “incident count” a KPI creates a hiding incentive (Goodhart runaway)
- Healthier organizations see more postmortems (less hiding)
Related articles
- Implementation Guide for Organizational Context Supply Capability: From Facing Problems to Repair — Parent article
- Designing Paired Negative: How It Differs from Solo Negative, and Why It Moves Organizations — Higher concept above postmortem culture
- Goodhart’s Law: How Documentation KPIs Hollow Out the Moment They’re Set — Runaway when incident counts become KPIs
- Discovery Risk vs. Knowledge Accumulation: Solutions in Industries Where Legal Stops Documentation — Postmortem disclosure design in regulated industries
References
[1] Blameless PostMortems and a Just Culture — John Allspaw, Etsy Code as Craft (2012-05). The industry-standard proposal. [Reliability: High]
[2] Postmortem Culture: Learning from Failure — Beyer, Jones, Petoff, Murphy (eds.), Site Reliability Engineering, O'Reilly / Google (2016). [Reliability: High]
[3] Human Error — James Reason, Cambridge University Press (1990). ISBN: 9780521314190. The original Swiss Cheese Model. [Reliability: High]
[4] The Field Guide to Understanding 'Human Error' — Sidney Dekker, Ashgate (3rd ed. 2014). ISBN: 9781472439055. The Just Culture lineage. [Reliability: High]