Назад към всички

afrexai-incident-response

// Structured incident response for business and IT teams. Guides you through detection, triage, containment, resolution, and post-mortem — with auto-generated timelines and action items.

$ git log --oneline --stat
stars:1,933
forks:367
updated:March 4, 2026
SKILL.mdreadonly

Incident Response Playbook

Structured incident response for business and IT teams. Guides you through detection, triage, containment, resolution, and post-mortem — with auto-generated timelines and action items.

What It Does

When triggered with an incident description, this skill:

  1. Classifies severity (P1-P4) based on impact and urgency
  2. Generates a response checklist tailored to incident type (outage, data breach, security event, service degradation, vendor failure)
  3. Builds a communication plan — who to notify, when, what channels
  4. Creates a real-time timeline as you log updates
  5. Produces a post-mortem template with root cause analysis and prevention steps

Usage

Tell your agent about an incident:

"Production API is returning 500 errors for 20% of requests. Started 10 minutes ago."

Or trigger proactively:

"Create an incident response plan for a potential data breach scenario"

Incident Types Covered

  • Service outages — full or partial downtime
  • Security incidents — breaches, unauthorized access, phishing
  • Data incidents — corruption, loss, privacy violations
  • Vendor failures — third-party SLA breaches
  • Performance degradation — latency spikes, capacity issues

Severity Matrix

LevelImpactResponse TimeEscalation
P1 - CriticalBusiness stoppedImmediateExecutive + all hands
P2 - HighMajor feature down< 30 minEngineering lead + PM
P3 - MediumDegraded experience< 2 hoursOn-call team
P4 - LowMinor issueNext business dayTicket queue

Response Framework

1. Detection & Triage (First 5 minutes)

  • Confirm the incident is real (not a false alarm)
  • Classify severity using the matrix above
  • Assign incident commander
  • Open a dedicated communication channel

2. Containment (First 30 minutes)

  • Identify blast radius — what's affected?
  • Apply immediate mitigation (rollback, feature flag, scaling)
  • Communicate status to stakeholders

3. Resolution

  • Root cause investigation
  • Implement fix with verification
  • Monitor for recurrence
  • Update all stakeholders

4. Post-Mortem (Within 48 hours)

  • Timeline of events
  • Root cause analysis (5 Whys)
  • What went well / what didn't
  • Action items with owners and deadlines
  • Process improvements

Integration

Works with any monitoring stack. Feed alerts from PagerDuty, Datadog, Grafana, or manual reports.

Pro Tip

Pair this with a full AI Operations Context Pack for your industry. Pre-built incident taxonomies, compliance-aware escalation paths, and automated stakeholder templates.

Browse packs: https://afrexai-cto.github.io/context-packs/

Free tools: