AI Incident Response
AI incidents often look like product bugs until you inspect the prompt chain, retrieved context, tool calls, and logs. Response teams need playbooks that preserve volatile AI evidence before prompts, indexes, and model versions change.
Preserve before tuning
V2 Attack Flow Diagram
AI Incident Timeline
Preserve volatile AI artifacts before prompt, model, retrieval, or policy changes erase the trail.
Trigger
Prompt injection, poisoned retrieval, unsafe agent action, gateway abuse, or log exposure.
Freeze State
Model, prompt, retrieval config, tool schema, guardrail, and routing versions.
Trace Impact
Affected users, chunks, outputs, tool calls, side effects, and exposed data classes.
Retest Fix
Contain, remediate, convert trigger into eval, and monitor for drift.
Common AI Incident Types
Prompt Injection
Untrusted content changes model behavior, leaks data, alters output, or triggers unintended tools.
Poisoned Retrieval
A document, ticket, page, or connector source contaminates RAG answers or citations.
Unsafe Agent Action
An agent sends, edits, deletes, installs, executes, or browses beyond intended authority.
Gateway Abuse
Attackers abuse model APIs for cost, extraction, scraping, policy bypass, or tenant boundary probing.
Prompt Log Leak
Prompt, completion, file, or embedding logs expose secrets, customer data, or confidential content.
Supply-Chain Compromise
Model files, adapters, notebooks, datasets, or agent instruction files introduce malicious behavior.
First-Hour Checklist
- 1. Freeze versions: model, prompt, retrieval config, tool schema, guardrail policy, and gateway routing.
- 2. Preserve traces: request IDs, prompts, retrieved chunks, tool calls, approvals, logs, screenshots, and source documents.
- 3. Contain safely: disable risky tools, narrow connectors, revoke exposed tokens, pause provider fallback, and restrict affected tenants.
- 4. Classify data: identify whether prompts, completions, embeddings, files, logs, or actions exposed regulated or confidential data.
- 5. Build retest: convert the incident trigger into an eval before releasing prompt, policy, retrieval, or permission changes.
Evidence Sources
Gateway
Request IDs, API key, tenant, model, prompt version, route, latency, tokens, refusal and safety metadata.
Retrieval
Chunk IDs, scores, ACL filters, source documents, connector sync time, deleted or stale index state.
Agent
Plan, memory, tool call, arguments, approval, output, filesystem/network policy, and denied actions.
Product
User role, session, rendered output, downstream automation, ticket/email/PR side effects, and customer reports.
Close the loop
Closeout & Response
Operator Playbook
Respond to AI-specific incidents by preserving prompts, retrieval, tool actions, model metadata, and downstream effects before tuning away evidence.
Offensive Focus
- Reconstruct attacker timeline across user input, retrieved content, model decisions, tools, logs, and external side effects.
- Preserve canaries, chunk IDs, prompts, completions, tool arguments, model versions, and approval events.
- Convert incidents into hardened controls and regression tests.
Evidence To Capture
- Written scope and allowed test classes
- Timestamped prompts, retrieved context, tool calls, and response artifacts
- Request IDs, model/provider/version, policy decisions, and tenant or user role
- Screenshots or exported logs that reproduce the finding without exposing client secrets
Offensive Test Cases
AI incident timeline reconstruction
- Objective
- Rebuild a prompt-injection, retrieval poisoning, or unsafe-agent incident from logs and artifacts.
- Authorized setup
- Use tabletop data or a staged incident in a lab environment.
- Evidence
- Timeline, request IDs, chunks, prompts, outputs, tool calls, affected users, and containment actions.
Containment regression test
- Objective
- Verify that containment and remediation prevent the same abuse path from recurring.
- Authorized setup
- Use sanitized incident inputs and staging controls.
- Evidence
- Original failure, fix, regression test, pass result, and monitoring update.
Common Findings
- Teams tune prompts or delete documents before preserving forensic context.
- Model gateway logs omit prompt versions, retrieval chunks, or tool arguments.
- Incident response playbooks do not cover AI-specific artifacts and owners.
Lab Ideas
- Run a tabletop for poisoned retrieval in an internal knowledge bot.
- Create a first-hour evidence checklist for an unsafe agent action.
- Convert a staged AI incident into two regression tests.