Response

Intermediate

AI Incident Response

AI incidents often look like product bugs until you inspect the prompt chain, retrieved context, tool calls, and logs. Response teams need playbooks that preserve volatile AI evidence before prompts, indexes, and model versions change.

Preserve before tuning

Do not immediately edit prompts, purge indexes, or rotate models without preserving the affected versions, traces, and source content. AI incidents are hard to reconstruct after configuration drift.

V2 Attack Flow Diagram

AI Incident Timeline

Preserve volatile AI artifacts before prompt, model, retrieval, or policy changes erase the trail.

Event01

Trigger

Prompt injection, poisoned retrieval, unsafe agent action, gateway abuse, or log exposure.

Preserve02

Freeze State

Model, prompt, retrieval config, tool schema, guardrail, and routing versions.

Scope03

Trace Impact

Affected users, chunks, outputs, tool calls, side effects, and exposed data classes.

Close04

Retest Fix

Contain, remediate, convert trigger into eval, and monitor for drift.

AI response quality depends on preserving context first and tuning controls second.

Common AI Incident Types

Prompt Injection

Untrusted content changes model behavior, leaks data, alters output, or triggers unintended tools.

Poisoned Retrieval

A document, ticket, page, or connector source contaminates RAG answers or citations.

Unsafe Agent Action

An agent sends, edits, deletes, installs, executes, or browses beyond intended authority.

Gateway Abuse

Attackers abuse model APIs for cost, extraction, scraping, policy bypass, or tenant boundary probing.

Prompt Log Leak

Prompt, completion, file, or embedding logs expose secrets, customer data, or confidential content.

Supply-Chain Compromise

Model files, adapters, notebooks, datasets, or agent instruction files introduce malicious behavior.

First-Hour Checklist

1. Freeze versions: model, prompt, retrieval config, tool schema, guardrail policy, and gateway routing.
2. Preserve traces: request IDs, prompts, retrieved chunks, tool calls, approvals, logs, screenshots, and source documents.
3. Contain safely: disable risky tools, narrow connectors, revoke exposed tokens, pause provider fallback, and restrict affected tenants.
4. Classify data: identify whether prompts, completions, embeddings, files, logs, or actions exposed regulated or confidential data.
5. Build retest: convert the incident trigger into an eval before releasing prompt, policy, retrieval, or permission changes.

Evidence Sources

Gateway

Request IDs, API key, tenant, model, prompt version, route, latency, tokens, refusal and safety metadata.

Retrieval

Chunk IDs, scores, ACL filters, source documents, connector sync time, deleted or stale index state.

Agent

Plan, memory, tool call, arguments, approval, output, filesystem/network policy, and denied actions.

Product

User role, session, rendered output, downstream automation, ticket/email/PR side effects, and customer reports.

Close the loop

Every AI incident should leave behind a regression eval, a permission or retrieval control improvement, and an owner for model/prompt/version drift monitoring.

Closeout & Response

Operator Playbook

Respond to AI-specific incidents by preserving prompts, retrieval, tool actions, model metadata, and downstream effects before tuning away evidence.

Authorized use only

Offensive Focus

Reconstruct attacker timeline across user input, retrieved content, model decisions, tools, logs, and external side effects.
Preserve canaries, chunk IDs, prompts, completions, tool arguments, model versions, and approval events.
Convert incidents into hardened controls and regression tests.

Evidence To Capture

Written scope and allowed test classes
Timestamped prompts, retrieved context, tool calls, and response artifacts
Request IDs, model/provider/version, policy decisions, and tenant or user role
Screenshots or exported logs that reproduce the finding without exposing client secrets

Offensive Test Cases

AI incident timeline reconstruction

Objective: Rebuild a prompt-injection, retrieval poisoning, or unsafe-agent incident from logs and artifacts.
Authorized setup: Use tabletop data or a staged incident in a lab environment.
Evidence: Timeline, request IDs, chunks, prompts, outputs, tool calls, affected users, and containment actions.

Containment regression test

Objective: Verify that containment and remediation prevent the same abuse path from recurring.
Authorized setup: Use sanitized incident inputs and staging controls.
Evidence: Original failure, fix, regression test, pass result, and monitoring update.

Common Findings

Teams tune prompts or delete documents before preserving forensic context.
Model gateway logs omit prompt versions, retrieval chunks, or tool arguments.
Incident response playbooks do not cover AI-specific artifacts and owners.

Lab Ideas

Run a tabletop for poisoned retrieval in an internal knowledge bot.
Create a first-hour evidence checklist for an unsafe agent action.
Convert a staged AI incident into two regression tests.

Related Offensive AI Guides

DFIR

Incident response process and evidence handling.

RAG Security Testing

Retrieval traces and poisoned source content.

Agent Permission Model

Tool-call traces, approvals, and sandbox logs.

Model API Security

Gateway logs, routing, retention, and abuse controls.