AI Application Pentest Methodology
Treat AI features like distributed systems with probabilistic decision points. The test target is not only the model: it is the prompt chain, retrieval layer, permission model, gateway, observability stack, and the business process that trusts the output.
Use the planner first
Build a scoped AI test plan in seconds
Select the system shape and generate a practitioner checklist mapped to OWASP LLM, MITRE ATLAS, evidence, and report deliverables.
Document model/provider, user roles, data classes, tool permissions, allowed tests, and explicit no-go actions before testing begins.
Rules of engagement addendum, architecture sketch, permission inventory.
Test direct and indirect instruction conflicts across user input, retrieved content, files, tickets, web pages, and third-party connectors.
Prompt, source content, model response, guardrail behavior, impact notes.
Run at least one end-to-end authorized abuse path that starts with adversarial input and reaches an unsafe answer, data exposure, permission decision, or attempted tool action.
Abuse-case objective, fixture data, model context, response, downstream effect, and business-impact statement.
Convert accepted findings into repeatable evals with expected safe behavior, owner, severity, and rerun instructions.
Prompt/eval file, fixture IDs, scoring rule, model version, prompt version, pass/fail output.
Verify that generated output cannot trigger unsafe rendering, hidden links, credential disclosure, or automated downstream actions.
Rendered output capture, sanitizer behavior, blocked content examples.
Confirm prompts, retrieval hits, tool calls, approvals, and high-risk denials are logged with enough context for incident response.
Log samples, alert rules, retention policy, privacy notes.
Test cross-tenant retrieval, document-level ACLs, stale index entries, poisoned chunks, and citation integrity.
Corpus sample, retrieval trace, denied-document test, citation diff.
Seed harmless poisoned documents, cross-tenant canaries, stale index records, and citation-laundering fixtures to prove retrieval trust-boundary failures.
Document ID, chunk ID, retrieval score, source ACL, answer text, citation comparison, and canary result.
Review upload, crawl, sync, and connector paths for malicious instructions, hidden text, metadata injection, and unsafe file parsing.
Ingestion path map, sample poisoned document, sanitizer output.
Confirm read tools cannot access secrets, unrelated tenants, local credential stores, or out-of-scope repositories.
Allowlist, denied read tests, data-boundary notes.
Prefer local or approved enterprise endpoints, redact secrets before prompts, and record exactly where client data is processed.
Data-flow record, redaction examples, approved endpoint list.
Verify sensitive prompts route only to approved local/private endpoints and fallback behavior does not leak data to public providers.
Routing policy, fallback test, denied external call.
Assessment Phases
1. Scope and data rules
Document AI features, model/provider, data classes, allowed test types, prohibited actions, and evidence handling.
2. Architecture mapping
Map users, prompts, retrieval, memory, tools, model gateways, output sinks, logging, and approval gates.
3. Abuse case design
Convert OWASP LLM, MITRE ATLAS, business flows, and tool permissions into testable hypotheses.
4. Controlled execution
Run tests in low-risk order: prompt conflicts, retrieval boundaries, tool abuse, tenant isolation, output handling, and monitoring.
5. Evidence and retest
Preserve prompts, traces, screenshots, logs, control behavior, business impact, and fix validation criteria.
Minimum Test Matrix
| Area | What to test | Proof to collect |
|---|---|---|
| Prompt layer | Direct, indirect, multi-turn, role conflict, and hidden instruction handling. | Prompt, context, response, refusal or bypass trace. |
| Retrieval | Chunk poisoning, ACL filters, stale index entries, source ranking, and citation integrity. | Retrieved chunks, scores, source IDs, user role. |
| Tools and agents | Tool allowlists, argument validation, approval gates, memory persistence, sandbox escape paths. | Tool manifest, approval logs, blocked calls, sandbox policy. |
| Model gateway | Tenant isolation, abuse throttling, provider fallback, prompt/version pinning, sensitive-output controls. | Gateway config, request IDs, rate-limit results, version record. |
| Operations | Logging, alerting, incident response, privacy retention, data minimization, and human handoff. | Log samples, alert rule, retention setting, escalation workflow. |
Client Deliverables
AI RoE addendum
Allowed models, providers, test classes, tool-use boundaries, evidence restrictions, and emergency stop conditions.
AI attack-surface map
Trust boundaries for prompts, retrieval, tools, memory, gateways, human approval, logs, and output consumers.
Regression pack
A small set of reproducible prompts, fixtures, and expected outcomes that validate fixes after model or prompt changes.
Quality bar
AI App Testing
Operator Playbook
Run an end-to-end authorized AI application pentest that covers prompts, retrieval, tools, gateways, users, logs, and downstream impact.
Offensive Focus
- Convert scope into abuse cases before testing individual prompts.
- Prioritize tests that can cross trust boundaries, expose data, invoke tools, alter decisions, or create reportable business impact.
- Turn every accepted finding into a regression fixture.
Evidence To Capture
- Written scope and allowed test classes
- Timestamped prompts, retrieved context, tool calls, and response artifacts
- Request IDs, model/provider/version, policy decisions, and tenant or user role
- Screenshots or exported logs that reproduce the finding without exposing client secrets
Offensive Test Cases
AI RoE addendum review
- Objective
- Identify gaps in scope around models, providers, prompt logs, vector stores, and tool actions.
- Authorized setup
- Review written authorization with the client before active testing.
- Evidence
- Approved test classes, exclusions, provider rules, emergency contacts, and stop conditions.
Abuse-case execution sprint
- Objective
- Execute a small prioritized set of AI-specific abuse cases and document impact.
- Authorized setup
- Use test accounts, fixtures, and approved provider routes.
- Evidence
- Test case, inputs, model context, outputs, logs, side effects, and remediation owner.
Common Findings
- AI scope is bolted onto a web pentest without model, retrieval, or tool-use authorization.
- Testing focuses on jailbreak prompts instead of business-impact abuse cases.
- No one owns regression testing after model, prompt, or connector changes.
Lab Ideas
- Write an AI pentest RoE addendum for a sample SaaS support bot.
- Build a risk-ranked abuse-case backlog for RAG, agents, and model APIs.
- Turn one finding into a Promptfoo regression test.