Methodology

Advanced

AI Application Pentest Methodology

Treat AI features like distributed systems with probabilistic decision points. The test target is not only the model: it is the prompt chain, retrieval layer, permission model, gateway, observability stack, and the business process that trusts the output.

Use the planner first

Start with the planner below to create a scoped test plan, then refine it against the client's architecture and rules of engagement.

AI engagement planner

Build a scoped AI test plan in seconds

Select the system shape and generate a practitioner checklist mapped to OWASP LLM, MITRE ATLAS, evidence, and report deliverables.

Checks

Evidence

Required

Output

Markdown

01 Scoping

Document model/provider, user roles, data classes, tool permissions, allowed tests, and explicit no-go actions before testing begins.

Rules of engagement addendum, architecture sketch, permission inventory.

OWASP LLM01OWASP LLM07

02 Prompt injection

Test direct and indirect instruction conflicts across user input, retrieved content, files, tickets, web pages, and third-party connectors.

Prompt, source content, model response, guardrail behavior, impact notes.

OWASP LLM01MITRE ATLAS AML.T0051

03 Abuse-case execution

Run at least one end-to-end authorized abuse path that starts with adversarial input and reaches an unsafe answer, data exposure, permission decision, or attempted tool action.

Abuse-case objective, fixture data, model context, response, downstream effect, and business-impact statement.

OWASP LLM01OWASP LLM05OWASP LLM06

04 Regression pack

Convert accepted findings into repeatable evals with expected safe behavior, owner, severity, and rerun instructions.

Prompt/eval file, fixture IDs, scoring rule, model version, prompt version, pass/fail output.

OWASP LLM09OWASP LLM10

05 Output handling

Verify that generated output cannot trigger unsafe rendering, hidden links, credential disclosure, or automated downstream actions.

Rendered output capture, sanitizer behavior, blocked content examples.

OWASP LLM02OWASP LLM05

06 Monitoring

Confirm prompts, retrieval hits, tool calls, approvals, and high-risk denials are logged with enough context for incident response.

Log samples, alert rules, retention policy, privacy notes.

OWASP LLM10

07 Retrieval boundaries

Test cross-tenant retrieval, document-level ACLs, stale index entries, poisoned chunks, and citation integrity.

Corpus sample, retrieval trace, denied-document test, citation diff.

OWASP LLM02OWASP LLM06OWASP LLM08

08 RAG abuse paths

Seed harmless poisoned documents, cross-tenant canaries, stale index records, and citation-laundering fixtures to prove retrieval trust-boundary failures.

Document ID, chunk ID, retrieval score, source ACL, answer text, citation comparison, and canary result.

OWASP LLM01OWASP LLM02OWASP LLM08

09 Knowledge ingestion

Review upload, crawl, sync, and connector paths for malicious instructions, hidden text, metadata injection, and unsafe file parsing.

Ingestion path map, sample poisoned document, sanitizer output.

OWASP LLM03MITRE ATLAS AML.T0046

10 Read scope

Confirm read tools cannot access secrets, unrelated tenants, local credential stores, or out-of-scope repositories.

Allowlist, denied read tests, data-boundary notes.

OWASP LLM06

11 Confidentiality

Prefer local or approved enterprise endpoints, redact secrets before prompts, and record exactly where client data is processed.

Data-flow record, redaction examples, approved endpoint list.

OWASP LLM06OWASP LLM10

12 Hybrid routing

Verify sensitive prompts route only to approved local/private endpoints and fallback behavior does not leak data to public providers.

Routing policy, fallback test, denied external call.

OWASP LLM06OWASP LLM07

Assessment Phases

1. Scope and data rules

Document AI features, model/provider, data classes, allowed test types, prohibited actions, and evidence handling.

2. Architecture mapping

Map users, prompts, retrieval, memory, tools, model gateways, output sinks, logging, and approval gates.

3. Abuse case design

Convert OWASP LLM, MITRE ATLAS, business flows, and tool permissions into testable hypotheses.

4. Controlled execution

Run tests in low-risk order: prompt conflicts, retrieval boundaries, tool abuse, tenant isolation, output handling, and monitoring.

5. Evidence and retest

Preserve prompts, traces, screenshots, logs, control behavior, business impact, and fix validation criteria.

Minimum Test Matrix

Area	What to test	Proof to collect
Prompt layer	Direct, indirect, multi-turn, role conflict, and hidden instruction handling.	Prompt, context, response, refusal or bypass trace.
Retrieval	Chunk poisoning, ACL filters, stale index entries, source ranking, and citation integrity.	Retrieved chunks, scores, source IDs, user role.
Tools and agents	Tool allowlists, argument validation, approval gates, memory persistence, sandbox escape paths.	Tool manifest, approval logs, blocked calls, sandbox policy.
Model gateway	Tenant isolation, abuse throttling, provider fallback, prompt/version pinning, sensitive-output controls.	Gateway config, request IDs, rate-limit results, version record.
Operations	Logging, alerting, incident response, privacy retention, data minimization, and human handoff.	Log samples, alert rule, retention setting, escalation workflow.

Client Deliverables

AI RoE addendum

Allowed models, providers, test classes, tool-use boundaries, evidence restrictions, and emergency stop conditions.

AI attack-surface map

Trust boundaries for prompts, retrieval, tools, memory, gateways, human approval, logs, and output consumers.

Regression pack

A small set of reproducible prompts, fixtures, and expected outcomes that validate fixes after model or prompt changes.

Quality bar

A strong AI finding includes the exact user role, source content or fixture, model/prompt version, retrieved context, tool-call trace if applicable, observed business impact, and the control that should have stopped it.

AI App Testing

Operator Playbook

Run an end-to-end authorized AI application pentest that covers prompts, retrieval, tools, gateways, users, logs, and downstream impact.

Authorized use only

Offensive Focus

Convert scope into abuse cases before testing individual prompts.
Prioritize tests that can cross trust boundaries, expose data, invoke tools, alter decisions, or create reportable business impact.
Turn every accepted finding into a regression fixture.

Evidence To Capture

Written scope and allowed test classes
Timestamped prompts, retrieved context, tool calls, and response artifacts
Request IDs, model/provider/version, policy decisions, and tenant or user role
Screenshots or exported logs that reproduce the finding without exposing client secrets

Offensive Test Cases

AI RoE addendum review

Objective: Identify gaps in scope around models, providers, prompt logs, vector stores, and tool actions.
Authorized setup: Review written authorization with the client before active testing.
Evidence: Approved test classes, exclusions, provider rules, emergency contacts, and stop conditions.

Abuse-case execution sprint

Objective: Execute a small prioritized set of AI-specific abuse cases and document impact.
Authorized setup: Use test accounts, fixtures, and approved provider routes.
Evidence: Test case, inputs, model context, outputs, logs, side effects, and remediation owner.

Common Findings

AI scope is bolted onto a web pentest without model, retrieval, or tool-use authorization.
Testing focuses on jailbreak prompts instead of business-impact abuse cases.
No one owns regression testing after model, prompt, or connector changes.

Lab Ideas

Write an AI pentest RoE addendum for a sample SaaS support bot.
Build a risk-ranked abuse-case backlog for RAG, agents, and model APIs.
Turn one finding into a Promptfoo regression test.

Related Offensive AI Guides

RAG Security Testing

Retrieval, poisoning, ACL, and citation testing.

AI Evaluation Workbench

Turn AI findings into regression tests.

Agent Permission Model

Tool permissions, approvals, and sandboxing.

Secure Architecture

Design safer AI-enabled systems.