RAG Security Testing
Retrieval-augmented generation systems fail differently from ordinary chatbots. The model is only one part of the target: documents, connectors, chunking, embedding search, access control, citations, and output rendering all become testable attack surface.
Authorized systems only
V2 Attack Flow Diagram
RAG Abuse Path
Follow the evidence from source content to retrieved context, answer, and reportable impact.
Seed Fixture
Approved poisoned document, canary record, stale source, or connector item.
Retrieve Context
Chunking, embeddings, ACL filters, ranking, and citation selection.
Influence Answer
Instruction/data confusion, citation laundering, or cross-tenant leakage.
Capture Proof
Document ID, chunk ID, score, prompt context, output, and control result.
Ingestion
Files, crawlers, tickets, wiki pages, email, and SaaS connectors.
Retrieval
Embeddings, chunk ranking, filters, tenant boundaries, and ACL joins.
Generation
Prompt assembly, citations, tool calls, refusals, and fallback behavior.
Evidence
Trace IDs, retrieved chunks, user role, prompt, output, and blocked controls.
Core Test Areas
Document and chunk poisoning
Place benign test documents in approved locations that contain conflicting instructions, hidden text, metadata instructions, ambiguous citations, and malicious support links. Verify whether the system treats retrieved content as data or instruction.
Tenant and ACL boundaries
Create users with different roles and confirm retrieval cannot cross customer, department, project, or document-level access boundaries. Test deleted documents and stale index entries after permission changes.
Citation integrity
Compare cited sources against the actual answer. Flag fabricated citations, unsupported claims, citation laundering, hidden-source reliance, and answers that cite low-trust material over authoritative sources.
Sensitive data leakage
Test whether secrets, HR records, customer data, private tickets, API responses, or embeddings can be inferred through summaries, autocomplete, broad questions, or multi-turn narrowing.
Evidence Checklist
- User context: role, tenant, group, and active permissions.
- Source record: document ID, connector, timestamp, and classification.
- Retrieval trace: chunk IDs, score, filters, and denied candidates where available.
- Prompt assembly: system prompt version, retrieved context, and tool schema version.
- Output proof: answer, citations, rendered view, and any downstream action.
- Control result: blocked, allowed, warned, logged, escalated, or silently failed.
Report-Ready Finding Patterns
Indirect Prompt Injection
A retrieved document can override assistant behavior or redirect users because untrusted content is included without instruction/data separation.
Cross-Tenant Retrieval
A user can retrieve or infer content outside their authorization boundary through broad questions, stale index entries, or missing ACL filters.
Citation Laundering
The answer cites an approved source while relying on untrusted or unrelated retrieved chunks, making unsupported advice look authoritative.
Best defenses to verify
AI App Testing
Operator Playbook
Exploit-test RAG trust boundaries using controlled poisoned documents, cross-tenant fixtures, citation checks, and retrieval evidence.
Offensive Focus
- Seed harmless adversarial documents to test whether retrieved text can override instructions or alter answers.
- Probe ACL, tenant, stale-index, connector, and citation boundaries with canary data.
- Capture chunk IDs, scores, source documents, and generated answers for every finding.
Evidence To Capture
- Written scope and allowed test classes
- Timestamped prompts, retrieved context, tool calls, and response artifacts
- Request IDs, model/provider/version, policy decisions, and tenant or user role
- Screenshots or exported logs that reproduce the finding without exposing client secrets
Offensive Test Cases
Indirect prompt injection via retrieved document
- Objective
- Verify whether a retrieved chunk can change assistant behavior or suppress normal policy.
- Authorized setup
- Place a harmless marker instruction in an approved test document or staging knowledge base.
- Evidence
- Document ID, chunk ID, retrieval score, prompt context, answer, and control decision.
Cross-tenant canary retrieval
- Objective
- Check whether one tenant/user can retrieve or infer another tenant/user canary value.
- Authorized setup
- Use two lab tenants and canary records approved for retrieval-boundary testing.
- Evidence
- User roles, query, retrieved chunks, ACL filters, answer, and logs.
Common Findings
- Retrieved content is treated as instruction rather than untrusted data.
- Vector searches return stale or cross-tenant chunks because ACL filters are missing at query time.
- Answers cite trusted documents while relying on unrelated or poisoned chunks.
Lab Ideas
- Build a two-tenant RAG fixture with canary documents.
- Create a harmless poisoned policy document and validate instruction/data separation.
- Write a citation integrity test that compares answer claims to source chunks.
Related Offensive AI Guides
AI App Pentest Methodology
End-to-end testing flow for AI-enabled applications.
AI Attack & Defense
Prompt injection, jailbreaks, and model-facing controls.
Agent Permission Model
Tool, memory, approval, and sandbox boundaries.
AI Supply Chain
Model, dataset, registry, and artifact supply-chain risks.