AI AppSec
Advanced
AML.T0046 AML.T0051

RAG Security Testing

Retrieval-augmented generation systems fail differently from ordinary chatbots. The model is only one part of the target: documents, connectors, chunking, embedding search, access control, citations, and output rendering all become testable attack surface.

Authorized systems only

Test only systems, repositories, document stores, and connectors explicitly listed in the rules of engagement. RAG testing often touches sensitive internal knowledge, so evidence should minimize copied source content and preserve only the fields needed to prove impact.

V2 Attack Flow Diagram

RAG Abuse Path

Follow the evidence from source content to retrieved context, answer, and reportable impact.

Source01

Seed Fixture

Approved poisoned document, canary record, stale source, or connector item.

Search02

Retrieve Context

Chunking, embeddings, ACL filters, ranking, and citation selection.

Model03

Influence Answer

Instruction/data confusion, citation laundering, or cross-tenant leakage.

Evidence04

Capture Proof

Document ID, chunk ID, score, prompt context, output, and control result.

Strong findings prove which retrieval boundary failed and include enough trace data to reproduce the answer after a fix.

Ingestion

Files, crawlers, tickets, wiki pages, email, and SaaS connectors.

Retrieval

Embeddings, chunk ranking, filters, tenant boundaries, and ACL joins.

Generation

Prompt assembly, citations, tool calls, refusals, and fallback behavior.

Evidence

Trace IDs, retrieved chunks, user role, prompt, output, and blocked controls.

Core Test Areas

Document and chunk poisoning

Place benign test documents in approved locations that contain conflicting instructions, hidden text, metadata instructions, ambiguous citations, and malicious support links. Verify whether the system treats retrieved content as data or instruction.

Tenant and ACL boundaries

Create users with different roles and confirm retrieval cannot cross customer, department, project, or document-level access boundaries. Test deleted documents and stale index entries after permission changes.

Citation integrity

Compare cited sources against the actual answer. Flag fabricated citations, unsupported claims, citation laundering, hidden-source reliance, and answers that cite low-trust material over authoritative sources.

Sensitive data leakage

Test whether secrets, HR records, customer data, private tickets, API responses, or embeddings can be inferred through summaries, autocomplete, broad questions, or multi-turn narrowing.

Evidence Checklist

  • User context: role, tenant, group, and active permissions.
  • Source record: document ID, connector, timestamp, and classification.
  • Retrieval trace: chunk IDs, score, filters, and denied candidates where available.
  • Prompt assembly: system prompt version, retrieved context, and tool schema version.
  • Output proof: answer, citations, rendered view, and any downstream action.
  • Control result: blocked, allowed, warned, logged, escalated, or silently failed.

Report-Ready Finding Patterns

Indirect Prompt Injection

A retrieved document can override assistant behavior or redirect users because untrusted content is included without instruction/data separation.

Cross-Tenant Retrieval

A user can retrieve or infer content outside their authorization boundary through broad questions, stale index entries, or missing ACL filters.

Citation Laundering

The answer cites an approved source while relying on untrusted or unrelated retrieved chunks, making unsupported advice look authoritative.

Best defenses to verify

Strong RAG systems separate instructions from retrieved data, apply ACL filters before retrieval, store retrieval traces, sanitize ingested content, show citations with confidence, and regression-test fixes whenever prompts, chunking, models, or connectors change.

AI App Testing

Operator Playbook

Exploit-test RAG trust boundaries using controlled poisoned documents, cross-tenant fixtures, citation checks, and retrieval evidence.

Authorized use only

Offensive Focus

  • Seed harmless adversarial documents to test whether retrieved text can override instructions or alter answers.
  • Probe ACL, tenant, stale-index, connector, and citation boundaries with canary data.
  • Capture chunk IDs, scores, source documents, and generated answers for every finding.

Evidence To Capture

  • Written scope and allowed test classes
  • Timestamped prompts, retrieved context, tool calls, and response artifacts
  • Request IDs, model/provider/version, policy decisions, and tenant or user role
  • Screenshots or exported logs that reproduce the finding without exposing client secrets

Offensive Test Cases

Indirect prompt injection via retrieved document

Objective
Verify whether a retrieved chunk can change assistant behavior or suppress normal policy.
Authorized setup
Place a harmless marker instruction in an approved test document or staging knowledge base.
Evidence
Document ID, chunk ID, retrieval score, prompt context, answer, and control decision.

Cross-tenant canary retrieval

Objective
Check whether one tenant/user can retrieve or infer another tenant/user canary value.
Authorized setup
Use two lab tenants and canary records approved for retrieval-boundary testing.
Evidence
User roles, query, retrieved chunks, ACL filters, answer, and logs.

Common Findings

  • Retrieved content is treated as instruction rather than untrusted data.
  • Vector searches return stale or cross-tenant chunks because ACL filters are missing at query time.
  • Answers cite trusted documents while relying on unrelated or poisoned chunks.

Lab Ideas

  • Build a two-tenant RAG fixture with canary documents.
  • Create a harmless poisoned policy document and validate instruction/data separation.
  • Write a citation integrity test that compares answer claims to source chunks.