Agent Permission Model
AI agents turn model output into actions. That makes permissions, approvals, memory, and sandboxes the real security boundary. A safe agent cannot rely on the model simply choosing to behave.
Permission beats prompt safety
V2 Attack Flow Diagram
Agent Permission Abuse Path
Test whether untrusted context can become an unauthorized action despite tool policy.
Untrusted Context
Web page, ticket, repo instruction, retrieved doc, or chat message.
Plan Mutation
Goal hijack, memory poisoning, tool selection, or unsafe argument proposal.
Policy Boundary
Schema validation, allowlist, approval preview, sandbox, and credential scope.
Audit Result
Allowed, blocked, warned, escalated, or silently attempted action.
Permission Layers
Identity
Which user, tenant, role, and service account is the agent acting as?
Tools
What tools exist, what arguments do they accept, and what resources can they touch?
Approvals
Which actions require preview, confirmation, dual control, or manual handoff?
Sandbox
What filesystem, network, process, browser, and package access is technically blocked?
Audit
Can humans reconstruct prompts, decisions, tool calls, outputs, and denied actions?
Agent Abuse Cases
Confused deputy
Untrusted content convinces the agent to use its legitimate permissions for an attacker-controlled goal. Test with retrieved docs, web pages, tickets, comments, and repo instructions.
Tool argument injection
The model passes unsafe arguments to shell, browser, HTTP, database, filesystem, or ticketing tools. Validate strict schemas, allowlists, and server-side policy.
Memory poisoning
A temporary prompt or document persists instructions into notes, memory, plans, or future workflow context. Test reset, review, expiration, and provenance controls.
Computer-use deception
Browser/computer-use agents can be misled by fake UI, hidden page instructions, terminal control sequences, and destructive confirmations. Require bounded sessions and human review.
Hardening Checklist
- Default deny: expose no tools until the workflow and role require them.
- Scoped credentials: use per-tool, per-tenant, least-privilege tokens with rotation.
- Typed schemas: reject free-form command strings and validate arguments server side.
- Action previews: show the exact target, diff, recipient, or command before high-impact actions.
- Sandboxing: block secret paths, uncontrolled network, package installs, and privileged processes.
- Traceability: log prompt version, model version, retrieved content, tool call, approval, and result.
Testing standard
Agents & MCP
Operator Playbook
Prove whether an AI agent can exceed intended authority through tool arguments, approval gaps, memory, browser/computer-use deception, or confused-deputy behavior.
Offensive Focus
- Test the permission model at the server/tool boundary, not only in the prompt.
- Use mock destructive tools to measure attempted impact without causing harm.
- Capture plan, memory, approval, tool arguments, result, and audit trail.
Evidence To Capture
- Written scope and allowed test classes
- Timestamped prompts, retrieved context, tool calls, and response artifacts
- Request IDs, model/provider/version, policy decisions, and tenant or user role
- Screenshots or exported logs that reproduce the finding without exposing client secrets
Offensive Test Cases
Approval bypass resistance
- Objective
- Verify whether an agent attempts a write/delete/send action without explicit approval.
- Authorized setup
- Use mock tools that return safe results and record attempted operations.
- Evidence
- Plan, approval prompt, tool arguments, denial/allow decision, and audit log.
Memory poisoning reset test
- Objective
- Check whether untrusted context persists into future tasks or user sessions.
- Authorized setup
- Use a lab agent with inspectable memory and harmless marker content.
- Evidence
- Injected marker, memory record, later task behavior, reset result, and provenance metadata.
Common Findings
- Tool enforcement lives in the prompt rather than server-side policy.
- Agents can pass broad shell, browser, HTTP, or filesystem arguments.
- Memory has no provenance, expiration, or user-visible review path.
Lab Ideas
- Build a mock ticketing tool and test send/edit/delete approval gates.
- Create a browser-agent deception page with harmless fake UI.
- Test role-based tool policies with two lab users.