Agent Security

Advanced

T1059 | Command and Scripting Interpreter T1552 | Unsecured Credentials

Agent Permission Model

AI agents turn model output into actions. That makes permissions, approvals, memory, and sandboxes the real security boundary. A safe agent cannot rely on the model simply choosing to behave.

Permission beats prompt safety

Treat model instructions as advisory and tool policy as enforceable. The model may be tricked, but a well-designed tool boundary should still prevent secret reads, destructive writes, and out-of-scope network access.

V2 Attack Flow Diagram

Agent Permission Abuse Path

Test whether untrusted context can become an unauthorized action despite tool policy.

Input01

Untrusted Context

Web page, ticket, repo instruction, retrieved doc, or chat message.

Plan02

Plan Mutation

Goal hijack, memory poisoning, tool selection, or unsafe argument proposal.

Control03

Policy Boundary

Schema validation, allowlist, approval preview, sandbox, and credential scope.

Trace04

Audit Result

Allowed, blocked, warned, escalated, or silently attempted action.

Agent findings are strongest when the attempted tool action and server-side enforcement decision are both visible.

Permission Layers

Identity

Which user, tenant, role, and service account is the agent acting as?

Tools

What tools exist, what arguments do they accept, and what resources can they touch?

Approvals

Which actions require preview, confirmation, dual control, or manual handoff?

Sandbox

What filesystem, network, process, browser, and package access is technically blocked?

Audit

Can humans reconstruct prompts, decisions, tool calls, outputs, and denied actions?

Agent Abuse Cases

Confused deputy

Untrusted content convinces the agent to use its legitimate permissions for an attacker-controlled goal. Test with retrieved docs, web pages, tickets, comments, and repo instructions.

Tool argument injection

The model passes unsafe arguments to shell, browser, HTTP, database, filesystem, or ticketing tools. Validate strict schemas, allowlists, and server-side policy.

Memory poisoning

A temporary prompt or document persists instructions into notes, memory, plans, or future workflow context. Test reset, review, expiration, and provenance controls.

Computer-use deception

Browser/computer-use agents can be misled by fake UI, hidden page instructions, terminal control sequences, and destructive confirmations. Require bounded sessions and human review.

Hardening Checklist

Default deny: expose no tools until the workflow and role require them.
Scoped credentials: use per-tool, per-tenant, least-privilege tokens with rotation.
Typed schemas: reject free-form command strings and validate arguments server side.
Action previews: show the exact target, diff, recipient, or command before high-impact actions.
Sandboxing: block secret paths, uncontrolled network, package installs, and privileged processes.
Traceability: log prompt version, model version, retrieved content, tool call, approval, and result.

Testing standard

A permission finding is strongest when it proves that untrusted content caused the agent to attempt an action and that the technical boundary either blocked it or failed to block it.

Agents & MCP

Operator Playbook

Prove whether an AI agent can exceed intended authority through tool arguments, approval gaps, memory, browser/computer-use deception, or confused-deputy behavior.

Authorized use only

Offensive Focus

Test the permission model at the server/tool boundary, not only in the prompt.
Use mock destructive tools to measure attempted impact without causing harm.
Capture plan, memory, approval, tool arguments, result, and audit trail.

Evidence To Capture

Written scope and allowed test classes
Timestamped prompts, retrieved context, tool calls, and response artifacts
Request IDs, model/provider/version, policy decisions, and tenant or user role
Screenshots or exported logs that reproduce the finding without exposing client secrets

Offensive Test Cases

Approval bypass resistance

Objective: Verify whether an agent attempts a write/delete/send action without explicit approval.
Authorized setup: Use mock tools that return safe results and record attempted operations.
Evidence: Plan, approval prompt, tool arguments, denial/allow decision, and audit log.

Memory poisoning reset test

Objective: Check whether untrusted context persists into future tasks or user sessions.
Authorized setup: Use a lab agent with inspectable memory and harmless marker content.
Evidence: Injected marker, memory record, later task behavior, reset result, and provenance metadata.

Common Findings

Tool enforcement lives in the prompt rather than server-side policy.
Agents can pass broad shell, browser, HTTP, or filesystem arguments.
Memory has no provenance, expiration, or user-visible review path.

Lab Ideas

Build a mock ticketing tool and test send/edit/delete approval gates.
Create a browser-agent deception page with harmless fake UI.
Test role-based tool policies with two lab users.

Related Offensive AI Guides

MCP Security

Model Context Protocol threats and defenses.

Vibe-Coding Security

IDE agent rule injection and auto-exec risk.

AI App Pentest Methodology

End-to-end AI application testing workflow.

Secure Architecture

Design patterns for safer system boundaries.