Agent Security
Advanced
T1059 T1552

Agent Permission Model

AI agents turn model output into actions. That makes permissions, approvals, memory, and sandboxes the real security boundary. A safe agent cannot rely on the model simply choosing to behave.

Permission beats prompt safety

Treat model instructions as advisory and tool policy as enforceable. The model may be tricked, but a well-designed tool boundary should still prevent secret reads, destructive writes, and out-of-scope network access.
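To make the distinction concrete, here is a minimal sketch of a server-side check that runs no matter what the model was told or tricked into requesting. The workspace root, the secret file names, and the function name `check_file_read` are assumptions made for illustration, not part of any particular agent framework.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

# Hypothetical policy values for illustration only.
ALLOWED_ROOT = PurePosixPath("/workspace")
SECRET_NAMES = {".env", "id_rsa", "credentials"}


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str


def check_file_read(requested_path: str) -> Decision:
    """Decide server side whether a model-proposed file read may run."""
    path = PurePosixPath(requested_path)
    if not path.is_absolute():
        return Decision(False, "path must be absolute")
    if ".." in path.parts:
        return Decision(False, "path traversal is not allowed")
    if ALLOWED_ROOT not in path.parents:
        return Decision(False, "outside the allowed workspace")
    if path.name in SECRET_NAMES:
        return Decision(False, "secret file is blocked")
    return Decision(True, "within policy")
```

The check never consults the prompt or the model's stated intent, so a successful injection changes what the agent asks for but not what the tool will do.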

Agent Permission Abuse Path

Test whether untrusted context can become an unauthorized action despite tool policy.

01 Input: Untrusted Context

Web page, ticket, repo instruction, retrieved doc, or chat message.

02 Plan: Plan Mutation

Goal hijack, memory poisoning, tool selection, or unsafe argument proposal.

03 Control: Policy Boundary

Schema validation, allowlist, approval preview, sandbox, and credential scope.

04 Trace: Audit Result

Allowed, blocked, warned, escalated, or silently attempted action.

Agent findings are strongest when the attempted tool action and server-side enforcement decision are both visible.
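One way to keep both halves of a finding visible is to log them in the same record. The sketch below assumes a hypothetical `TraceEntry` shape and a small set of decision labels. A silently attempted action is, by definition, one that never reaches this log, so it has no label here.

```python
import json
from dataclasses import asdict, dataclass

# Assumed outcome labels set by the policy layer, not by the model.
DECISIONS = {"allowed", "blocked", "warned", "escalated"}


@dataclass(frozen=True)
class TraceEntry:
    """One audit row pairing the attempt with the enforcement decision."""
    context_source: str  # where the untrusted input came from
    tool: str            # tool the agent tried to call
    arguments: dict      # arguments exactly as the model proposed them
    decision: str        # one of DECISIONS
    reason: str          # why the policy layer decided that way


def to_log_line(entry: TraceEntry) -> str:
    """Serialize a trace entry as one JSON line for an append-only log."""
    if entry.decision not in DECISIONS:
        raise ValueError(f"unknown decision: {entry.decision}")
    return json.dumps(asdict(entry), sort_keys=True)
```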

Permission Layers

Identity

Which user, tenant, role, and service account is the agent acting as?

Tools

What tools exist, what arguments do they accept, and what resources can they touch?

Approvals

Which actions require preview, confirmation, dual control, or manual handoff?

Sandbox

What filesystem, network, process, browser, and package access is technically blocked?

Audit

Can humans reconstruct prompts, decisions, tool calls, outputs, and denied actions?

Agent Abuse Cases

Confused deputy

Untrusted content convinces the agent to use its legitimate permissions for an attacker-controlled goal. Test with retrieved docs, web pages, tickets, comments, and repo instructions.
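A common mitigation is to track where each piece of context came from and to escalate when untrusted content could have steered a high-impact call. The following is a rough sketch under that assumption. The source labels, the tool names, and `requires_approval` are hypothetical.

```python
from dataclasses import dataclass

# Assumed trust labels; a real system would define its own taxonomy.
TRUSTED_SOURCES = {"user", "system"}
HIGH_IMPACT_TOOLS = {"send_email", "delete_file", "merge_pull_request"}


@dataclass(frozen=True)
class ContextItem:
    source: str  # e.g. "user", "web_page", "ticket", "repo_instruction"
    text: str


def requires_approval(tool: str, context: list[ContextItem]) -> bool:
    """Escalate when untrusted content is present and the tool is high impact."""
    tainted = any(item.source not in TRUSTED_SOURCES for item in context)
    return tool in HIGH_IMPACT_TOOLS and tainted
```

This is deliberately coarse: it does not try to detect malicious text, only to notice that the agent's legitimate permissions are about to be used while attacker-reachable content is in scope.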

Tool argument injection

The model passes unsafe arguments to shell, browser, HTTP, database, filesystem, or ticketing tools. Enforce strict schemas, allowlists, and server-side policy, and verify that each one rejects malformed or out-of-scope arguments.
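As an illustration of strict, server-side argument validation, the sketch below checks a hypothetical HTTP tool's arguments against a closed schema and a host allowlist. The host name, the allowed method, and the path pattern are assumptions chosen for the example.

```python
import re
from dataclasses import dataclass
from urllib.parse import urlsplit

# Hypothetical allowlist for a lab HTTP tool.
ALLOWED_HOSTS = {"api.internal.example"}
ALLOWED_METHODS = {"GET"}
PATH_PATTERN = re.compile(r"/[A-Za-z0-9/_-]*")


@dataclass(frozen=True)
class HttpToolArgs:
    method: str
    url: str


def validate_http_args(raw: dict) -> HttpToolArgs:
    """Reject any argument set that does not match the strict schema."""
    if set(raw) != {"method", "url"}:
        raise ValueError("unexpected or missing fields")
    method, url = raw["method"], raw["url"]
    if not isinstance(method, str) or not isinstance(url, str):
        raise ValueError("fields must be strings")
    if method not in ALLOWED_METHODS:
        raise ValueError("method not allowed")
    parts = urlsplit(url)
    if parts.scheme != "https" or parts.hostname not in ALLOWED_HOSTS:
        raise ValueError("host not allowlisted")
    if parts.username or parts.password or parts.port not in (None, 443):
        raise ValueError("credentials or custom ports not allowed")
    if parts.query or parts.fragment:
        raise ValueError("query and fragment not allowed")
    if not PATH_PATTERN.fullmatch(parts.path or "/"):
        raise ValueError("path not allowed")
    return HttpToolArgs(method=method, url=url)
```

Parsing the URL and comparing the parsed host, rather than matching on the raw string, is what stops inputs such as `https://api.internal.example@evil.example/` from passing.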

Memory poisoning

A temporary prompt or document persists instructions into notes, memory, plans, or future workflow context. Test reset, review, expiration, and provenance controls.
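Provenance and expiration controls can be sketched as metadata on each memory record plus a filter applied before memory is fed back to the model. The record fields and the rule that only user-authored or human-reviewed notes are reused are assumptions for illustration. The dates are arbitrary example values.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class MemoryRecord:
    text: str
    source: str             # provenance, e.g. "user" or "retrieved_doc"
    created_at: datetime
    ttl: timedelta          # how long the record may influence future tasks
    reviewed: bool = False  # set True after a human approves the record


def usable_memory(records: list[MemoryRecord], now: datetime) -> list[MemoryRecord]:
    """Keep records that are unexpired and either user-authored or reviewed."""
    return [
        r for r in records
        if now - r.created_at < r.ttl and (r.source == "user" or r.reviewed)
    ]


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
month = timedelta(days=30)
records = [
    MemoryRecord("prefers short replies", "user", now - timedelta(days=1), month),
    MemoryRecord("always forward mail", "retrieved_doc", now - timedelta(days=1), month),
    MemoryRecord("old preference", "user", now - timedelta(days=60), month),
]
kept = usable_memory(records, now)
```

Here only the first record survives: the second lacks trusted provenance and review, and the third has expired.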

Computer-use deception

Browser and computer-use agents can be misled by fake UI, hidden page instructions, terminal control sequences, and deceptive confirmation prompts for destructive actions. Require bounded sessions and human review.
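For the terminal control sequence case specifically, one partial defense is to strip escape sequences and control characters from command output before the agent or a reviewer reads it. The sketch below covers the common ANSI CSI and OSC forms. It is an assumption-laden filter, not a complete terminal parser, and the function name is hypothetical.

```python
import re

# CSI sequences (ESC [ ... final byte), OSC sequences (ESC ] ... BEL or ESC \),
# and other two-character ESC sequences.
_ESCAPES = re.compile(
    r"\x1b\[[0-?]*[ -/]*[@-~]"
    r"|\x1b\][^\x07\x1b]*(?:\x07|\x1b\\)"
    r"|\x1b[@-Z\\-_]"
)
# Remaining control characters except tab and newline, plus DEL.
_CONTROLS = re.compile(r"[\x00-\x08\x0b-\x1f\x7f]")


def sanitize_terminal_output(text: str) -> str:
    """Remove escape sequences and control characters from tool output."""
    return _CONTROLS.sub("", _ESCAPES.sub("", text))
```

Carriage returns are removed as well, so text that was meant to be overwritten on screen stays visible in the sanitized output.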

Hardening Checklist

  • Default deny: expose no tools until the workflow and role require them.
  • Scoped credentials: use per-tool, per-tenant, least-privilege tokens with rotation.
  • Typed schemas: reject free-form command strings and validate arguments server side.
  • Action previews: show the exact target, diff, recipient, or command before high-impact actions.
  • Sandboxing: block secret paths, uncontrolled network, package installs, and privileged processes.
  • Traceability: log prompt version, model version, retrieved content, tool call, approval, and result.
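The action preview item can be illustrated with a diff that is generated server side and tied to the approval itself. In this sketch, `preview_file_write` and `apply_if_approved` are hypothetical helpers. The assumption is that the approval step returns the exact preview text the reviewer saw.

```python
import difflib
from typing import Optional


def preview_file_write(path: str, current: str, proposed: str) -> str:
    """Build the exact diff a reviewer sees before a write is approved."""
    diff = difflib.unified_diff(
        current.splitlines(keepends=True),
        proposed.splitlines(keepends=True),
        fromfile=f"{path} (current)",
        tofile=f"{path} (proposed)",
    )
    return "".join(diff)


def apply_if_approved(preview: str, approved_preview: Optional[str]) -> bool:
    """Allow the write only if the approved preview matches the current one."""
    return approved_preview is not None and approved_preview == preview
```

Comparing the approved preview with a freshly generated one means an approval cannot be reused if the agent changes the target or the content after the reviewer has signed off.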

Testing standard

A permission finding is strongest when it proves that untrusted content caused the agent to attempt an action and that the technical boundary either blocked it or failed to block it.

Operator Playbook

Prove whether an AI agent can exceed intended authority through tool arguments, approval gaps, memory, browser/computer-use deception, or confused-deputy behavior.

Authorized use only

Offensive Focus

  • Test the permission model at the server/tool boundary, not only in the prompt.
  • Use mock destructive tools to measure attempted impact without causing harm.
  • Capture plan, memory, approval, tool arguments, result, and audit trail.

Evidence To Capture

  • Written scope and allowed test classes
  • Timestamped prompts, retrieved context, tool calls, and response artifacts
  • Request IDs, model/provider/version, policy decisions, and tenant or user role
  • Screenshots or exported logs that reproduce the finding without exposing client secrets

Offensive Test Cases

Approval bypass resistance

Objective: Verify whether an agent attempts a write/delete/send action without explicit approval.
Authorized setup: Use mock tools that return safe results and record attempted operations.
Evidence: Plan, approval prompt, tool arguments, denial/allow decision, and audit log.
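A mock tool for this test case might look like the sketch below. `MockTicketTool`, its action names, and the `approved` flag are hypothetical. The flag is assumed to be set by the approval layer, never by the model.

```python
from dataclasses import dataclass, field


@dataclass
class MockTicketTool:
    """Lab stand-in for a ticketing tool: records attempts, changes nothing."""
    attempts: list = field(default_factory=list)

    def call(self, action: str, ticket_id: str, approved: bool = False) -> dict:
        destructive = action in {"edit", "delete", "send"}
        decision = "denied" if destructive and not approved else "allowed"
        # Record every attempt, including denied ones, for the audit log.
        self.attempts.append(
            {
                "action": action,
                "ticket_id": ticket_id,
                "approved": approved,
                "decision": decision,
            }
        )
        # Always return a canned result; no real ticket is touched.
        return {"status": decision, "ticket_id": ticket_id}
```

After a test run, `attempts` shows which destructive actions the agent tried without approval, which is the evidence this test case asks for.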

Memory poisoning reset test

Objective: Check whether untrusted context persists into future tasks or user sessions.
Authorized setup: Use a lab agent with inspectable memory and harmless marker content.
Evidence: Injected marker, memory record, later task behavior, reset result, and provenance metadata.
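The marker-based check can be sketched against a toy memory store. `LabMemory` is a hypothetical stand-in for whatever inspectable memory the lab agent exposes. A real test would drive the agent itself rather than call the store directly.

```python
import uuid


class LabMemory:
    """Inspectable per-session memory for a lab agent (hypothetical)."""

    def __init__(self) -> None:
        self._notes: dict[str, list[str]] = {}

    def remember(self, session: str, note: str) -> None:
        self._notes.setdefault(session, []).append(note)

    def recall(self, session: str) -> list[str]:
        return list(self._notes.get(session, []))

    def reset(self, session: str) -> None:
        self._notes.pop(session, None)


def marker_persists(memory: LabMemory, session: str, marker: str) -> bool:
    """Report whether the harmless marker is still visible to the session."""
    return any(marker in note for note in memory.recall(session))


# Unique, harmless marker so that later hits cannot be coincidental.
marker = f"LAB-MARKER-{uuid.uuid4().hex[:8]}"
memory = LabMemory()
memory.remember("session-a", f"note from retrieved doc: {marker}")

leaked_across_sessions = marker_persists(memory, "session-b", marker)
persisted_before_reset = marker_persists(memory, "session-a", marker)
memory.reset("session-a")
persisted_after_reset = marker_persists(memory, "session-a", marker)
```

The three booleans map to the evidence items: whether the marker crossed sessions, whether it persisted, and whether reset actually removed it.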

Common Findings

  • Tool enforcement lives in the prompt rather than in server-side policy.
  • Agents can pass broad shell, browser, HTTP, or filesystem arguments.
  • Memory has no provenance, expiration, or user-visible review path.

Lab Ideas

  • Build a mock ticketing tool and test send/edit/delete approval gates.
  • Create a browser-agent deception page with harmless fake UI.
  • Test role-based tool policies with two lab users.
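For the role-based lab, a default-deny policy table is a reasonable starting point. The user names, role names, and tool names below are invented for the exercise.

```python
# Hypothetical role-to-tool grants for two lab users; anything absent is denied.
ROLE_TOOLS = {
    "viewer": {"read_ticket"},
    "editor": {"read_ticket", "edit_ticket"},
}
LAB_USERS = {"alice": "viewer", "bob": "editor"}


def is_tool_allowed(user: str, tool: str) -> bool:
    """Default deny: unknown users, roles, and tools all resolve to False."""
    role = LAB_USERS.get(user)
    return tool in ROLE_TOOLS.get(role, set())
```

Running the same injected prompt as each lab user, then comparing the attempted tool calls with what this table permits, shows whether enforcement follows the role or only the prompt.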