T14: Human Attacks on Multi-Agent Systems

Definition

Human Attacks on Multi-Agent Systems occur when adversaries exploit inter-agent delegation, trust relationships, and task dependencies to bypass security controls, escalate privileges, or disrupt workflows. The attacker treats the agent topology as the attack surface: by injecting deceptive tasks, rerouting priorities, or overwhelming agents with excessive assignments, they manipulate AI-driven decision-making in ways that are difficult to trace and mitigate because no single agent holds the full picture.

What it looks like in practice

OWASP v1.1 names four scenarios:

Coordinated Privilege Escalation via Multi-Agent Impersonation. A security monitoring platform uses three agents: an identity-verification agent, an access-control agent, and an audit-logging agent. An attacker compromises the identity-verification agent by injecting a modified system prompt that instructs it to return a successful authentication result for any request carrying a specific attacker-controlled session token. The access-control agent, which trusts the upstream verification result, grants administrative access to the attacker’s session. Because the audit-logging agent records only the access-control decision (granted), not the verification agent’s internal state, the forged authentication is invisible in the audit trail.

Agent Delegation Loop for Privilege Escalation. An HR multi-agent system lets a request-triage agent escalate a task to a policy-interpretation agent when the request is ambiguous, and lets the policy-interpretation agent send a request back to the triage agent if additional context is needed. The delegation loop has no depth counter. An attacker crafts a request that is deliberately ambiguous in a way that causes each agent to escalate to the other, adding a small additional permission claim with each bounce: “the upstream agent already confirmed the employee’s clearance level”. After four bounces, the policy-interpretation agent grants a privilege it would never grant on an initial request, because each individual step appeared to carry prior validation.
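
The missing depth counter in this scenario can be sketched as a guard applied at every hand-off. This is a minimal illustration, not the OWASP reference design: the `Task` record, the `delegate` function, the `DelegationRejected` exception, and the ceiling of three hops are all hypothetical names and values chosen for the example. The guard refuses a hand-off once the hop ceiling is reached, and refuses any permission claim that was not authorised on the originating request.

```python
from dataclasses import dataclass

MAX_HOPS = 3  # illustrative ceiling (assumption); tune per workflow


class DelegationRejected(Exception):
    """Raised when a hand-off would exceed the hop ceiling or widen scope."""


@dataclass(frozen=True)
class Task:
    request_id: str
    granted_scopes: frozenset  # scopes authorised on the originating request
    hops: tuple = ()           # ordered names of agents that received the task


def delegate(task: Task, to_agent: str, claimed_scopes: set) -> Task:
    """Return the task as handed to `to_agent`, or raise DelegationRejected."""
    if len(task.hops) >= MAX_HOPS:
        raise DelegationRejected(
            f"{task.request_id}: hop ceiling {MAX_HOPS} reached"
        )
    # Any claim not present on the original request is scope expansion.
    extra = set(claimed_scopes) - task.granted_scopes
    if extra:
        raise DelegationRejected(
            f"{task.request_id}: unauthorised scopes {sorted(extra)}"
        )
    return Task(task.request_id, task.granted_scopes, task.hops + (to_agent,))
```

Under these assumptions, the "upstream agent already confirmed clearance" claim fails on its first appearance because it is checked against the originating request rather than against the previous hop, and a request that keeps bouncing is stopped at the ceiling regardless of what it claims.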

Denial-of-Service via Agent Task Saturation. A threat-detection platform assigns incoming security alerts to specialist analysis agents. An attacker generates a flood of syntactically valid but low-severity alert events at 10,000 per minute, each structured to match the schema of high-priority alerts but with benign content. The specialist agents, unable to distinguish priority before analysis, spend their entire capacity processing the flood. Genuine high-severity alerts enter the queue but are never processed during the attack window. The attacker uses the coverage gap to conduct a secondary intrusion that the paralysed detection agents never flag.

Cross-Agent Approval Forgery. A financial onboarding system uses separate agents for document verification, liveness detection, and sanctions screening. Each agent returns a structured verdict; a final approval agent combines the three verdicts. An attacker submits a stolen identity document that passes document verification and sanctions screening but fails liveness detection. The attacker also intercepts the inter-agent message from the liveness agent and replaces “FAIL” with “PASS” before the approval agent receives it. Each individual agent performed its check correctly; only the message in transit was altered. The approval agent, seeing three “PASS” verdicts, approves the account.

Why it’s dangerous

Multi-agent systems redistribute trust. An action that would require multiple approvals from a single agent can be decomposed into individually plausible requests across several agents, each of which only sees a slice of the workflow. Delegation loops, fragmented audit trails, and uneven authentication policies between agents turn the system itself into the attacker’s tool.

Where it manifests

Inspect inter-agent delegation policies and whether task assignments between agents are authenticated end-to-end. Check whether each agent reasons about its own authority or simply inherits it implicitly from upstream agents. Ask whether saturation of one agent can cause cascading failures elsewhere, and whether cross-agent approval chains can compose into privileges that no individual agent would grant on its own.

Detection signals

Multi-agent privilege abuse leaves traces in delegation depth logs, inter-agent message flows, and queue metrics.

Delegation chain depth exceeding a defined ceiling: instrument the orchestrator to log the hop count for every task delegation; alert when any single task accumulates more than N inter-agent hops (e.g., N = 3) without completing. A hop count that keeps climbing on one task is the signature of delegation-loop privilege escalation.
Permission scope expansion across successive delegation hops: compare the permission claims asserted by each agent in the chain against the claims asserted by the originating request; any incremental permission added mid-chain without a matching authorisation event in the policy engine is a signal.
Alert queue depth growing faster than clearance rate: track the ratio of alerts entering a specialist agent’s queue to alerts cleared per unit time; a sustained ratio above 4:1 over five minutes indicates saturation pressure consistent with a flooding attack.
Inter-agent verdict field modified after origination: hash the structured verdict payload at the sending agent and re-verify the hash at the receiving agent; a mismatch on a verdict field (e.g., liveness, sanctions result) with no corresponding originator re-transmission is evidence of in-transit forgery.
Identical session token used for identity verification across different request contexts: if the identity-verification agent receives the same session token attached to requests for different claimed identities or access levels within a short window, alert. Legitimate tokens are scoped to a single session context.
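
The queue-saturation signal above can be approximated with a trailing-window counter. The sketch below is illustrative only: `SaturationMonitor` and its methods are hypothetical names, timestamps are assumed to arrive in non-decreasing order, and "sustained over five minutes" is simplified to the aggregate arrival-to-clearance ratio over the last 300 seconds. When nothing has been cleared in the window, the denominator is treated as one so the ratio stays defined.

```python
from collections import deque

RATIO_THRESHOLD = 4.0  # arrivals per cleared alert, per the 4:1 signal
WINDOW_SECONDS = 300   # five minutes


class SaturationMonitor:
    """Tracks alerts entering and leaving one specialist agent's queue."""

    def __init__(self):
        self._events = deque()  # (timestamp, kind), kind is "in" or "out"

    def record(self, timestamp: float, kind: str) -> None:
        if kind not in ("in", "out"):
            raise ValueError(f"unknown event kind: {kind!r}")
        self._events.append((timestamp, kind))

    def saturated(self, now: float) -> bool:
        # Discard events that have fallen out of the trailing window.
        while self._events and self._events[0][0] <= now - WINDOW_SECONDS:
            self._events.popleft()
        arrived = sum(1 for _, kind in self._events if kind == "in")
        cleared = sum(1 for _, kind in self._events if kind == "out")
        if arrived == 0:
            return False
        # Treat zero clearances as one to avoid dividing by zero.
        return arrived > RATIO_THRESHOLD * max(cleared, 1)
```

A production monitor would more likely read these counts from existing queue metrics than keep its own event log; the point of the sketch is only the ratio test.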

OWASP Top 10 for Agentic Applications 2026

The Agentic Top 10 (ASI01 through ASI10) is a separate practitioner-facing publication that maps onto the master Threats & Mitigations threat numbering. T14 is covered by the following Top 10 entries:

ASI10 Rogue Agents (primary)

A rogue agent is one whose behavioural objective has drifted from its authorised purpose, yet its identity still checks out, its actions remain inside its permissions, and its logs look clean. Divergence may originate from prompt injection, supply-chain tampering, or goal hijack; ASI10 names what happens after divergence begins: sustained, covert operation toward an attacker's goal with no single action that trips an alarm.

OWASP LLM Top 10: LLM02:2025, LLM09:2025

Source: OWASP Top 10 for Agentic Applications 2026 (Dec 2025) · the Top 10 is a compass into the master Threats & Mitigations taxonomy, not a replacement for it.

Design principles at stake

When T14 is present, these security design principles are the ones being violated or tested. Each links to the full principle; the mitigations below are how you restore them.

Defence-in-Depth: The attack surface here is the topology itself: delegation loops, fragmented audit trails, and inconsistent authentication policies between agents mean that no single agent holds enough information to recognise an escalation in progress. Depth means placing independent controls at each trust boundary: end-to-end authenticated task assignments so that impersonation fails at the protocol layer, rate-limiting on incoming task queues so saturation attacks cannot crowd out legitimate work, and cross-agent approval checks that re-verify rather than inherit authority from upstream agents. Each layer fails independently, so composing individually plausible requests across several agents cannot by itself produce privileges that any one agent would refuse.
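
Of the controls listed here, rate-limiting on incoming task queues is the simplest to illustrate. The sketch below uses a token bucket, which is one common way to rate-limit and is not prescribed by the source; the `TokenBucket` class and its parameters are hypothetical. The clock is passed in explicitly so the behaviour is deterministic.

```python
class TokenBucket:
    """Admits at most `capacity` tasks in a burst, refilled at `rate_per_sec`."""

    def __init__(self, rate_per_sec: float, capacity: float, now: float = 0.0):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity  # start full so a quiet source is not penalised
        self.last = now

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, never beyond capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = max(self.last, now)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

One bucket per submitting source, rather than one per queue, is the arrangement that stops a single flooding source from consuming capacity that other sources need.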

Recommended mitigations

Auto-generated from the mitigation catalog: every mitigation whose coverage map includes T14, sorted by maturity tier (Tier 1 production-canonical first, then Tier 2, then Tier 3 research-stage).

Tier 2 Message signing (Inter-agent message signing: end-to-end integrity for A2A and MCP)

An inter-agent message travels through channels and intermediate agents the receiver did not originate. If nothing binds the message cryptographically to its source, any intermediate hop can substitute or inject content that the receiving agent will treat as authoritative. Message signing closes that gap: the source agent signs each message payload with its private key, and the receiver verifies the signature against a distributed trust bundle before the content reaches the reasoning layer.
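
The sign-then-verify flow can be sketched with the Python standard library. Note the substitution: the text describes signing with a private key, whereas this example uses HMAC-SHA256 with a per-agent shared secret so that it runs without third-party packages. The `TRUST_BUNDLE` dictionary, the `sign` and `verify` functions, and the message layout are all assumptions made for illustration.

```python
import hashlib
import hmac
import json

# Hypothetical per-agent secrets standing in for the distributed trust bundle.
TRUST_BUNDLE = {"liveness-agent": b"demo-only-secret"}


def _tag(sender: str, payload: dict, key: bytes) -> str:
    # Canonical JSON so sender and receiver authenticate identical bytes.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, sender.encode("utf-8") + b"\x00" + body, hashlib.sha256).hexdigest()


def sign(sender: str, payload: dict, key: bytes) -> dict:
    """Wrap a payload with its sender name and authentication tag."""
    return {"sender": sender, "payload": payload, "sig": _tag(sender, payload, key)}


def verify(message: dict) -> dict:
    """Return the payload if the tag checks out; raise ValueError otherwise."""
    sender = message.get("sender")
    key = TRUST_BUNDLE.get(sender)
    if key is None:
        raise ValueError("unknown sender")
    expected = _tag(sender, message["payload"], key)
    # compare_digest avoids leaking the mismatch position through timing.
    if not hmac.compare_digest(expected, str(message.get("sig", ""))):
        raise ValueError("signature mismatch")
    return message["payload"]
```

With a check like this in front of the approval agent, a verdict changed from "FAIL" to "PASS" in transit no longer verifies. The sketch does not address replay of an old, validly signed message; that would need a nonce or timestamp inside the signed payload.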

Why it helps: Coordinated privilege escalation through multi-agent impersonation requires fabricating messages that appear to come from a trusted principal. When every inter-agent message carries a signature bound to the originating agent's key, fabrication requires key compromise, which is detectable through trust-bundle and certificate-rotation monitoring.

Tier 2 Peer consensus (Multi-agent consensus: N-of-M independent agreement before high-impact actions)

A single agent's judgment on a high-impact action can be wrong, manipulated, or compromised. Requiring N of M independent peer agents to agree before the action executes means an attacker or a systematic error must affect the quorum majority, not just one agent, before harm results.
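
The N-of-M rule reduces to a small counting check. The sketch below is a minimal illustration with hypothetical names (`quorum_approves`, `eligible`, `n_required`); it assumes votes have already been authenticated and that the eligible peers are genuinely independent, neither of which the code itself can establish.

```python
def quorum_approves(votes: dict, eligible: set, n_required: int) -> bool:
    """Return True when at least `n_required` eligible peers voted to approve.

    `votes` maps agent name to a boolean verdict. Votes from agents outside
    `eligible` are ignored, and each agent counts once because dict keys
    are unique.
    """
    if not 1 <= n_required <= len(eligible):
        raise ValueError("n_required must be between 1 and the number of eligible peers")
    approvals = sum(
        1 for agent, verdict in votes.items()
        if agent in eligible and verdict is True
    )
    return approvals >= n_required
```

For example, with three eligible peers and `n_required=2`, one compromised peer voting to approve is not enough, and an approval from an agent outside the eligible set does not count at all.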

Why it helps: Cross-Agent Privilege Escalation relies on one agent convincing another to exercise authority beyond what either was individually granted, compounding permissions across the call chain. Independent peer agreement breaks that chain: forged or inflated authority claims must be accepted by the quorum majority, not just one credulous peer, before the action proceeds.

Tier 2 Trust score (Per-agent trust scoring: behavioural reputation for inter-agent message acceptance)

In a multi-agent system, each agent routes decisions based on what its peers report. If a peer's behaviour becomes unreliable or adversarial, agents that keep treating it with full authority will propagate whatever errors or manipulations that peer introduces. Per-agent trust scoring addresses this by maintaining a continuously updated reputation score for every peer, derived from observed behaviour, and using that score to determine how much authority each incoming message carries.
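
One simple way to keep a continuously updated score is an exponentially weighted moving average over consistency observations. This is an assumed scoring rule chosen for illustration, not one the source specifies; the `TrustLedger` class, its defaults, and the acceptance threshold are hypothetical.

```python
class TrustLedger:
    """Keeps a per-peer reputation score in the range [0, 1]."""

    def __init__(self, alpha: float = 0.2, threshold: float = 0.6, initial: float = 0.5):
        self.alpha = alpha          # weight given to the newest observation
        self.threshold = threshold  # minimum score for full message authority
        self.initial = initial      # score assumed for a peer not yet observed
        self._scores = {}

    def score(self, peer: str) -> float:
        return self._scores.get(peer, self.initial)

    def observe(self, peer: str, consistent: bool) -> float:
        """Fold one observation into the peer's score and return the new score."""
        sample = 1.0 if consistent else 0.0
        updated = (1.0 - self.alpha) * self.score(peer) + self.alpha * sample
        self._scores[peer] = updated
        return updated

    def accepts(self, peer: str) -> bool:
        """True when the peer's messages should carry full authority."""
        return self.score(peer) >= self.threshold
```

A larger `alpha` makes the score react faster to a sudden run of inconsistent output, at the cost of penalising an honest peer more heavily for a single anomaly.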

Why it helps: Cross-agent approval forgery and identity impersonation generate output that is inconsistent with what the legitimate peer would produce, creating a measurable cross-peer consistency gap. The impersonating agent's score declines on that gap, limiting its delegation authority until attestation is re-verified.

Multi-agent variants: OWASP MAS Guide

The OWASP MAS Threat Modelling Guide v1.0 catalogues five named multi-agent variants of T14, anchored to specific MAESTRO layers. Each is a concrete attack pattern that emerges when this threat compounds across agents.

L4 Distributed Denial of Service (extends T4, T14)

Coordinated DDoS targeting groups of agents, triggering cascading failure.

L4 Compromised Orchestration for Multi-Agents (extends T14)

Attacking the orchestration layer to gain access across many agents at once.

L6 Indirect Privilege Escalation (extends T3, T14)

Agent-specific permissions exploited to execute high-privilege actions on a malicious user's behalf.

CL Privilege Compromise (cross-agent) (extends T3, T14)

Compromised admin agent exploits legitimate access to create backdoors or modify security settings.

CL Excessive Agency / Permission Bypass (extends T3, T14)

Chained authorization across MAS lets a malicious user execute beyond their granted permissions.

Source: OWASP MAS Threat Modelling Guide v1.0, §2 Overview of MAESTRO Framework — Extended Threat Scenarios + Cross-Layer table.

Red-team pivot: MITRE ATLAS techniques

MITRE ATLAS catalogues adversary techniques against AI systems. Where this OWASP threat has an attacker-perspective counterpart, the ATLAS technique is shown below. That is what a red team would actually be doing on the wire. Use this for detection-signal anchoring, threat-hunting hypotheses, and IR runbooks. Source: mitre-atlas/atlas-data v5.6.0.

AML.T0073 Impersonation

Adversary poses as a trusted entity (user, service, peer agent) to gain access or influence decisions.

AML.T0053 AI Agent Tool Invocation

Adversary causes an agent to invoke a legitimate tool with attacker-controlled parameters, turning a sanctioned capability into an attack vector.

Agentic angle: Maps directly to OWASP T2 Tool Misuse: the agent's tools are operating within their declared scope, but the chosen invocation is unsafe.

AML.T0086 Exfiltration via AI Agent Tool Invocation

Adversary exfiltrates data by chaining the agent's legitimate tools (e.g. read-only DB query plus an outbound email tool), neither of which is alarming on its own.

Agentic angle: Each step looks routine in audit logs; the *combination* is the attack.

Sources

OWASP-Agentic-AI · 1.1 (Dec 2025) · Agentic Threats Taxonomy Navigator §Step 6: Multi-Agent System Threats
MAESTRO · 1.0 (Apr 2025) · Layer 4 Deployment Infrastructure; Layer 7 Agent Ecosystem