MAESTRO: seven layers

The MAESTRO multi-agent threat-modelling guide (OWASP GenAI Security Project, v1.0 April 2025) decomposes an agentic system into seven architectural layers, from L1 Foundation Models through L7 Agent Ecosystem, plus a Cross-Layer category for emergent multi-agent behaviour. Each layer maps onto specific OWASP threat numbers. Use the stack diagram below to orient yourself, then scroll to each layer for the full prose and threat list, or follow the "full layer reference" link to the per-layer detail page.

Six layers stack vertically; L6 Security & Compliance is drawn as the vertical band on the right because it cuts across every other layer (the MAS-Guide convention). Each cell is clickable and routes to its layer detail page.

L1

Foundation Models

5 threats touch this layer
full layer reference →

The Foundation Model layer is the substrate on which every other layer depends. It encompasses the pretrained weights, alignment tuning, and any runtime inference process that turns an input token sequence into a probability distribution over outputs. In a multi-agent system (MAS), every agent that calls a language model touches this layer, whether the model is self-hosted, accessed through an API, or shared across multiple agents as a common inference endpoint.

What lives here

  • Pretrained model weights and their provenance (origin, training corpus, signing status)
  • Fine-tuned or instruction-tuned variants derived from a base model
  • Model artifacts stored in object storage or a model registry (MLflow, Hugging Face Hub, SageMaker Model Registry)
  • RLHF and RLAIF alignment policies baked into the model
  • Embedding models used for RAG retrieval (separate weights, same layer)
  • Shared inference endpoints where multiple agents call a common model host
  • Model cards and associated metadata that declare training data lineage
  • Quantised or distilled model variants (GGUF, GPTQ, AWQ) that may differ in behaviour from the canonical weights

In a multi-agent deployment, a single foundation model may serve dozens of agents. A compromise at this layer has a blast radius proportional to the number of consumers. This is why the Cloud Security Alliance’s MAESTRO guide (Ken Huang, 2025) places model integrity at the base of the stack before any framework or orchestration concern is addressed.

Concrete example: A customer-support platform runs ten LangChain agents that all call a shared self-hosted Llama 3 endpoint. The operator periodically fine-tunes that model on accumulated conversation logs stored in a shared vector store. If an attacker seeds the vector store with adversarially crafted support tickets before a scheduled fine-tune run, the resulting weight update embeds the attacker’s behaviour into every one of the ten agents simultaneously. This is an L1 compromise with L2 as the entry path.

Threats that target this layer

  • T1 Memory Poisoning: when long-term memory is periodically used to fine-tune or update the model, poisoned memory entries corrupt the weight update, affecting every subsequent inference. The model cannot distinguish benign from adversarial training signal.
  • T7 Misaligned and Deceptive Behaviors: alignment policies baked into the model weights can be bypassed, eroded, or never fully instilled. An agent whose foundation model has misaligned objectives will exhibit those misalignments regardless of what the framework layer adds on top.
  • T17 Supply Chain Compromise: the model artifact itself is part of the AI supply chain. A tampered weight file, a poisoned Hugging Face checkpoint, or an unofficial quantised variant introduces adversary-controlled behaviour at the deepest possible layer. Because the model is downstream of training and upstream of everything else, supply-chain compromise at L1 is nearly impossible to detect by monitoring alone.
  • T5 Cascading Hallucination Attacks: systematic inaccuracies or planted associations in training data surface as confident hallucinations that propagate through every agent using the model.

Mitigations anchored here

  • model registry: mandatory version pinning, artifact signing, canary rollout, and rollback paths for every model artifact. The registry gates which weights reach production and records a verifiable chain of custody. Applies to embedding models as well as generative models.
  • signed AIBOM / agent SBOM: a Software Bill of Materials extended to AI artifacts captures training data provenance, base model identity, and fine-tuning lineage. Without this record, T17 supply-chain attacks are invisible until after harm occurs.
  • behavioural red-teaming: structured adversarial evaluation against the model weights before deployment. Identifies alignment gaps (T7) and planted behaviours (T17) that static analysis of weights cannot surface.
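The version-pinning idea in the model registry bullet can be sketched as a digest check at load time. This is a minimal illustration, not a registry implementation: the pin manifest is a plain dict, the function names are hypothetical, and the file name and contents are made up. A real deployment would read pins from a signed registry record.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file so large weight files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_artifact(path: Path, pinned: dict[str, str]) -> bool:
    """Return True only if the file name is pinned and its digest matches."""
    expected = pinned.get(path.name)
    return expected is not None and sha256_of(path) == expected


# Demo with a stand-in file; "model.gguf" and its bytes are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "model.gguf"
    artifact.write_bytes(b"weights-v1")
    pins = {"model.gguf": hashlib.sha256(b"weights-v1").hexdigest()}
    ok_before = verify_artifact(artifact, pins)   # digest matches the pin
    artifact.write_bytes(b"weights-v1-tampered")
    ok_after = verify_artifact(artifact, pins)    # digest no longer matches
```

A digest check only proves the file is the one that was pinned. It says nothing about whether the pinned weights were trustworthy in the first place, which is where signing and red-teaming come in.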

How L1 relates to its neighbours

L1 has no layer below it in the MAESTRO stack: it is the base. Its immediate neighbour above is L2 Data Operations, which manages the runtime data surfaces (vector stores, prompt corpora, retrieval pipelines) that the model reads during inference. A threat that corrupts training data (L1) is distinct from one that corrupts retrieval data (L2), even though both ultimately affect model output. L1 governs what the model knows; L2 governs what the model reads at runtime.

L1 also has a direct relationship with L6 Security and Compliance (the vertical band): the governance question of which model versions are approved for production, which training data sources are permissible, and which audit records document the model lifecycle all belong to L6 policy applied to L1 artifacts.


The Foundation Model layer is MAESTRO’s recognition that AI system security begins before the first line of application code. Every architectural control above L1 is contingent on the model weights being what the operator believes them to be. Artifact integrity, supply-chain verification, and alignment evaluation are non-optional baseline controls.

Threats at this layer: T1, T7, T17, T26, T48

L2

Data Operations

7 threats touch this layer
full layer reference →

The Data Operations layer covers every data surface that the agent reads, writes, or shares at runtime. This excludes training data (that is L1); it covers the live data plane that shapes each inference. In agentic systems this includes vector stores, prompt templates, retrieval pipelines, shared memory, tool output caches, and any corpus that the agent can query or update during a task. Because agents often share data stores across tasks and across peer agents, a compromise in this layer can propagate laterally in ways that are structurally impossible in single-request LLM applications.

What lives here

  • Vector store indexes used for retrieval-augmented generation (Pinecone, Weaviate, pgvector, Chroma)
  • Retrieval pipelines: chunking logic, embedding generation, similarity scoring, re-ranking
  • Shared short-term and long-term memory: conversation history, task state, cross-session stores
  • Prompt templates, system-prompt files, and instruction corpora managed outside the model
  • Tool output caches and intermediate result stores read by downstream agents
  • Structured knowledge bases queried at inference time (SQL, graph databases, document stores)
  • Data ingestion and validation logic that controls what enters any of the above
  • Data classification tags and access control metadata attached to stored records

In a multi-agent deployment, these surfaces are often shared. A vector store that one agent writes to is frequently the retrieval source for another. The MAESTRO guide (Cloud Security Alliance, Ken Huang, 2025) calls out this shared-write / shared-read topology as the primary reason data-plane threats have elevated severity in MAS compared to single-agent settings.

Concrete example: A legal-research system uses a LlamaIndex pipeline where an ingestion agent chunks and embeds client documents into a shared Weaviate instance, and a separate summarisation agent retrieves from the same store. A malicious PDF submitted by an external party can inject adversarial text at chunk boundaries that is semantically close to legitimate case law. Once embedded, it surfaces in every subsequent query that touches the same namespace, affecting the summaries the second agent produces for all users.

Threats that target this layer

  • T1 Memory Poisoning: adversarial content written into a shared memory or vector store (via direct injection such as a malicious document in the RAG corpus, or via an agent that retrieves and re-stores tampered content) corrupts the context that subsequent agents or tasks read. Unlike a single-session attack, a poisoned vector store persists across restarts.
  • T12 Agent Communication Poisoning: when inter-agent messages are buffered, logged, or cached in a shared store before delivery, an attacker who can write to that store controls the messages. This blurs the line between data-layer and communication-layer attacks.
  • T5 Cascading Hallucination Attacks: retrieval pipelines that return low-quality, outdated, or adversarially seeded chunks amplify hallucination rates. A compromised retrieval corpus turns the model’s tendency to confabulate from a probabilistic nuisance into a reliable attack vector.
  • T17 Supply Chain Compromise: third-party data connectors, embedding pipelines, and corpus update jobs are software with their own dependencies. A compromised data pipeline silently alters what enters the vector store without touching the model or the application code.

Mitigations anchored here

  • memory content validation: validate retrieved content against an expected schema and provenance record before injecting it into the agent’s context window. Rejects chunks whose embedding source, update timestamp, or access label does not match declared policy.
  • memory anomaly detection: monitor vector store reads and writes for statistical deviation from a baseline. Sudden retrieval of previously-unseen clusters, or write patterns inconsistent with normal ingestion, surface poisoning attempts before downstream inference occurs.
  • memory-poisoning defence: a hardened ingestion pipeline with content hashing on write, ACL enforcement on read, anomaly detection on retrieval distribution, and rate limiting on bulk write paths. Combines preventive and detective controls for the full store lifecycle.
  • permission-aware vector retrieval: per-namespace and per-document access control on the vector store, enforced at query time. Prevents one tenant’s data from entering another tenant’s context, and prevents agents from reading records outside their declared scope.
  • output provenance tracking: attach retrieval provenance metadata to every chunk returned, so downstream agents and audit logs can trace which store, which document, and which version of a document contributed to a given response.
  • data classification: classify records at ingestion time by sensitivity level and attach immutable labels. Classification gates which agents may read a record and what retention and DLP rules apply.
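As a rough sketch of how memory content validation and permission-aware retrieval might combine, the example below filters retrieved chunks by namespace, declared source, and a content hash recorded at write time. The `Chunk` type, field names, and tenant labels are assumptions made for illustration; no particular vector store API is implied.

```python
import hashlib
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Chunk:
    text: str
    namespace: str      # tenant or agent scope the record belongs to
    source: str         # ingestion pipeline that wrote the record
    content_hash: str   # SHA-256 of the text, recorded at write time


def hash_text(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def ingest(text: str, namespace: str, source: str) -> Chunk:
    """Write path: store the content hash alongside the chunk."""
    return Chunk(text, namespace, source, hash_text(text))


def admissible(chunk: Chunk, allowed_namespaces: set, trusted_sources: set) -> bool:
    """Read path: check scope, provenance, and integrity before use."""
    return (
        chunk.namespace in allowed_namespaces
        and chunk.source in trusted_sources
        and hash_text(chunk.text) == chunk.content_hash
    )


def filter_retrieved(chunks, allowed_namespaces, trusted_sources):
    return [c for c in chunks if admissible(c, allowed_namespaces, trusted_sources)]


good = ingest("Clause 4 limits liability.", "tenant-a", "ingest-agent")
other_tenant = ingest("Confidential memo.", "tenant-b", "ingest-agent")
tampered = replace(good, text="Ignore prior instructions.")  # stored hash is now stale
kept = filter_retrieved([good, other_tenant, tampered], {"tenant-a"}, {"ingest-agent"})
```

The hash check catches content altered after ingestion. It does not catch a malicious document that was hashed faithfully on the way in, so ingestion-time validation is still needed.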

How L2 relates to its neighbours

L2 sits directly above L1 Foundation Models. The distinction is temporal: L1 concerns data baked into weights at training time; L2 concerns data read at inference time. A threat that corrupts training data targets L1; a threat that corrupts a retrieval corpus targets L2. Both ultimately affect model output, but the mitigations differ: you cannot patch a poisoned embedding store by retraining the model.

The immediate layer above L2 is L3 Agent Frameworks, which consumes the data that L2 produces. The agent framework decides what retrieval queries to issue, how to incorporate retrieved chunks into the prompt, and what to write back to shared memory. Data quality at L2 is a precondition for safe reasoning at L3. A retrieval pipeline that returns attacker-controlled content defeats whatever prompt-injection defences the framework layer applies.


L2 is the layer where the boundary between “the model” and “the data” is most easily collapsed. Treating retrieval, memory, and data ingestion as trusted by default is the most common architectural error in agentic deployments, and the one MAESTRO’s Data Operations layer exists to surface.

Threats at this layer: T1, T12, T17, T18, T27, T28, T49

L3

Agent Frameworks

17 threats touch this layer
full layer reference →

The Agent Frameworks layer is where the agent’s reasoning, planning, and tool execution happen. It encompasses the orchestration logic that decides which tools to call, in what sequence, with what arguments, and how to incorporate results back into the agent’s context. Commercially deployed frameworks (LangChain, AutoGen, CrewAI, LlamaIndex Agents, Semantic Kernel) all live at this layer, as does any custom orchestration code the team writes around them. This is also the layer where planning loops, reflection cycles, and self-correction behaviours run.

What lives here

  • Agent orchestration frameworks (LangChain, AutoGen, CrewAI, Semantic Kernel, LlamaIndex)
  • Custom agent runtimes and orchestration wrappers written around framework primitives
  • Planning and multi-step reasoning loops (ReAct, chain-of-thought, tree-of-thought)
  • Reflection and self-correction logic (critique-revise cycles, plan validation)
  • Tool routing: the mapping from agent intent to tool invocation, argument construction, and result handling
  • Function-calling and structured-output parsing (JSON schema enforcement, argument validation)
  • MCP client-side logic: how the framework issues MCP requests and interprets MCP responses
  • Agent-to-agent delegation: how the framework dispatches subtasks to peer agents
  • Context window management: what gets summarised, truncated, or evicted from the prompt

This layer is the most complex in a typical agentic deployment. It sits above the data plane (L2) and below the infrastructure that hosts it (L4), and it is the primary surface where attacker-controlled input (via prompt injection, poisoned tool output, or manipulated inter-agent messages) is acted upon rather than merely stored.

Concrete example: A CrewAI deployment uses a researcher agent and a writer agent, both backed by a shared tool registry that includes a web-search tool and a file-write tool. An attacker embeds a prompt-injection payload in a public webpage the researcher fetches; the payload instructs the CrewAI orchestrator to add file-write to the writer agent’s next call, exfiltrating internal context to an attacker-controlled URL. The tool-routing logic at L3 is the surface that fails, not the model or the infrastructure.

Threats that target this layer

  • T2 Tool Misuse: the framework’s tool-routing logic can be manipulated by adversarial prompts to call tools with unintended arguments, call the wrong tool, or chain tool calls in sequences the operator never authorised. Because the agent constructs tool arguments from model output, T2 is inherently an L3 threat.
  • T6 Intent Breaking and Goal Manipulation: planning and reflection loops can be manipulated mid-task to redirect the agent toward attacker-controlled goals. An agent that re-evaluates its plan in response to tool output or peer messages is susceptible to goal substitution at each reflection cycle.
  • T5 Cascading Hallucination Attacks: frameworks that pass tool output directly into the next model call without validation create a pipeline where one hallucinated or attacker-fabricated output becomes the authoritative input for the next step.
  • T11 Unexpected RCE and Code Attacks: frameworks that include code-execution tools (Python REPL, shell, SQL query constructors) expose a code-generation and execution surface. Adversarial prompt injection at L3 can produce malicious code that the framework then executes via a code tool.
  • T16 Insecure Inter-Agent Protocol Abuse: the framework’s handling of MCP and A2A protocol messages is an L3 concern. Frameworks that trust tool descriptions at face value, or that do not validate response schemas, are susceptible to protocol-layer injection.
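The schema-validation gap described under T16 can be illustrated with a small structural check applied to a tool response before it reaches the agent's context. This is a hand-rolled sketch, not the API of any MCP SDK: `validate_response`, `SchemaViolation`, and `WEATHER_SCHEMA` are hypothetical names, and the schema is a simple field-to-type mapping rather than full JSON Schema.

```python
from typing import Any


class SchemaViolation(ValueError):
    """Raised when a tool response does not match its declared shape."""


def validate_response(payload: Any, schema: dict[str, type]) -> dict[str, Any]:
    """Reject responses with undeclared, missing, or wrongly typed fields."""
    if not isinstance(payload, dict):
        raise SchemaViolation("response must be a JSON object")
    unexpected = sorted(set(payload) - set(schema))
    if unexpected:
        raise SchemaViolation(f"undeclared fields: {unexpected}")
    for field, expected_type in schema.items():
        if field not in payload:
            raise SchemaViolation(f"missing field: {field}")
        if not isinstance(payload[field], expected_type):
            raise SchemaViolation(f"wrong type for field: {field}")
    return payload


# Hypothetical declared shape for a weather tool's response.
WEATHER_SCHEMA = {"city": str, "temp_c": float}
```

Rejecting undeclared fields closes off one easy channel for smuggling extra instructions into a response. A declared string field can still carry injected text, so a structural check complements content sanitisation rather than replacing it.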

Mitigations anchored here

  • input sanitisation: sanitise all content that enters the agent’s context before it reaches the planning loop. Covers user input, tool output, retrieved chunks, and peer-agent messages, each of which is a potential prompt-injection surface.
  • least-privilege tool scoping: enforce a per-agent tool allowlist at the framework layer. The agent may only call tools in its declared scope; any out-of-scope call is rejected before execution, regardless of what the model requested.
  • plan-vs-goal validation: validate agent plans (explicit ReAct plans, CoT chains, or structured action sequences) against a policy before execution begins. Catch goal-substitution (T6) and out-of-scope tool selection (T2) before the first real-world action runs.
  • context isolation: prevent context from one agent’s session leaking into another’s. In frameworks that share a context store or prompt cache, isolation is a structural prerequisite for multi-tenant safety.
  • MCP response sanitisation: validate MCP server responses against declared schema before injecting them into the agent’s context. Prevents protocol-layer injection (T16) from reaching the planning loop.
  • reflection-loop depth cap: bound the number of reflection or critique-revise cycles the framework will execute per task. Unbounded reflection enables goal-drift amplification (T6) and resource exhaustion (T4) via recursive planning.
  • code-generation review gate: require static analysis or sandboxed execution review before code generated by the agent is run by a code-execution tool. The primary L3 control for T11.
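Two of the controls above, least-privilege tool scoping and the reflection-loop depth cap, lend themselves to a short sketch. The code below is framework-agnostic and purely illustrative: `ScopedToolRouter`, `run_with_reflection_cap`, the agent names, and the stub tools are assumptions, not part of LangChain, CrewAI, or any other framework named here.

```python
from typing import Callable


class ToolScopeError(PermissionError):
    """Raised when an agent requests a tool outside its declared scope."""


class ReflectionCapReached(RuntimeError):
    """Raised when a task exhausts its reflection budget."""


class ScopedToolRouter:
    def __init__(self, registry: dict, allowlists: dict):
        self._registry = dict(registry)
        self._allowlists = {a: frozenset(t) for a, t in allowlists.items()}

    def call(self, agent_id: str, tool_name: str, **kwargs):
        # The check runs before execution, whatever the model asked for.
        if tool_name not in self._allowlists.get(agent_id, frozenset()):
            raise ToolScopeError(f"{agent_id} may not call {tool_name}")
        if tool_name not in self._registry:
            raise KeyError(tool_name)
        return self._registry[tool_name](**kwargs)


def run_with_reflection_cap(step: Callable[[int], bool], max_cycles: int) -> int:
    """Call step(cycle) until it reports completion or the cap is hit."""
    for cycle in range(max_cycles):
        if step(cycle):
            return cycle + 1  # number of cycles actually used
    raise ReflectionCapReached(f"no result after {max_cycles} cycles")


# Stub tools standing in for real integrations.
router = ScopedToolRouter(
    registry={
        "web_search": lambda query: f"results for {query}",
        "file_write": lambda path, data: len(data),
    },
    allowlists={"researcher": {"web_search"}, "writer": {"file_write"}},
)
```

In this sketch the researcher agent can search but cannot write files, even if injected text persuades the model to request it. An agent with no declared allowlist can call nothing, which keeps the default closed.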

How L3 relates to its neighbours

L3 consumes data from L2 Data Operations (retrieval results, shared memory, prompt templates) and issues execution requests to L4 Deployment Infrastructure (tool calls that run in containers, shell commands, API calls). A failure of trust at L2 (poisoned retrieval) becomes a planning failure at L3. A planning failure at L3 (tool misuse) becomes an infrastructure-level consequence at L4.

L3 is also the layer that most directly implements the controls recommended by L6 Security and Compliance: policy enforcement (OPA, policy-bound autonomy), intent attestation, and least-privilege tool scoping are all L6 policies that take effect inside the L3 orchestration loop.


The Agent Frameworks layer is where autonomy lives, and where it goes wrong. Controls at L1 and L2 reduce how much attacker-controlled input reaches the agent; controls at L4 limit blast radius; only L3 controls can intercept a misaligned plan before it executes.

L4

Deployment Infra

9 threats touch this layer
full layer reference →

The Deployment Infrastructure layer covers the runtime environment that hosts the agent: containers, orchestration platforms, secrets management, network egress controls, autoscaling, and the trust boundary between the agent runtime and adjacent systems. This is the layer where MLSecOps practices apply: securing not just the model artifact but the infrastructure that serves it. In a multi-agent system (MAS), L4 is also where the orchestration plane that deploys and manages multiple agent pods or processes lives, making it a high-value target for attackers who want control over the entire system at once.

What lives here

  • Container images that package the agent runtime and its dependencies
  • Container orchestration (Kubernetes deployments, ECS task definitions, Cloud Run services)
  • Secrets management systems accessed by the agent at runtime (Vault, AWS Secrets Manager, GCP Secret Manager, Azure Key Vault)
  • Network egress policies: what external endpoints the agent container may reach
  • Sandbox environments used for tool execution (gVisor, Firecracker, nsjail, Kata Containers)
  • Service mesh and mTLS configuration between agent services and tool backends
  • Autoscaling policies and resource quotas that bound agent compute consumption
  • CI/CD pipelines that build and deploy agent containers (themselves part of the attack surface)
  • The orchestration control plane (Kubernetes API server, ECS scheduler) that can deploy, modify, or terminate agent workloads
  • Non-human identities (NHIs): service accounts, OIDC tokens, and workload certificates the agent uses to authenticate to tools and peers

The MAESTRO guide (Cloud Security Alliance, Ken Huang, 2025) calls out compromised orchestration as a MAS-specific escalation: an attacker who gains control of the orchestration plane can deploy rogue agents, modify running workloads, or drain resources from legitimate agents. These effects go far beyond compromising a single agent instance.

Concrete example: An AutoGen multi-agent coding assistant runs each agent as a Kubernetes pod with a shared service account that was granted secrets:read on the entire namespace for convenience. A code-execution tool (Python REPL) runs without a gVisor sandbox. When a prompt-injection payload triggers execution of attacker-supplied code, the resulting process reads every secret in the namespace (database credentials, API keys, peer-agent tokens) and exfiltrates them via an egress path that no network policy blocked. The L3 injection became an L4 blast-radius failure.

Threats that target this layer

  • T3 Privilege Compromise: service accounts attached to agent containers often accumulate permissions over time. Excessive IAM roles, over-scoped OIDC trust policies, or hardcoded credentials in container images give an attacker who compromises one agent container a foothold into systems far outside the agent’s intended scope.
  • T4 Resource Overload: an agent or a peer that issues unbounded tool calls, spawns recursive sub-agents, or generates arbitrarily large payloads can exhaust container CPU/memory quotas, exhaust API rate limits on shared inference endpoints, or trigger cascading autoscaling that incurs cost without performing useful work.
  • T11 Unexpected RCE and Code Attacks: if the agent executes code generated during inference (Python REPL, shell tool, SQL execution), the container that runs that code is the blast-containment boundary. Insufficient sandboxing at L4 means a code-injection attack (L3) becomes a host or cluster escape at L4.
  • T13 Rogue Agents in Multi-Agent Systems: an attacker who compromises the orchestration control plane can deploy unauthorised agent instances that impersonate legitimate peers or perform malicious actions under the cover of the cluster’s identity.

Mitigations anchored here

  • gVisor sandbox: run tool-execution containers under a gVisor or equivalent kernel-isolation sandbox. Reduces the blast radius of T11 code attacks from host-level to sandbox-level; effective even when the agent framework layer provides no code-review gate.
  • NHI lifecycle management: govern the full lifecycle of non-human identities (service accounts, workload certificates, OIDC tokens) attached to agent processes. Short-lived credentials, automated rotation, and just-in-time issuance limit the window an attacker can exploit compromised credentials.
  • per-agent quota budgets: enforce per-agent resource quotas (API call rate, token budget, concurrent requests) at the infrastructure layer. Quota enforcement at L4 is harder to bypass than at L3 because it does not depend on the agent’s own cooperation.
  • RBAC and ABAC: enforce role-based and attribute-based access control on every service the agent calls. The agent’s service account should hold only the permissions its declared role requires; L4 policy enforcement is the backstop when L3 scope controls fail.
  • time-bounded privilege elevation: issue agent credentials with explicit expiry and scope them to the current task. A time-bounded credential that expires when the task completes cannot be replayed in a later context.
  • SPIFFE / SPIRE workload identity: use SPIFFE/SPIRE to issue short-lived, workload-attested identities to agent processes. Workload attestation binds the credential to the specific container image and orchestrator context, making it harder to forge the identity outside the legitimate deployment.
  • policy-bound autonomy: enforce a declared autonomy boundary at the infrastructure layer: the agent’s service account, network policy, and resource quota collectively define what it can actually do, independent of what it believes it is permitted to do.
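A per-agent quota budget can be sketched as a gate that sits outside the agent process and charges each request against a declared allowance. The class names and limits below are hypothetical. A production gate would typically live in a proxy or API gateway and reset budgets per task or time window, which this sketch omits.

```python
from dataclasses import dataclass


class QuotaExceeded(RuntimeError):
    """Raised when a request would exceed the agent's declared budget."""


@dataclass
class AgentBudget:
    max_calls: int
    max_tokens: int
    calls: int = 0
    tokens: int = 0

    def charge(self, tokens: int) -> None:
        if tokens < 0:
            raise ValueError("token count must be non-negative")
        # Check both limits before mutating, so a refused request costs nothing.
        if self.calls + 1 > self.max_calls:
            raise QuotaExceeded("call budget exhausted")
        if self.tokens + tokens > self.max_tokens:
            raise QuotaExceeded("token budget exhausted")
        self.calls += 1
        self.tokens += tokens


class QuotaGate:
    def __init__(self, budgets: dict):
        self._budgets = budgets

    def admit(self, agent_id: str, tokens: int) -> None:
        budget = self._budgets.get(agent_id)
        if budget is None:
            raise QuotaExceeded(f"no budget declared for {agent_id}")
        budget.charge(tokens)
```

Because the gate holds the counters, the agent cannot raise its own limit, which is the property the bullet on per-agent quota budgets relies on.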

How L4 relates to its neighbours

L4 sits below L5 Evaluation and Observability and above L3. The framework layer (L3) issues real-world actions (tool calls, API requests, code execution) and L4 is the infrastructure that executes or proxies them. Controls at L4 do not prevent a bad plan from forming (that is L3’s job) but they limit what a bad plan can accomplish. A sandboxed container means a compromised code-execution tool cannot escape to the host; a network egress policy means an agent cannot exfiltrate data to an arbitrary endpoint even if it tries.
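The egress point can be made concrete with a host allowlist check of the kind an outbound proxy might apply. This is an application-level sketch only: the host names are placeholders, `egress_allowed` is a hypothetical helper, and real enforcement would normally sit in network policy rather than in Python.

```python
from urllib.parse import urlsplit

# Placeholder destinations; a real list would come from deployment policy.
ALLOWED_HOSTS = frozenset({"api.internal.example", "search.example.com"})


def egress_allowed(url: str, allowed_hosts: frozenset = ALLOWED_HOSTS) -> bool:
    """Permit only HTTPS requests to an exact, pre-approved host name."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    host = parts.hostname  # lower-cased, with any userinfo and port stripped
    return host is not None and host in allowed_hosts
```

Matching on the parsed host name rather than on a substring of the URL avoids look-alike destinations such as `search.example.com.evil.net` or `search.example.com@evil.net`.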

L4 also provides the substrate on which L5’s observability tooling runs. Log collection, metrics export, and distributed tracing all depend on L4 infrastructure: log pipelines, telemetry agents, and storage backends. An L4 compromise that silences telemetry infrastructure disables the visibility that L5 relies on.


Deployment infrastructure is where the policy decisions made at every other layer either hold or fail under real-world conditions. An agent with a well-designed framework (L3) but an over-privileged service account and no egress control (L4) has a large, exploitable blast radius that no amount of prompt engineering can close.

Threats at this layer: T3, T4, T11, T13, T14, T22, T33, T34, T43

L5

Eval & Observability

5 threats touch this layer
full layer reference →

The Evaluation and Observability layer covers the monitoring, logging, tracing, alerting, and human-review surfaces that let operators understand what an agent has done, detect anomalies in what it is doing, and intervene when something goes wrong. This layer is not just about post-hoc debugging. It is the operational foundation for every detective and corrective control in the system. In a multi-agent system (MAS), this layer has heightened importance: individual agents may appear normal in isolation while the system as a whole drifts, making distributed and cross-agent observability a distinct engineering requirement.

What lives here

  • Distributed tracing pipelines that correlate spans across agent calls, tool invocations, and peer-agent interactions (OpenTelemetry, Jaeger, Zipkin)
  • Structured logging of agent inputs, outputs, tool arguments, and decision rationales
  • Metrics collection: latency, token consumption, tool call frequency, error rates, semantic drift scores
  • Evaluation harnesses that run against agent outputs offline or in canary: LLM-as-judge, embedding-distance drift, human spot-check queues
  • Human-in-the-loop (HITL) interfaces: approval queues, audit review dashboards, escalation paths
  • Alert rules and anomaly detection that fire when observable behaviour departs from baseline
  • Immutable or tamper-evident audit logs: append-only stores, WORM buckets, hash-chained records
  • Continuous evaluation pipelines (CI-eval) that run regression suites against deployed agents
  • Post-incident forensics tooling: replay of agent traces, attribution of actions to identities

The MAESTRO guide (Cloud Security Alliance, Ken Huang, 2025) identifies a MAS-specific threat at this layer: individual agents may appear to perform normally while collectively exhibiting degradation that only becomes visible in aggregate metrics. This makes cross-agent correlation a first-class L5 requirement, not an optional enhancement.

Concrete example: A financial-analysis platform runs three Semantic Kernel agents (a data-fetcher, an analyst, and a report-writer) connected via OpenTelemetry. Without cross-agent span correlation, a gradual increase in hallucinated figure citations by the analyst agent is invisible in per-agent logs (each response passes its own plausibility check). Only when an L5 evaluation harness computes embedding-distance drift across the full pipeline does the operator see that the analyst’s outputs have shifted by a cosine distance of 0.4 from the established baseline over 72 hours, triggering an alert before the report-writer publishes.
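One simple way to compute the kind of drift signal in this example is to compare the mean embedding of recent outputs against the mean embedding of a baseline window. The sketch below uses plain Python lists in place of real embeddings. The function names are hypothetical, and any alert threshold would need calibrating against the embedding model in use.

```python
import math


def cosine_distance(a, b) -> float:
    """1 - cosine similarity; 0 means same direction, 1 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def mean_vector(vectors):
    """Component-wise mean of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def drift_exceeded(baseline, recent, threshold: float) -> bool:
    """Compare the centroid of recent outputs with the baseline centroid."""
    return cosine_distance(mean_vector(baseline), mean_vector(recent)) > threshold
```

Centroid distance is a coarse measure: it can miss drift that changes the spread of outputs without moving their mean, so a production harness would likely pair it with other signals.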

Threats that target this layer

  • T8 Repudiation and Untraceability: an agent that can deny or obscure its actions requires that the observability layer capture a complete, tamper-resistant record. Gaps in logging, mutable audit records, or missing action attribution directly enable T8. Every OWASP v1.1 T8 mitigation is primarily an L5 control.
  • T10 Overwhelming Human-in-the-Loop: if the human-in-the-loop interface is the primary safety control, it becomes an attack surface: adversarial workloads can generate approval queues large enough that human reviewers approve without adequate scrutiny. Effective HITL design at L5 includes workload management, fatigue-aware routing, and escalation policies.
  • T5 Cascading Hallucination Attacks: observability tooling that measures semantic quality (embedding drift, factual consistency scores, downstream citation accuracy) provides the only reliable signal that hallucination rates are elevated above baseline. Without this, a cascade can persist through many agent turns before an operator notices.

Mitigations anchored here

  • behavioural divergence monitoring: continuously measure agent output against a declared semantic baseline. Flag statistically significant departures from expected output distribution before they accumulate into visible harm. The primary L5 control for catching T5 and T7 drift early.
  • goal-consistency monitoring: at each agent turn, evaluate whether the agent’s declared intent is consistent with its previous turns and its stated objective. Inconsistency is an early signal of goal substitution (T6) or memory poisoning (T1) before the effect is observable in tool calls.
  • multi-source verification: for claims the agent makes that will be acted on, corroborate against at least two independent retrieval sources before propagating the claim. Applies both at evaluation time (offline) and in a live canary posture.
  • human dual-control: route high-consequence actions through two independent reviewers. Provides a structural check on the human-approval surface (T10) that is independent of whether any single reviewer was fatigued or deceived.
  • plan-vs-goal validation: validate agent plans before execution and record the validation decision in the audit log. The audit record is an L5 artifact; the execution guard is an L3 control. Both are required for full coverage.
  • legal-hold / WORM retention: when an agent is involved in a regulated action or an incident begins, activate a legal-hold policy that prevents log deletion, rotation, or modification. Preserves the audit record that T8 attacks attempt to erase.
  • Sigstore signing: sign pipeline artifacts and evaluation results with Sigstore/Rekor to produce a tamper-evident record of what evaluation ran, when, and what it found. Prevents post-hoc alteration of evaluation results.
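The hash-chained records mentioned earlier in this layer can be sketched in a few lines: each entry stores the hash of its predecessor, so editing or deleting an earlier entry breaks verification. This is a minimal in-memory illustration with hypothetical function names. It is not a substitute for WORM storage or a transparency log such as Rekor.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder predecessor hash for the first entry


def _entry_hash(prev_hash: str, record: dict) -> str:
    # Canonical JSON so the same record always hashes the same way.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(log: list, record: dict) -> None:
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "prev": prev, "hash": _entry_hash(prev, record)})


def verify(log: list) -> bool:
    """Recompute every link; any edit, reorder, or deletion fails the check."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != _entry_hash(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True
```

A chain like this detects tampering only if the latest hash is also anchored somewhere the attacker cannot rewrite. Otherwise the whole chain can be recomputed after an edit.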

How L5 relates to its neighbours

L5 sits directly above L4 Deployment Infrastructure, which provides the substrate (log forwarding agents, metrics exporters, storage backends) that L5 depends on. If L4 is compromised in a way that silences telemetry, L5 loses its visibility. Hardening log infrastructure (immutable storage, network-isolated log collectors) is an L4 concern that serves L5 function.

Above L5 in the MAESTRO stack is L6 Security and Compliance, the vertical band. L6 policies determine what must be logged, how long records must be retained, and who may access audit data. L5 provides the mechanism; L6 provides the mandate and the governance accountability structure.


Observability is not a security afterthought in agentic systems. It is a primary control. An agent that cannot be traced, evaluated, or interrupted provides no meaningful safety guarantee regardless of how carefully its model, data, and framework layers were hardened. L5 is where that accountability is operationalised.

Threats at this layer: T8, T10, T23, T35, T44
L6

Security & Compliance

6 threats touch this layer
full layer reference →

The Security and Compliance layer is not a peer of the other six layers. It is a vertical band that cuts across all of them. Where L1 through L5 and L7 describe what architectural components exist and where threats land, L6 asks a different question: across the entire system, what is the security policy, who is accountable for enforcing it, and how is compliance verified? Every other layer has a security and compliance concern that lives in L6. Model governance is an L6 concern applied to L1 artifacts. Data classification policy is an L6 concern applied to L2 stores. Autonomy boundaries are an L6 concern applied to L3 orchestration. Identity and access management is an L6 concern applied to L4 infrastructure. Audit log retention mandates are an L6 concern applied to L5 tooling. Third-party ecosystem vetting is an L6 concern applied to L7 integrations.

The MAESTRO guide (Cloud Security Alliance, Ken Huang, 2025) represents L6 as a vertical band in its architecture diagram specifically to communicate this cross-cutting character. Helmwart reproduces this in the MAESTRO stack SVG on the Frameworks page.

Concrete example: A healthcare provider deploys a LangChain-based clinical-summarisation agent. HIPAA requires an audit trail for every access to patient records, but the team never defined an L6 audit-log retention policy, so the L5 logging infrastructure captures spans but auto-rotates them after seven days. When a data-access complaint surfaces six weeks later, the records are gone. The L5 mechanism existed; the L6 mandate and retention schedule did not. No amount of technical sophistication at L1–L5 compensates for an absent governance layer.

What lives here

  • Identity and access management policy: who (human or machine) may invoke which agents, tools, or data stores, under what conditions
  • Non-human identity (NHI) governance: service account lifecycle, credential rotation policy, workload identity standards
  • Autonomy boundary policy: the declared maximum autonomous action radius for each agent class before human review is required
  • Data governance: classification schema, retention schedules, DLP rules, regulated-data handling procedures
  • Compliance mandates applicable to the deployment: EU AI Act transparency obligations, HIPAA audit requirements, FedRAMP logging rules, SOC 2 change-management controls
  • Third-party due diligence: vetting process for MCP servers, external tool providers, peer-agent operators, and model suppliers
  • Incident response policy: what events trigger containment, who has stop-build authority, how evidence is preserved
  • Red team and audit program: schedule, scope, and accountability for adversarial evaluation of the system
  • Human-in-the-loop governance: which action classes require human approval, what constitutes informed consent, how reviewer decisions are recorded

Threats that target this layer

  • T3 Privilege Compromise: the policy gap that allows privilege compromise to persist is an L6 failure, such as insufficient access review, an absent least-privilege policy, or an NHI lifecycle program that allows stale credentials to accumulate.
  • T7 Misaligned and Deceptive Behaviors: emergent agent behaviours that violate organisational or regulatory policy land at L6 when the root cause is insufficient governance: no declared autonomy boundary, no behavioural testing program, no stop-build authority exercised when misalignment was observed.
  • T8 Repudiation and Untraceability: the audit log retention policy, tamper-evidence requirement, and access controls on log data are L6 policies. T8 succeeds when those policies are absent or unenforced.
  • T10 Overwhelming Human-in-the-Loop: the design of the HITL program (which decisions require human approval, what reviewer capacity is maintained, what workload limits apply) is an L6 governance decision, not a framework-layer implementation detail.

Mitigations anchored here

  • RBAC and ABAC: role-based and attribute-based access control policy applied consistently across all layers. L6 owns the policy definition; L4 enforces it at the infrastructure layer.
  • NHI lifecycle management: governance of non-human identity lifecycle: creation approval, rotation schedule, scope review cadence, and revocation on decommission. The policy lives in L6; the implementation lives in L4.
  • Open Policy Agent: use Open Policy Agent (or an equivalent policy engine) as the policy decision point for autonomy boundary decisions, tool access, and data handling rules. Centralises policy in L6 while enforcement happens at L3 and L4.
  • policy-bound autonomy: declare an explicit maximum autonomous action radius per agent class (e.g., read-only by default; write actions require HITL; irreversible actions require dual-control). This is a governance artifact that lives in L6 and is referenced by L3, L4, and L5 controls.
  • per-agent trust scoring: maintain a continuously-updated trust score for each peer agent, MCP server, and external tool provider. Scored entities with degraded trust are automatically routed through additional validation or blocked. Trust governance is an L6 accountability.
  • behavioural red-teaming: a structured red team program that tests agent behaviour against declared policy. Red team scheduling, scope, and accountability for remediation are L6 governance; execution touches every other layer.
  • MFA on high-privilege identities: require multi-factor or out-of-band verification for high-privilege actions. The policy definition of what constitutes “high-privilege” is an L6 decision.
  • emergency-stop control: a documented, tested, and governed procedure for suspending or terminating an agent class. The existence and exercisability of the kill switch is an L6 governance requirement; the technical mechanism lives in L4/L5.
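
As a rough sketch of policy-bound autonomy, the example radius above (read-only by default, write actions behind HITL, irreversible actions behind dual-control) can be written as a small lookup table. The `ActionClass` names, the `REQUIRED_APPROVALS` table, and the `may_execute` function are hypothetical and chosen only for illustration.

```python
from enum import Enum


class ActionClass(Enum):
    READ = "read"
    WRITE = "write"
    IRREVERSIBLE = "irreversible"


# Hypothetical policy table: distinct human approvals needed per action class.
REQUIRED_APPROVALS = {
    ActionClass.READ: 0,          # read-only by default
    ActionClass.WRITE: 1,         # write actions require HITL
    ActionClass.IRREVERSIBLE: 2,  # irreversible actions require dual-control
}


def may_execute(action: ActionClass, approvers: set[str]) -> bool:
    """Allow the action only if enough distinct approvers have signed off."""
    # `approvers` is a set, so the same reviewer cannot be counted twice.
    return len(approvers) >= REQUIRED_APPROVALS[action]
```

Keeping the table as data rather than scattering the thresholds through agent code is one way to let L3, L4, and L5 controls reference a single L6 artifact.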

How L6 relates to all other layers

L6 does not sit between any two layers. It spans all of them. The practical consequence is that L6 controls are often implemented at multiple other layers simultaneously:

  • An identity policy (L6) is enforced by L4 IAM configuration and L3 tool-scope checks.
  • A data classification policy (L6) is implemented by L2 store access controls and L5 DLP egress rules.
  • An autonomy boundary (L6) is enforced by L3 plan-validation and L4 resource quotas.
  • An audit mandate (L6) is satisfied by L5 logging infrastructure and L4 tamper-resistant storage.

When an L6 policy is missing, every layer it spans has a corresponding gap. When an L6 policy exists but is not implemented at the right layer, the gap is structural and usually invisible until an incident surfaces it.
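
One way to make such structural gaps visible is to record which layers are expected to implement each L6 policy and compare that against the controls actually deployed. The sketch below encodes the four mappings listed above; the policy keys and the `find_gaps` helper are hypothetical names, and the inventory of implemented controls is assumed to come from elsewhere.

```python
# Expected enforcement layers per L6 policy, taken from the four mappings above.
EXPECTED_LAYERS = {
    "identity": {"L3", "L4"},
    "data_classification": {"L2", "L5"},
    "autonomy_boundary": {"L3", "L4"},
    "audit_mandate": {"L4", "L5"},
}


def find_gaps(implemented: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each policy to the layers where no implementing control is recorded."""
    gaps = {}
    for policy, layers in EXPECTED_LAYERS.items():
        # A policy absent from the inventory is missing at every expected layer.
        missing = layers - implemented.get(policy, set())
        if missing:
            gaps[policy] = missing
    return gaps
```

For example, an inventory that lists the data classification policy only at L2 and omits the audit mandate entirely would report a gap at L5 for the former and at both L4 and L5 for the latter.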


L6 is the layer that asks whether the other six layers are governed: whether the controls that exist were deliberately chosen, whether the policies they enforce are documented and accountable, and whether the system as a whole meets the obligations that apply to it. Technically sophisticated deployments with immature L6 governance are common, and they consistently fail compliance and incident-response tests when those arrive.

Threats at this layer: T3, T7, T24, T36, T45, T46
L7

Agent Ecosystem

9 threats touch this layer
full layer reference →

The Agent Ecosystem layer covers everything outside the agent’s own stack that the agent communicates with, trusts, or can be influenced by. This includes the humans who interact with it, the third-party tools and services it invokes, the peer agents it delegates to or receives instructions from, and the protocols (MCP and A2A) that govern those interactions. L7 is where the agent’s trust boundary meets the world, and where adversarial influence that originates externally becomes an internal threat.

What lives here

  • Human-agent interaction surfaces: chat interfaces, API endpoints, voice channels, operator consoles
  • External tool integrations: third-party APIs, SaaS webhooks, browser-automation surfaces, file-system connectors
  • MCP servers: the external processes that expose tools and resources to agents via the Model Context Protocol
  • Peer agents in a multi-agent system: orchestrator-to-worker relationships, peer-to-peer delegation, shared-task coordination
  • A2A (Agent-to-Agent) protocol endpoints and the message-passing fabric between agent processes
  • Third-party agent services: externally operated agents that a local agent is authorised to delegate to
  • User-generated content that reaches the agent as input: emails, documents, web pages, form submissions
  • Supply chain for third-party tools and MCP servers: their dependencies, update channels, and signing status

The MAESTRO guide (Cloud Security Alliance, Ken Huang, 2025) identifies a MAS-specific threat unique to L7: malicious agent diffusion, where a single compromised or rogue agent introduces adversarial behaviour into the ecosystem by exploiting the trust that legitimate peers extend to it. This is the multi-agent analog of network worm propagation.

Concrete example: A software-development platform exposes an AutoGen orchestrator to external contributors via an API endpoint. An external contributor submits a task that links to a GitHub issue containing a carefully crafted prompt. The orchestrator fetches the issue, and the injected text instructs a peer code-review agent to approve the contributor’s pull request and merge it without human sign-off. The entry point, an unauthenticated external surface at L7, drives an action that bypasses the HITL gate the operator believed was mandatory.

Threats that target this layer

  • T9 Identity Spoofing and Impersonation: in a multi-agent system, agents authenticate to peers and tools using certificates, tokens, or protocol-level identifiers. Spoofing a peer agent’s identity allows an attacker to issue instructions or receive responses intended for a legitimate participant. Because agents extend substantial trust to peers, identity spoofing at L7 often requires no further exploitation to produce impact.
  • T13 Rogue Agents in Multi-Agent Systems: an agent introduced into the ecosystem without operator authorisation (via supply-chain compromise, a misconfigured orchestration plane, or direct injection) can impersonate a legitimate participant and receive tasks, data, or trust it should not hold.
  • T14 Human Attacks on Multi-Agent Systems: adversarial humans who interact with one agent in the ecosystem to produce effects on the broader MAS: using a low-trust entry point to inject instructions that cascade through the agent network.
  • T15 Human Manipulation: social engineering attacks that target the human operators, reviewers, or users who interact with the agent at L7. An attacker who manipulates a human into approving a malicious action or into providing elevated credentials achieves impact through the human channel that no technical control at lower layers blocks.
  • T16 Insecure Inter-Agent Protocol Abuse: the L7 face of this threat is the ecosystem-level protocol: MCP server metadata, A2A handshake messages, capability advertisements, and peer discovery mechanisms. These are the attack surfaces an adversary targets before the framework layer even processes the message.

Mitigations anchored here

  • SPIFFE / SPIRE workload identity: issue SPIFFE workload identities (for example via SPIRE) to every agent process. In the ecosystem, SPIFFE identities allow peers to verify that they are communicating with a legitimate, attested workload rather than an impersonator. The primary L7 control for T9.
  • inter-agent message signing: sign all inter-agent messages cryptographically. A message that cannot be verified as originating from a legitimate peer is rejected before it enters the receiving agent’s context. Closes the forgery vector in A2A communication.
  • per-agent trust scoring: maintain per-peer trust scores updated from observed behaviour, incident history, and attestation status. A peer whose score drops below threshold receives reduced delegation rights or is quarantined. The primary L7 control for T13 and T16.
  • multi-agent consensus: for high-consequence decisions, require agreement from multiple independent peer agents before proceeding. Prevents a single rogue or compromised peer from unilaterally directing an action.
  • tool description validation: validate MCP tool descriptions against a pre-approved schema or registry before the framework layer processes them. Malicious tool descriptions are the primary injection vector from MCP servers (T16).
  • insider-threat program: a structured program for detecting and responding to insider threats from human operators who have legitimate access to the ecosystem. Covers anomalous access patterns, privilege escalation by humans, and misuse of administrative interfaces.
  • restricted link rendering: prevent agents from rendering or following hyperlinks or embedded references in user-generated content without explicit policy approval. Limits the content-injection surface from untrusted documents (T15, T14).
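
The inter-agent message signing control can be illustrated with Python's standard library. This sketch deliberately substitutes an HMAC over a shared secret for a true asymmetric signature, so it only demonstrates the reject-before-context behaviour; a real deployment would more likely bind signatures to per-agent keys. The function names and the JSON canonicalisation are assumptions made for the example.

```python
import hashlib
import hmac
import json


def sign_message(payload: dict, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag over a canonical JSON encoding of the payload."""
    # Sorted keys and fixed separators give sender and receiver the same bytes.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def verify_message(payload: dict, tag: str, key: bytes) -> bool:
    """Recompute the tag and compare in constant time; False means reject."""
    return hmac.compare_digest(sign_message(payload, key), tag)
```

A receiving agent would call `verify_message` before placing the payload in its context and drop the message when it returns False.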

How L7 relates to its neighbours

L7 sits at the top of the MAESTRO stack, directly above L5 Evaluation and Observability. L5 provides the tracing and logging infrastructure that makes L7 interactions visible; without L5 instrumentation on A2A and MCP traffic, L7 threats are largely undetectable. Influence also flows the other way: adversarial input that enters at L7 (via a malicious MCP server, a manipulated user, or a rogue peer) propagates downward through L3 (framework), L2 (data), and potentially L1 (if the input influences training).

L7 also has the most direct relationship with L6 Security and Compliance: the third-party due diligence program, the ecosystem trust policy, and the identity governance rules that determine which peer agents are permitted are all L6 policies whose scope of application is the L7 ecosystem.


L7 is the layer at which the agent encounters the world as an adversary would approach it: through social engineering, protocol manipulation, supply chain compromise, and identity spoofing. Controls at lower layers reduce the damage when L7 is breached; L7 controls are the first line of defence.

Threats at this layer: T9, T13, T14, T15, T16, T25, T37, T38, T47
CL

Cross-layer band

Emergent multi-agent behaviour spanning L1 to L7 · 14 threats
full cross-layer reference →

Cross-Layer is not an architectural layer in the stack. It is the catalog of threats and failure patterns that arise specifically from the interaction between agents in a multi-agent system (MAS), and which no single layer fully captures. The MAESTRO guide (Cloud Security Alliance, Ken Huang, 2025) treats cross-layer threats as a peer category to the seven architectural layers because they are emergent: they require multi-agent behaviour to exist, and modeling them inside any single layer understates their scope, blast radius, and detection difficulty.

The practical consequence for security architects is that a threat’s presence in the cross-layer category signals something specific: the named threat is not merely harder to detect in a MAS. The MAS topology is a precondition for the threat to take the form it takes. Cascading trust failures, for example, presuppose a trust graph; they cannot exist without multiple agents extending trust to peers.

Concrete example: A supply-chain intelligence platform runs five AutoGen agents (a news-fetcher, two analyst peers, a synthesis agent, and a report publisher) in a ring topology where each agent forwards its output as the next agent’s input. An attacker embeds a prompt-injection payload in a news article the fetcher retrieves. Each agent relays the payload forward, slightly amplifying it, until the synthesis agent incorporates the attacker’s fabricated supplier-risk claim as a sourced finding and the publisher distributes it to 200 procurement teams. No single agent deviated beyond its per-agent anomaly threshold; the harm was visible only at the system boundary.

What lives here

The cross-layer category is a catalog of patterns, not components. The patterns that belong here share three properties: they span at least two MAESTRO architectural layers; they are enabled or materially worsened by multi-agent interaction; and they cannot be mitigated by controls applied to a single layer alone.

Specific patterns in the cross-layer catalog include:

  • Cascading trust failures: compromise of one agent collapsing trust assertions across the peer network. One rogue agent’s forged credentials or poisoned outputs propagate to every downstream agent that trusted it, because the trust graph has no circuit breaker at intermediate hops.
  • Emergent system-wide bias amplification: small per-agent biases that combine via collaborative reasoning or shared learning into a significant system-level bias. No individual agent exhibits the full bias; the pattern only exists at the system level.
  • Systemic resource starvation: exploitation of inter-agent interactions (recursive delegation, circular task loops, coordinated API amplification) to exhaust shared infrastructure. The resource exhaustion is emergent: each individual agent’s call is within quota; the aggregate is not.
  • Cross-agent feedback loop manipulation: adversarial injection into the feedback signals that agents use to update each other’s behaviour or shared state. In cooperative learning or shared-memory architectures, one poisoned update propagates through subsequent agent interactions.
  • Inter-agent data leakage cascade: sensitive data leaking through a chain of inter-agent interactions, each of which individually passes its data-handling policy, but whose combination violates a cross-system data boundary.
  • Temporal manipulation and time-based attacks: desynchronization of clocks or task sequencing between cooperating agents to create race conditions, window-of-vulnerability attacks, or replay opportunities in time-sensitive multi-agent workflows.
  • Learning model poisoning across agents: runtime learning or adaptation mechanisms shared between agents are poisoned during execution, producing deceptive behaviour that grows over time and is not attributable to any single agent’s initial configuration.
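
The systemic resource starvation pattern above hinges on the gap between per-agent quotas and aggregate load. One possible countermeasure, sketched here under stated assumptions, is a budget attached to the top-level task and charged by every agent in the delegation chain. `SharedBudget`, `charge`, and the toy `delegate` loop are hypothetical names for illustration.

```python
class SharedBudget:
    """Call budget shared by every agent working on one top-level task."""

    def __init__(self, max_calls: int, max_depth: int) -> None:
        self.max_calls = max_calls
        self.max_depth = max_depth
        self.calls = 0

    def charge(self, depth: int) -> bool:
        """Record one delegated call; return False once the task is over budget."""
        if depth > self.max_depth or self.calls >= self.max_calls:
            return False
        self.calls += 1
        return True


def delegate(budget: SharedBudget, depth: int = 0) -> int:
    """Toy circular delegation: each agent hands the task to the next peer.

    Returns the number of calls that were actually permitted.
    """
    if not budget.charge(depth):
        return 0  # the chain stops here instead of looping indefinitely
    return 1 + delegate(budget, depth + 1)
```

Because the counter travels with the task rather than with any single agent, a circular loop exhausts the shared budget even though each agent's own call count stays small.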

Threats that target cross-layer patterns

The following OWASP threat numbers have an explicit cross-layer classification in the MAESTRO mapping. They appear primarily in one or more architectural layers but acquire a qualitatively different threat profile when the MAS topology is present:

  • T1 Memory Poisoning: in a MAS with shared memory, a single poisoned write contaminates the context of every agent that subsequently reads from that store. Detection requires cross-agent correlation; a per-agent anomaly detector cannot see the full pattern.
  • T3 Privilege Compromise: confused-deputy chains across agents are a cross-layer pattern: Agent A holds permission P1; Agent B holds P2; neither alone can perform the sensitive action, but A delegating to B with its own credentials allows B to act with A’s authority in a context B’s policy should not permit.
  • T4 Resource Overload: recursive or circular delegation between agents can produce resource exhaustion that no per-agent quota prevents, because the aggregate amplification only becomes visible in infrastructure-level metrics.
  • T6 Intent Breaking and Goal Manipulation: in multi-turn, multi-agent task execution, goal drift can occur incrementally across agents: each hand-off slightly alters the stated objective until the final action is materially different from the original intent, with no single agent having deviated visibly.
  • T7 Misaligned and Deceptive Behaviors: a misaligned agent that behaves acceptably in isolation may, when placed in a multi-agent system, compound its misalignment through peer interactions. Shared learning or shared planning amplifies individual deviations.
  • T9 Identity Spoofing and Impersonation: in a large peer network, identity forgery is harder to detect because no single observer has visibility into all identity assertions. Cross-agent correlation of identity claims is required to surface inconsistencies.
  • T12 Agent Communication Poisoning: a poisoned message that one agent forwards to several peers produces a fan-out effect. Multi-agent architectures amplify the reach of a single injection.
  • T13 Rogue Agents in Multi-Agent Systems: a rogue agent embedded in the peer network can issue instructions to legitimate agents, receive delegated tasks, and accumulate information over time before its presence is detected. The MAS topology is a precondition for this threat pattern.
  • T14 Human Attacks on Multi-Agent Systems: a human who gains influence over one agent in the network can use that agent as a pivot to affect peer agents, exploiting the inter-agent trust the system extends.
  • T15 Human Manipulation: social engineering that targets multiple human operators across an organisation to achieve aggregate access or approvals that no single operator would provide.

Mitigations anchored here

Cross-layer threats require mitigations that operate across agent boundaries, not within a single agent:

  • multi-agent consensus: require multi-agent consensus for high-consequence decisions. A single compromised agent cannot unilaterally authorise an action if consensus from independent peers is required. The primary cross-layer control for cascading trust failures and T13.
  • behavioural anomaly isolation: when a cross-agent monitoring system detects anomalous behaviour in one peer, automatically quarantine it and revoke its peer trust. Containment must be cross-layer and automatic, because manual revocation is too slow for cascading scenarios.
  • identity behaviour monitoring: continuously monitor identity assertions across the peer network. Inconsistencies between an agent’s claimed identity and its attested workload identity, or multiple agents asserting the same identity, are cross-layer signals.
  • output egress DLP: enforce data-loss prevention at every egress point in the agent network. A single DLP check at one agent’s output does not prevent cross-agent data leakage cascades; each hop must enforce classification.
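
A minimal sketch of the multi-agent consensus control follows. It assumes votes are keyed by agent identifier and that the voting peers are genuinely independent, which the code cannot itself establish; `quorum_approves` is a hypothetical helper.

```python
def quorum_approves(votes: dict[str, bool], quorum: int) -> bool:
    """Approve only if at least `quorum` distinct peers voted in favour."""
    if quorum < 1:
        raise ValueError("quorum must be at least 1")
    # Keying votes by agent id means one agent cannot vote more than once.
    return sum(1 for approved in votes.values() if approved) >= quorum
```

With a quorum of two, a single compromised peer voting in favour is not enough to authorise the action on its own.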

How Cross-Layer relates to the seven architectural layers

Cross-layer is the lens that reveals what the per-layer view cannot show. For every threat that has both a layer-specific classification and a cross-layer classification (T1, T3, T4, T6, T7, T9, T12, T13, T14, T15), the layer-specific entry describes where the threat is initiated and what single-layer controls reduce it; the cross-layer entry describes what changes when the MAS topology means the blast radius extends beyond the originating layer and into the peer network.

A security architect reviewing a MAS deployment should treat the cross-layer catalog as a mandatory second pass after completing per-layer threat modeling. The questions it asks are distinct: not “what can go wrong at this component” but “what can go wrong because these components interact.” Those questions have different answers, requiring different controls.


Cross-layer threat modeling is the distinguishing feature of MAESTRO relative to single-agent security frameworks. It exists because multi-agent systems create emergent risk at their seams, and seam-risk cannot be mapped onto any single node or interface. MAESTRO’s recognition of this as a first-class category reflects the CSA’s judgment that MAS security requires system-level analysis, not component-level analysis alone.

Cross-layer threats: T1, T2, T3, T4, T5, T6, T7, T8, T9, T12, T13, T14, T15, T16