Partial in Helmwart

Resilience & failure · S&S separation of privilege · NIST AC-5

Separation of Duties

No single actor can complete a sensitive operation alone. Split it so two independent parties (or checks) are required.

[Figure: Planner (proposes, no credentials) · Executor (narrow task token) · Verifier (policy, isolated). A single injection can’t cross all three principals.]
No single agent decides, executes, and records. Split the duties across principals.

Why it matters for agentic AI

Separation of duties exists because unilateral authority over a complete sensitive operation is a control failure waiting to happen, whether through error, coercion, or fraud. It is the governance principle that the Lethal Trifecta architectural decomposition operationalises, and that Least Privilege scoping enforces at the credential level. Banks split the functions of authorising and executing a payment across different people precisely so that one compromised or mistaken employee cannot complete the operation alone. The principle is centuries old in accountancy and governance; NIST AC-5 codified it for information systems. Agentic AI breaks it in a way that is invisible unless you are looking for it: an agent that autonomously decides, parameterises, executes, and records an action has already violated separation of duties, even if no human is involved at all.

The agentic failure is structurally identical to a single employee who originates, approves, executes, and then falsifies the audit record of a fraudulent transaction. The employee isn’t more capable than two employees; they’re more dangerous because the check is gone. An agent that both proposes a payee and submits the payment, or both requests an escalation and self-approves it, has removed the structural check that separation of duties is designed to provide. The solution is not to slow the agent down with human approval for every action. The goal is to ensure that propose, approve, and execute are held by distinct principals with distinct, non-transferable credentials.

The agentic pattern that implements this is the planner/executor/verifier split. The planner holds no execution credentials and cannot call any tool that modifies state; it only produces a structured plan. The executor holds a narrow task token minted for this plan’s specific actions; it cannot deviate from the approved plan steps. The verifier, isolated from the execution path with no dependency on the other two, evaluates whether the plan conforms to policy before the token is minted. A single injection that compromises the planner produces a malicious plan that the verifier must independently approve; the injected planner cannot grant itself execution authority.
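The split described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the class names, the `allowed_tools` policy, and the in-memory `TaskToken` are all assumptions made for the example, and a real deployment would run each role under a separate identity and process.

```python
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Step:
    tool: str
    argument: str


@dataclass(frozen=True)
class TaskToken:
    token_id: str
    allowed_steps: frozenset  # exactly the steps the verifier approved


class Planner:
    """Holds no credentials: it can only describe steps, never run them."""

    def propose(self, *steps: Step) -> list:
        return list(steps)


class Verifier:
    """Independent principal: checks the plan against policy, then mints the token."""

    def __init__(self, allowed_tools: set):
        self._allowed_tools = allowed_tools  # hypothetical policy: a tool allow-list

    def approve(self, plan: list) -> Optional[TaskToken]:
        if plan and all(step.tool in self._allowed_tools for step in plan):
            return TaskToken(secrets.token_hex(8), frozenset(plan))
        return None  # no token is minted for a non-conforming plan


class Executor:
    """Runs only the steps covered by a token that the verifier minted."""

    def run(self, step: Step, token: Optional[TaskToken]) -> str:
        if token is None or step not in token.allowed_steps:
            raise PermissionError(f"step not covered by an approved token: {step}")
        return f"executed {step.tool}({step.argument})"
```

In this sketch an injected planner can still emit a malicious step, but the step only becomes executable if the verifier independently approves it, and a token minted for one plan does not cover steps from another.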

Scenario: the self-approving finance agent

A finance automation agent identifies a payee from an invoice, calculates the amount, constructs the payment instruction, and submits it to the payments API, all in a single task with no external check between identification and submission. A crafted invoice redirects the payee field to an attacker-controlled account. The agent’s confidence is high; its chain-of-thought is coherent. There is no moment at which a second principal evaluates the specific combination of (this payee, this amount, this account number) before funds move. Splitting “propose a payment” (producing a structured payment proposal with a read-only task token) from “submit a payment” (held by a distinct executor that requires a verifier’s signed approval token) breaks the chain at the point where the injection matters: the payee redirect cannot survive independent review by a party that never saw the poisoned invoice.
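One way to picture the verifier’s signed approval is as a signature over the exact (payee, amount, account) tuple, so that changing any field invalidates it. The sketch below uses HMAC from the Python standard library purely for brevity; the function names and the field layout are assumptions. Because HMAC is symmetric, the executor here shares the verifier’s key, so this only keeps the proposer from forging approvals; an asymmetric signature would be needed to stop the executor forging them too.

```python
import hashlib
import hmac
import json


def canonical(payment: dict) -> bytes:
    # Stable serialisation so the same payment always yields the same bytes.
    return json.dumps(payment, sort_keys=True, separators=(",", ":")).encode()


def sign_approval(key: bytes, payment: dict) -> str:
    # Verifier side: approve this exact payee, amount and account.
    return hmac.new(key, canonical(payment), hashlib.sha256).hexdigest()


def submit_payment(key: bytes, payment: dict, approval: str) -> str:
    # Executor side: refuse anything that does not match the approved tuple.
    expected = sign_approval(key, payment)
    if not hmac.compare_digest(expected, approval):
        raise PermissionError("approval does not match this payee, amount and account")
    return f"submitted {payment['amount']} to {payment['payee']}"
```

An approval issued for one account number cannot be reused after the payee or account field has been redirected, which is the point at which the crafted invoice would otherwise succeed.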

Scenario: the self-escalating sub-agent

An orchestrator delegates a research task to a sub-agent. Midway through, the sub-agent determines that it needs write access to a shared data store to complete the task and issues a tool call that requests elevated credentials from the credential broker. The broker is configured to grant upgrades if the requesting agent’s identity is valid, with no requirement for a second party to attest the need. The sub-agent that requested the escalation receives the elevated token and continues. The credential has been laundered through the delegation chain: the sub-agent asked for and received its own escalation. Separation of duties here means the escalation request must be evaluated by a different principal (the orchestrator or a human operator) before the elevated token is minted, and the requesting agent’s own identity is explicitly excluded from that evaluation.
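The broker-side rule can be sketched as follows. `CredentialBroker`, its `approvers` set, and the token string format are hypothetical names invented for this example; the only point being illustrated is that a valid requester identity is not sufficient, and that the requester is excluded from attesting its own request.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CredentialBroker:
    approvers: set  # identities allowed to attest, e.g. orchestrator or human operator
    issued: list = field(default_factory=list)

    def escalate(self, requester: str, scope: str, attested_by: Optional[str]) -> str:
        if attested_by is None:
            raise PermissionError("escalation requires a second-party attestation")
        if attested_by == requester:
            # The requesting identity is explicitly excluded from the evaluation.
            raise PermissionError("requester cannot attest its own escalation")
        if attested_by not in self.approvers:
            raise PermissionError(f"{attested_by} is not an authorised approver")
        token = f"{scope}:{requester}:attested-by:{attested_by}"
        self.issued.append(token)
        return token
```

Note that the self-attestation check runs before the approver check, so a sub-agent that happens to appear in the approver set still cannot countersign its own request.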

How it fails

  • A sub-agent both requests and self-approves an escalation; the requesting identity is the same as the approving identity, and there is no structural check.
  • An agent writes its own audit log; a compromised or misbehaving agent can suppress or alter the record of its own actions, defeating the audit function that separation of duties relies on for accountability.
  • Credentials are laundered transitively through the delegation chain; the planner’s intent token is forwarded unchanged to the executor, which uses it to act beyond the approved plan.
  • The verifier is implemented as part of the same codebase or identity as the executor; a compromise of the executor also compromises the verification step.
  • A single agent holds the trifecta (untrusted input, sensitive data, and irreversible authority) with no intermediate check, structurally identical to the one-person fraud scenario.

Why the mapped controls work

The four-role architecture (planner/retriever/executor/guardian) is the direct agentic implementation of separation of duties: each role holds credentials scoped only to its function, and no role can exercise the authority of another. A planner that is injected cannot execute, because it holds no execution credentials; an executor that is hijacked cannot modify the plan it was minted to execute, because the approved plan is cryptographically bound to its token. Dual-agent or agent-plus-human attestation with hardware-backed signing for high-stakes actions provides the independent second check at the boundary where the stakes justify it: the signature cannot be forged by the proposing principal, so approval is structurally distinguishable from proposal. A separate write-only audit identity closes the self-falsification path: because the executing agent holds no write access to the audit store, it cannot suppress or alter the record of its own actions; the audit trail is structurally beyond its reach. Intent-bound tokens that cannot cross purposes prevent credential laundering: a token minted for “retrieve document X” cannot be forwarded and used for “submit payment Y,” collapsing the transitivity that makes delegation chains dangerous.
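The intent-binding idea can be illustrated with a token whose holder and purpose are covered by a MAC, so that forwarding it to another principal or presenting it for another purpose fails the check. This is a sketch under assumptions: `TokenIssuer`, the dictionary token layout, and the intent strings are invented for the example, and a production system would more likely use a standard signed-token format.

```python
import hashlib
import hmac
import json


class TokenIssuer:
    """Mints tokens bound to one holder and one intent, and checks them on use."""

    def __init__(self, key: bytes):
        self._key = key

    def _mac(self, holder: str, intent: str) -> str:
        message = json.dumps([holder, intent]).encode()
        return hmac.new(self._key, message, hashlib.sha256).hexdigest()

    def mint(self, holder: str, intent: str) -> dict:
        return {"holder": holder, "intent": intent, "mac": self._mac(holder, intent)}

    def authorize(self, token: dict, presenter: str, requested_intent: str) -> bool:
        expected = self._mac(token["holder"], token["intent"])
        return (
            hmac.compare_digest(expected, token["mac"])  # fields were not altered
            and token["holder"] == presenter             # not forwarded to another principal
            and token["intent"] == requested_intent      # not reused for another purpose
        )
```

A token minted for retrieving a document authorises only that retrieval by that holder; editing the `intent` field breaks the MAC, and presenting the unmodified token for a payment fails the purpose check.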

First steps

  1. Identify every agent in your system that both proposes and executes the same sensitive action (payment submission, account modification, bulk data export). Each of these is a separation-of-duties violation; for each one, move the proposing step to a read-only planning agent and route the execution step through a separate executor that requires a verifier’s signed approval before acting.
  2. Create a dedicated write-only audit identity (a separate IAM role or service account) with no read or delete permissions on the audit store, and reconfigure all agent audit writes to use only this identity. The executing agent’s main credential must have no path to the audit store’s overwrite or delete API.
  3. Configure your credential broker to require a second-party attestation for any privilege escalation request. For example, in HashiCorp Vault, control groups (a Vault Enterprise feature) can require authorisation from one or more additional identities before a request is fulfilled, so that a sub-agent cannot obtain elevated credentials without a human or a separate guardian agent countersigning the request.
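The inventory in step 1 can start as a simple scan over whatever permission data you already hold. The sketch below assumes a hypothetical export in which each agent maps to a set of (duty, action) pairs; the duty labels "propose" and "execute" and the function name are illustrative, not taken from any particular platform.

```python
def find_sod_violations(grants: dict) -> dict:
    """Return, per agent, the sensitive actions it can both propose and execute."""
    violations = {}
    for agent, permissions in grants.items():
        proposed = {action for duty, action in permissions if duty == "propose"}
        executed = {action for duty, action in permissions if duty == "execute"}
        overlap = sorted(proposed & executed)
        if overlap:
            violations[agent] = overlap
    return violations


# Hypothetical inventory: agent name -> set of (duty, action) pairs.
grants = {
    "finance-agent": {("propose", "payment"), ("execute", "payment")},
    "planner": {("propose", "payment"), ("propose", "export")},
    "executor": {("execute", "payment")},
}
print(find_sod_violations(grants))  # {'finance-agent': ['payment']}
```

Each agent reported here is a candidate for the split described in step 1.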

Threats it governs

When this principle is absent, these threats become reachable.

Controls that advance it

Catalogue mitigations that strengthen this principle, grouped by the defence-in-depth stage they sit in.

Prevent
  • Dual control: An AI agent operating with broad authority can propose actions that are irreversible: deleting records, modifying IAM policies, moving funds. A single human reviewer at the approval gate is a single point of failure: one compromised account, one fatigued reviewer, or one successful social-engineering attempt is enough to commit the action. Human dual-control addresses that by requiring two distinct, independent humans to approve before the action commits.
  • Peer consensus: A single agent’s judgment on a high-impact action can be wrong, manipulated, or compromised. Requiring N of M independent peer agents to agree before the action executes means an attacker or a systematic error must affect a quorum of agents, not just one, before harm results.
  • Plan check: A plan-then-execute agent produces a sequence of steps before acting. If the planner is manipulated, it will emit steps that serve the attacker’s goal rather than the user’s. Plan-vs-goal validation addresses this by placing an independent validator between the planner and the execution loop: it evaluates each proposed step against the originally declared goal before the agent is permitted to act on it.
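The N-of-M rule in the peer-consensus control reduces to counting approvals from identities other than the proposer. This is a minimal sketch with assumed names (`quorum_approves`, a dictionary of votes keyed by agent identity); it does not address how the votes are authenticated or how independence between peers is established.

```python
def quorum_approves(votes: dict, proposer: str, required: int) -> bool:
    """True when at least `required` peers other than the proposer approved."""
    independent_approvals = {
        agent for agent, approved in votes.items()
        if approved and agent != proposer  # the proposer's own vote never counts
    }
    return len(independent_approvals) >= required
```

With `required=2`, the same function also expresses dual control: two distinct approvers, neither of them the proposer, must agree before the action commits.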
Detect
  • Split actor: An agent that writes its own audit log can omit, alter, or suppress any record of its own actions. This is not a theoretical risk: an attacker who controls the acting identity controls the evidence. Actor/recorder separation is the structural fix. The identity that performs an action and the identity that records it are different principals, with non-overlapping permissions, so no single compromise can both execute and erase.
Respond

No catalogued control.

In Helmwart

The lethal-trifecta detection plus the Q1 human-in-the-loop signal already encode the core: no one agent should hold untrusted input + sensitive data + irreversible authority unchecked.