
Governance & safety · EU AI Act · OECD P5

Contestability / Redress

Those affected by an agent’s decisions can challenge them and seek correction; operators can override and roll back.

Why it matters for agentic AI

Contestability is the security principle for the aftermath. Every other agentic security principle (zero trust, least privilege, transparency, accountability) is designed to prevent or detect a harmful event. Contestability accepts that harmful events will sometimes occur despite those controls and asks: when they do, can the effect be reversed, can the cause be identified, and can a human exercise override authority over the system? Those three capabilities (rollback, attribution, and override) are not soft governance aspirations. They are the operational requirements of incident response for agentic systems, and their absence means an organisation cannot recover from a compromise, cannot attribute a harmful action to its source, and cannot stop a runaway agent pipeline except by taking the entire system offline.

The reason contestability has particular security weight in agentic contexts is that the chain of delegation creates a layer of indirection between the human who bears responsibility and the agent that took the action. A compromised sub-agent, a runaway orchestrator, or a poisoned delegation chain can all produce harmful outcomes in which the nominal human authority was present (there was an authorisation chain) but the action was not what the human intended. In that scenario, contestability is the mechanism by which the human can say “that was not authorised” and have the system support that claim: the delegation record shows the scope was exceeded, the rollback procedure undoes the effect, and the emergency-stop terminates the chain before further harm occurs. Without these mechanisms, “the agent did it” is not a recoverable situation. It is a dead end.
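The claim "that was not authorised" only holds up if the delegation record can be compared mechanically with what the agent actually did. Below is a minimal sketch that assumes a hypothetical grant schema. `DelegationGrant`, `ActionRecord` and `scope_violations` are illustrative names, not a real library. The check returns every way an action exceeded its delegated scope, and that list is the evidence an investigator needs.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class DelegationGrant:
    """Hypothetical record of what a human principal delegated to one agent."""
    principal: str              # human who bears responsibility
    agent_id: str               # agent the grant was issued to
    allowed_actions: frozenset  # action names the agent may perform
    max_amount: float           # per-action monetary ceiling
    expires_at: datetime

@dataclass(frozen=True)
class ActionRecord:
    """Hypothetical log entry for one action an agent actually took."""
    agent_id: str
    action: str
    amount: float
    timestamp: datetime

def scope_violations(grant: DelegationGrant, record: ActionRecord) -> list:
    """Return every way the action exceeded the grant; an empty list means it was in scope."""
    reasons = []
    if record.agent_id != grant.agent_id:
        reasons.append(f"acted by {record.agent_id}, but grant was issued to {grant.agent_id}")
    if record.action not in grant.allowed_actions:
        reasons.append(f"action '{record.action}' was never delegated")
    if record.amount > grant.max_amount:
        reasons.append(f"amount {record.amount} exceeds ceiling {grant.max_amount}")
    if record.timestamp > grant.expires_at:
        reasons.append("action taken after the grant expired")
    return reasons

# Usage: the orchestrator was delegated small refunds but performed a large wire transfer.
grant = DelegationGrant("alice", "orchestrator-1", frozenset({"issue_refund"}), 100.0,
                        datetime(2030, 1, 1, tzinfo=timezone.utc))
taken = ActionRecord("orchestrator-1", "wire_transfer", 5000.0,
                     datetime(2025, 6, 1, tzinfo=timezone.utc))
for reason in scope_violations(grant, taken):
    print(reason)
```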

The connection to incident response is direct. Agentic incident response is categorically different from classical incident response because the malicious or erroneous actor may be a running process inside the organisation’s own infrastructure, acting under valid credentials, producing outputs that are individually within policy. Identifying a compromised agent, halting it, attributing its actions, and rolling back its effects requires playbooks specifically written for agentic compromise scenarios, not adaptations of playbooks written for intrusions by external attackers. Contestability as an architectural property is what makes those playbooks executable: it is the set of mechanisms the playbooks depend on.

Scenario: the uncontestable customer decision

An agent makes a loan eligibility determination. The customer is declined and disputes the decision. The operator investigates and finds that the reasoning trace was not retained, the context used at the time of the determination is no longer accessible, and there is no rollback mechanism for the declined status. The operator cannot explain the decision, cannot demonstrate that it was correct, and cannot reverse it through a defined process. Because the EU AI Act classifies creditworthiness assessment of natural persons as high-risk, this is a compliance failure. Under a contestability framework, the determination would have been logged with its reasoning trace and a hash of its context. The declined status would be reversible through a defined override procedure. The operator would also have an investigation path that could produce either a substantiated explanation or a correction.
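Below is a minimal sketch of the decision record that the contestability framework calls for. It assumes a hypothetical schema, and `record_decision`, `context_hash` and `verify_record` are illustrative names. The context is canonicalised as sorted-key JSON and hashed with SHA-256. Both the context and its hash are retained, so during a dispute the operator can confirm that the investigator is looking at what the agent actually saw. The hash proves integrity only if the record itself sits in tamper-resistant storage, such as the legal-hold storage listed under the controls.

```python
import hashlib
import json
from datetime import datetime, timezone

def context_hash(payload: dict) -> str:
    """SHA-256 over a canonical JSON encoding, so identical context always hashes identically."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def record_decision(applicant_id, inputs, retrieved_context, reasoning_trace, outcome):
    """Build a decision record (hypothetical schema) to be written to append-only storage."""
    context = {"inputs": inputs, "retrieved_context": retrieved_context}
    return {
        "applicant_id": applicant_id,
        "decided_at": datetime.now(timezone.utc).isoformat(),
        "outcome": outcome,
        "reasoning_trace": reasoning_trace,
        "context": context,                  # retained so it can be re-read during a dispute
        "context_sha256": context_hash(context),
    }

def verify_record(record) -> bool:
    """Confirm the retained context is byte-for-byte the context the decision was made on."""
    return context_hash(record["context"]) == record["context_sha256"]

# Usage with illustrative values.
record = record_decision(
    "A-1042",
    inputs={"income": 42000, "requested": 15000},
    retrieved_context=["policy: debt-to-income must not exceed 0.40"],
    reasoning_trace=["computed debt-to-income 0.47", "0.47 exceeds 0.40, decline"],
    outcome="declined",
)
print(verify_record(record))
```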

Scenario: the runaway delegation chain

An orchestrator agent encounters a malformed tool response and enters a loop, repeatedly calling an external API in a pattern its circuit breakers were not configured to catch. The operations team wants to stop it. However, no emergency-stop capability can terminate a running agent pipeline without stopping the entire service. The team's only option is to revoke the agent's API credentials, but the agent has cached a token and continues for several more minutes. The absence of an explicit override mechanism (a kill switch callable at the pipeline level) turns a recoverable incident into an extended unauthorised operation. An incident-response playbook with a named emergency-stop procedure, tested in a drill, could have ended the incident within a minute.
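The missing kill switch has one essential property: the runtime checks it before every outbound call. Because of that check, it still works when credential revocation does not (the cached-token problem). Below is a minimal in-process sketch with hypothetical names (`PipelineStop`, `run_agent_loop`). A real deployment would keep the flag in a store that every worker reads, outside the agent's own credentials and context.

```python
import threading

class PipelineHalted(Exception):
    """Raised inside the agent loop once an operator has triggered the stop."""

class PipelineStop:
    """Hypothetical pipeline-level emergency stop owned by the runtime, not the model."""
    def __init__(self):
        self._stopped = threading.Event()
        self.reason = None

    def trigger(self, reason):
        self.reason = reason
        self._stopped.set()

    def check(self):
        if self._stopped.is_set():
            raise PipelineHalted(self.reason)

def run_agent_loop(stop, call_external_api, max_steps=1000):
    """Consult the stop flag before every outbound call; return the number of calls made."""
    calls = 0
    for _ in range(max_steps):
        stop.check()          # enforced here, whether or not the agent holds a cached token
        call_external_api()
        calls += 1
    return calls

# Usage: the operator triggers the stop mid-loop (simulated during the third call).
stop = PipelineStop()
made = []

def flaky_api_call():
    made.append(1)
    if len(made) == 3:
        stop.trigger("runaway loop on malformed tool response")

try:
    run_agent_loop(stop, flaky_api_call)
except PipelineHalted as exc:
    print(f"halted after {len(made)} calls: {exc}")
```

The check sits in the loop rather than in the prompt, so the agent's reasoning never gets a chance to ignore it.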

How it fails

  • No override or emergency-stop capability exists at the pipeline level, so stopping a runaway agent requires disrupting the entire service.
  • Actions taken by the agent cannot be rolled back because no rollback procedures were designed for the affected resources.
  • There is no incident-response playbook for agent compromise, runaway behaviour, or delegation-chain hijacking, so the team improvises under pressure and misses steps.
  • Reasoning traces and context records are not retained, so the cause of a contested decision cannot be reconstructed after the fact.
  • Override authority is not explicitly assigned to a named role, so no individual has the clear standing to halt the pipeline during an incident.

Why the mapped controls work

Override and emergency-stop capability at the pipeline level is the difference between an incident that is contained and one that continues until infrastructure fails. It must be implemented at the infrastructure layer, not as a model-level instruction, so that the agent's reasoning cannot bypass it. Rollback procedures shrink the blast radius after the fact. Their value depends entirely on having been designed and tested before an incident, because the window for effective rollback is narrow and shortens with every passing second. Incident-response playbooks for agent compromise, runaway behaviour, and delegation-chain compromise are the operational documents that translate override and rollback capabilities into executable steps: they name the roles, the decision criteria, the sequence of actions, and the evidence to collect. Together, these controls make contestability real rather than aspirational. They give the organisation actual mechanisms for exercising the "challenge and correct" right that the governance frameworks require.
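Rollback designed in advance can be as simple as two rules. First, an action may not run until its compensating action has been written. Second, during incident response, completed actions are undone newest-first. This borrows the compensating-transaction (saga) pattern, and the code below is a sketch under that assumption. `RollbackLog` and `debit` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class ExecutedAction:
    description: str
    undo: Callable[[], None]   # compensating action, written before the action is permitted

class RollbackLog:
    """Records completed actions with their compensations and undoes them newest-first."""
    def __init__(self):
        self._entries: List[ExecutedAction] = []

    def record(self, description, undo):
        self._entries.append(ExecutedAction(description, undo))

    def rollback(self):
        """Run every compensation in reverse order; return descriptions of any that failed."""
        failed = []
        while self._entries:
            entry = self._entries.pop()
            try:
                entry.undo()
            except Exception as exc:   # keep going; failures go to humans for manual repair
                failed.append(f"{entry.description}: {exc}")
        return failed

# Usage: an agent makes two debits, which are rolled back during incident response.
balances = {"alice": 100}
log = RollbackLog()

def debit(account, amount):
    balances[account] -= amount
    log.record(f"debit {account} {amount}",
               lambda: balances.__setitem__(account, balances[account] + amount))

debit("alice", 30)
debit("alice", 20)
print("before rollback:", balances["alice"])
failures = log.rollback()
print("after rollback:", balances["alice"], "failures:", failures)
```

Reverse order matters when later actions depend on earlier ones. `rollback` returns failed compensations instead of raising them, so one failure does not stop the rest of the rollback.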

First steps

  1. Write and test a runbook for one specific incident type this week (“runaway agent making repeated external API calls”), naming the exact command to invoke the pipeline-level stop, the role authorised to do it, and the first three actions to take afterwards; schedule a 30-minute tabletop to verify the runbook works end-to-end.
  2. For every consequential action your agents take (payments, account changes, external messages), identify whether a rollback procedure exists, document it in a reversibility register, and for those with no rollback path, add a mandatory human-approval gate before the action runs.
  3. Enable structured reasoning-trace logging (e.g. via LangSmith, Langfuse, or OpenTelemetry traces) for every agent decision that touches regulated data, and verify that a logged trace contains enough context to reconstruct the decision independently of the model: the input, retrieved context, and declared intent hash.
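Step 2 can be enforced in code rather than left to convention. Below is a minimal sketch that assumes a hypothetical register keyed by action type. The register entries, `gate` and `ApprovalRequired` are illustrative. Reversible actions pass. Irreversible or unregistered actions are blocked until a named human approves them.

```python
from enum import Enum
from typing import Optional

class Reversibility(Enum):
    REVERSIBLE = "reversible"      # a documented rollback procedure exists
    IRREVERSIBLE = "irreversible"  # no rollback path: human approval needed first

# Hypothetical reversibility register: action type -> (reversibility, rollback procedure)
REGISTER = {
    "update_account_email": (Reversibility.REVERSIBLE, "restore previous email from audit record"),
    "send_external_email": (Reversibility.IRREVERSIBLE, None),
    "initiate_payment": (Reversibility.IRREVERSIBLE, None),
}

class ApprovalRequired(Exception):
    pass

def gate(action_type: str, approved_by: Optional[str] = None) -> str:
    """Allow reversible actions; block irreversible or unregistered ones without approval."""
    # Unregistered actions default to irreversible until someone documents a rollback path.
    reversibility, _procedure = REGISTER.get(action_type, (Reversibility.IRREVERSIBLE, None))
    if reversibility is Reversibility.IRREVERSIBLE and not approved_by:
        raise ApprovalRequired(f"{action_type}: no rollback path, human approval required")
    return "allowed"

print(gate("update_account_email"))
print(gate("initiate_payment", approved_by="treasury-approver"))
try:
    gate("delete_customer_record")
except ApprovalRequired as exc:
    print(exc)
```

Unregistered actions default to irreversible, so a new tool cannot bypass the gate just because nobody has classified it yet.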

Threats it governs

When this principle is absent, these threats become reachable.

Controls that advance it

Catalogue mitigations that strengthen this principle, grouped by the defence-in-depth stage they sit in.

Prevent
  • OOB verify: An agent that can propose payments, update banking details, or modify production configuration is, by construction, a manipulation surface. If the only thing standing between a proposed change and its execution is the agent's own UI, a successful prompt injection or RAG poisoning attack requires no additional steps. Out-of-band verification breaks that dependency by routing a one-use confirmation code through a channel that is structurally separate from the agent's primary interaction channel, so an attacker who controls the agent's context cannot complete the approval without also compromising the user's registered secondary device.
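The following sketch shows three properties that help out-of-band verification resist a compromised agent context. The code is random and single-use. It expires. It is bound to the exact change details, so a code approved for one beneficiary cannot confirm a different one. `OutOfBandVerifier` is hypothetical, and delivery to the secondary device is out of scope here. That device should display the change details, so the user approves what will actually run.

```python
import hashlib
import hmac
import secrets
import time

class OutOfBandVerifier:
    """Hypothetical issuer of one-use codes bound to a specific proposed change."""
    def __init__(self, ttl_seconds=300):
        self._key = secrets.token_bytes(32)   # server-side key; never exposed to the agent
        self._pending = {}                    # change_id -> (MAC of code + details, expiry)
        self.ttl = ttl_seconds

    def _mac(self, code, change_details):
        message = f"{code}|{change_details}".encode("utf-8")
        return hmac.new(self._key, message, hashlib.sha256).digest()

    def issue(self, change_id, change_details):
        """Create a 6-digit code; the caller sends it via the secondary channel only."""
        code = f"{secrets.randbelow(10**6):06d}"
        self._pending[change_id] = (self._mac(code, change_details), time.monotonic() + self.ttl)
        return code

    def confirm(self, change_id, change_details, code):
        """Single attempt: the pending entry is removed whether or not the code matches."""
        entry = self._pending.pop(change_id, None)
        if entry is None:
            return False
        expected, expiry = entry
        if time.monotonic() > expiry:
            return False
        return hmac.compare_digest(expected, self._mac(code, change_details))

# Usage with illustrative change details.
verifier = OutOfBandVerifier(ttl_seconds=300)
details = "pay 500.00 EUR to beneficiary ACME-001"
code = verifier.issue("chg-1", details)
print(verifier.confirm("chg-1", details, code))   # True
print(verifier.confirm("chg-1", details, code))   # False: the code was single-use
```

A wrong attempt burns the pending code in this sketch, which blocks guessing at the cost of forcing a re-issue. That trade-off is a design choice, not a requirement.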
Detect

No catalogued control.

Respond
  • Kill switch: Agentic systems can act faster than a human can intervene through normal channels. A kill switch is the operational guarantee that a named human role can stop agent activity at any scope (single instance, class, or global) through a documented runbook, without requiring a code change or redeployment, and with every invocation written to an audit trail.
  • Legal hold: An audit trail is only useful if its records cannot be altered after the fact. Without a storage-layer enforcement mechanism, a sufficiently privileged attacker (or a compromised recorder identity) can overwrite or delete the records that document what happened. Legal hold and WORM retention solve this by placing audit records in storage whose immutability the provider itself enforces. In the strictest retention modes (for example, Amazon S3 Object Lock in compliance mode), no user, including the account root user, can modify or delete a locked object within the retention window. Legal hold extends that protection indefinitely for active incidents and is lifted only through an out-of-band authority outside the normal operations team.
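Below is a minimal sketch of the kill switch's scope resolution and audit behaviour, assuming hypothetical names (`KillSwitch`, the scope strings, the role name). A halt at instance, class, or global scope is a data change rather than a code change. Only an authorised role can invoke it. Every attempt, including refused ones, is appended to the audit log. In a real deployment that log would be written to WORM or legal-hold storage.

```python
from datetime import datetime, timezone

class KillSwitch:
    """Hypothetical scoped kill switch: halts one instance, one agent class, or everything."""
    SCOPES = ("instance", "class", "global")

    def __init__(self, authorised_roles):
        self.authorised_roles = set(authorised_roles)
        self._halted = set()   # e.g. ("global", "*"), ("class", "payments"), ("instance", "pay-7")
        self.audit_log = []    # in production: ship each entry to WORM / legal-hold storage

    def halt(self, scope, target, actor, role, reason):
        if scope not in self.SCOPES:
            raise ValueError(f"unknown scope {scope!r}")
        allowed = role in self.authorised_roles
        # Audit first, so refused attempts are recorded too.
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "action": "halt", "scope": scope, "target": target,
            "actor": actor, "role": role, "reason": reason, "allowed": allowed,
        })
        if not allowed:
            raise PermissionError(f"role {role!r} may not invoke the kill switch")
        self._halted.add((scope, "*" if scope == "global" else target))

    def is_halted(self, instance_id, agent_class):
        """An instance is halted if it, its class, or the whole system has been stopped."""
        candidates = {("global", "*"), ("class", agent_class), ("instance", instance_id)}
        return bool(self._halted & candidates)

# Usage with illustrative names.
ks = KillSwitch(authorised_roles={"incident-commander"})
ks.halt("class", "payments", actor="dana", role="incident-commander", reason="runaway refunds")
print(ks.is_halted("pay-7", "payments"))   # True: its class is halted
print(ks.is_halted("mail-1", "email"))     # False
try:
    ks.halt("global", "*", actor="eve", role="developer", reason="test")
except PermissionError as exc:
    print(exc)
print(len(ks.audit_log))                   # 2: the refused attempt is audited too
```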

In Helmwart

Relates to the kill-switch and resilience principles; not scored directly.