← Atlas · Principles Reference in Helmwart

Governance & safety · OECD P3 · EU AI Act Art. 13/50

Transparency / Explainability

Meaningful information about the agent’s operation, capabilities, limits, and decisions is available to stakeholders.

Why it matters for agentic AI

In classical systems, telemetry answers the question “what happened?”: a log of method calls, SQL statements, HTTP requests. For agentic systems that is necessary but not sufficient. Transparency is the public-facing complement to Observability (which is the internal technical layer) and is a precondition for Accountability (which requires that affected parties can interrogate what an agent did and why). An agent does not merely execute instructions; it constructs a reasoning chain from its inputs and then selects actions. Two agents given identical instructions and different context windows will take different actions, and the difference lives entirely inside the reasoning trace, the part that traditional telemetry does not capture. Without that trace, forensic investigation after an incident is guesswork: an investigator can see that the agent sent an email or deleted a record, but cannot determine whether it did so because of a legitimate instruction, a misinterpretation, or an injected command that arrived inside a document three tool-calls earlier. The injection succeeds partly because the reasoning is opaque; there is nowhere for a defender to look.

The security demand here goes beyond compliance visibility. Decision-grade observability means capturing, alongside every tool call: the reasoning that produced it, a hash of the context window at the moment of decision, and the authority under which it was issued. The context hash is particularly important because it is the only way to prove or disprove that a specific injected payload was present in the agent’s context when a suspicious action was taken. Without it, an attacker who poisons the context and then removes the poison from memory can make the injection invisible to any post-hoc review. Transparency is therefore not merely about organisational openness; it is the technical precondition for forensic traceability and, by extension, for accountability.

Opacity also compounds the risk of long action chains. Each step in an agentic pipeline is individually plausible: search a database, compose a message, call an API. It is the sequence and the reasoning that reveals whether the chain is legitimate or has been steered by adversarial input. A security team watching only individual actions will miss the drift; only a team that can inspect the reasoning chain across steps can see the compound picture. This demands that reasoning traces are stored with the same integrity guarantees as security logs: tamper-evident, attributed, and retained long enough to support investigation.

Scenario: the invisible injection

An agent processes customer-submitted support tickets. One ticket contains an injected instruction buried in a markdown table. The agent acts on the injection, forwarding an internal attachment to an external address, and the reasoning trace is never captured. The security team can see the forwarding event in email logs, but cannot determine the cause: they do not know whether it was a genuine misrouting, a model error, or an adversarial injection. Without the context hash, they cannot prove the injected text was present. The investigation stalls, the attacker’s method remains unknown, and the same vector can be reused. Had the reasoning trace and context hash been captured at the moment of the forwarding decision, the injected payload would have been visible and the attack path recoverable.

Scenario: the undocumented capability gap

An agent is deployed to assist customers with account queries. Its operators know it can read account data but have never documented its ability to trigger outbound notifications. A customer asks an unusual question; the agent, reasoning opaquely, decides to send a notification to a third party. Neither the customer nor the operator was aware the agent had this capability; the agent’s limitations were never documented. The harm is minor in this case, but the undocumented capability means there is no gate, no monitoring, and no customer expectation that could have surfaced the gap before deployment. Transparent capability documentation, covering what the agent can do, what it refuses, and what it will do in edge cases, is the control that makes undocumented-capability incidents discoverable before they become consequential.

How it fails

  • Reasoning traces are not captured, so the causal chain from input to action is unrecoverable after an incident.
  • Only “what happened” telemetry exists; “why the agent decided to” is invisible.
  • Context is not hashed at the moment of consequential decisions, making injection-then-removal attacks forensically undetectable.
  • Agent capabilities and limitations are undocumented, so unexpected behaviours are invisible until a user or auditor stumbles on them.
  • Logs are not tamper-evident, so an attacker who can modify memory or storage can expunge the trace of their injection.

Why the mapped controls work

Capturing reasoning traces alongside tool calls converts the agent from a black box into a system whose decision logic can be audited step-by-step. The trace is the primary artefact that makes injection attacks visible: an auditor can walk the reasoning chain and identify the exact input that caused a deviation. Context hashes stored with each decision record make the “injection then removal” attack detectable: the hash proves what was in context at the moment the action was authorised, regardless of what later appears in memory. Real-time action checklists give operators a live view of what the agent is doing, enabling intervention before a long harmful chain completes. Documented agent limitations and failure modes are the transparency control for stakeholders who were never in the loop: customers, regulators, and affected third parties. They form the baseline from which unexpected-capability incidents can be recognised and reported.

First steps

  1. Instrument your agent’s reasoning traces today using OpenTelemetry with a custom span attribute for reasoning_summary and context_hash. Capture a SHA-256 hash of the full context window at the moment of every consequential tool call and store it alongside the trace in your append-only log, so that a post-incident review can prove exactly what was in context when a decision was made.
  2. Produce and publish a capability card for every agent you operate: a one-page document listing what the agent can do, what it will refuse, what data it can access, and the three most likely failure or edge-case behaviours. Make this available to any stakeholder (customer, auditor, internal reviewer) who asks, so that “we didn’t know it could do that” is not a credible response after an incident.
  3. Configure real-time action checklists for high-stakes agent workflows. Use your agent framework’s event hooks (LangChain callbacks, LangGraph streaming, or a custom middleware layer) to push a structured summary of each planned action to an operator dashboard before execution, so that a human reviewer can see the chain as it builds and intervene before it completes.

Threats it governs

When this principle is absent, these threats become reachable.

Controls that advance it

Catalogue mitigations that strengthen this principle, grouped by the defence-in-depth stage they sit in.

Prevent
  • Decision summaries When an agent decision reaches a human reviewer, the reviewer must reconstruct the agent's reasoning from raw traces before they can form a judgment. OWASP T10 names this reconstruction burden as the mechanism behind reviewer fatigue and oversight failures. A decision summary addresses the problem by inserting an independent model call between the agent's output and the reviewer: that call compresses the decision, evidence chain, and risk factors into a fixed-format card, reducing the per-review cognitive load without removing the human from the decision.
  • AI label When an AI agent generates content or proposes an action, users need to know that the source is an AI before they decide to act. Without that signal, users routinely over-trust agent output. AI-source disclosure addresses this by attaching a visible label to every AI-generated item and by requiring explicit confirmation for consequential actions, restoring the critical gap between receipt and acceptance.
Detect
  • Provenance tracking When an agent produces a claim derived from retrieved data, that claim needs a record of where it came from: the source document, version, and retrieval time. Without that record, a downstream verifier cannot distinguish a well-grounded output from a fabricated one, a tampered one, or a poisoned one. Provenance tracking attaches source attribution to every claim, carries it through each transformation in the pipeline, and surfaces it in audit logs and user-facing interfaces.
  • Split actor An agent that writes its own audit log can omit, alter, or suppress any record of its own actions. This is not a theoretical risk: an attacker who controls the acting identity controls the evidence. Actor/recorder separation is the structural fix. The identity that performs an action and the identity that records it are different principals, with non-overlapping permissions, so no single compromise can both execute and erase.
Respond

No catalogued control.

In Helmwart

Overlaps the observability principle; not scored directly.