Observability / Non-repudiation · Principles

Why it matters for agentic AI

Observability in a conventional service asks one question: can you understand internal state from external outputs? For agents that question is harder in three compounding ways. First, agents act at machine speed (thousands of tool calls per hour), so a human reviewing logs after the fact is always behind. Second, the most consequential part of an agent’s behaviour is its reasoning, which is probabilistic and not directly observable in the way a function call is; you can log that a tool was called, but not the full chain of inference that led there. Third, an agent with tool access can, if not architecturally constrained, delete or alter its own audit trail. The agent that acted is also the agent that could erase the evidence. Observability is also the detection layer that Resilience and Recovery depends on: without it, proactive behavioural baselining is impossible. And it is what gives Transparency its forensic teeth.

Non-repudiation, the stronger property, requires that every action be attributable to a specific principal with a cryptographic proof that cannot be forged or denied. For agents this means the audit log must be generated by a gateway the agent cannot reach for writes, stored in an append-only, hash-chained structure that makes any deletion or alteration detectable, and signed with the agent’s own key so the entry is bound to the identity that took the action. OpenTelemetry provides excellent instrumentation for performance and error tracing; it was not designed for tamper-evidence, and using it as an audit layer without augmentation leaves exactly the gap an attacker would target.
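The hash-chained, signed structure described above can be sketched in a few lines. This is a minimal illustration, not a production design: the function names `append_entry` and `verify_chain` and the entry fields are assumptions made for the example, and HMAC is used only as a standard-library stand-in for a signature. HMAC is symmetric, so it does not by itself give non-repudiation; a real deployment would sign with an asymmetric key pair held by the agent identity.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    # Canonical JSON (sorted keys) so the same entry always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_entry(chain: list, agent_id: str, action: str, key: bytes) -> dict:
    """Append one entry that commits to the hash of the previous entry."""
    prev_hash = chain[-1]["entry_hash"] if chain else GENESIS
    body = {"seq": len(chain), "agent_id": agent_id, "action": action, "prev_hash": prev_hash}
    entry_hash = _digest(body)
    # HMAC is a stdlib stand-in; a real system would use an asymmetric signature.
    signature = hmac.new(key, entry_hash.encode(), hashlib.sha256).hexdigest()
    entry = {**body, "entry_hash": entry_hash, "signature": signature}
    chain.append(entry)
    return entry


def verify_chain(chain: list, key: bytes) -> bool:
    """Return False if any entry was altered, reordered, or removed from the middle."""
    prev_hash = GENESIS
    for seq, entry in enumerate(chain):
        body = {k: entry[k] for k in ("seq", "agent_id", "action", "prev_hash")}
        if entry["seq"] != seq or entry["prev_hash"] != prev_hash:
            return False
        if _digest(body) != entry["entry_hash"]:
            return False
        expected = hmac.new(key, entry["entry_hash"].encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, entry["signature"]):
            return False
        prev_hash = entry["entry_hash"]
    return True
```

Note that a chain on its own cannot reveal truncation of the most recent entries; detecting that requires anchoring the latest hash somewhere the writer cannot reach, which is one reason the storage layer matters as much as the chain.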

The completeness property is as important as the integrity property. A log that captures most tool calls can still be blinded: an attacker floods it with low-value events, and the noise hides a low-and-slow attack beneath the alert threshold. Likewise, an agent operating autonomously over hours may accumulate a series of individually innocuous actions whose combination is harmful, and that combination is visible only in the full sequential record. This is why reasoning traces and context hashes must be captured alongside tool-call logs: the “why” is as important as the “what” for forensic investigation, and a log that records only which actions were taken cannot reconstruct how an injection led to them.

Scenario: the agent that erased its trail

An agent is manipulated by a prompt injection into performing a harmful action. Having taken the action, it then calls a logging API to overwrite or delete the relevant log entries. This is straightforwardly achievable if the agent has write access to the same logging service whose entries it needs to suppress. A write-only audit identity (a separate credential used only to append to the audit store, which the agent’s main identity never holds) means the agent literally cannot call the delete API. WORM storage as the backend means even an operator with full credentials cannot retroactively alter the record.
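As one illustration of a write-only audit identity, the policy below is the kind of document that might be attached to the audit credential, assuming the audit store is an S3 bucket with versioning and Object Lock default retention enabled. The bucket name and the `Sid` labels are hypothetical; the action names are standard S3 IAM actions. The agent’s main identity would receive no statement for this bucket at all.

```python
import json

# Hypothetical bucket name, used only for illustration.
AUDIT_BUCKET_ARN = "arn:aws:s3:::example-audit-log-bucket"

# Policy for the audit identity only: it may add objects, never remove them.
audit_writer_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AppendOnly",
            "Effect": "Allow",
            "Action": ["s3:PutObject"],
            "Resource": f"{AUDIT_BUCKET_ARN}/*",
        },
        {
            "Sid": "NeverDelete",
            "Effect": "Deny",
            "Action": [
                "s3:DeleteObject",
                "s3:DeleteObjectVersion",
                "s3:BypassGovernanceRetention",
            ],
            "Resource": f"{AUDIT_BUCKET_ARN}/*",
        },
    ],
}


def allowed_actions(policy: dict) -> set:
    """Collect every action granted by an Allow statement."""
    return {
        action
        for statement in policy["Statement"]
        if statement["Effect"] == "Allow"
        for action in statement["Action"]
    }


print(json.dumps(audit_writer_policy, indent=2))
```

Because `s3:PutObject` can also write to an existing key, this sketch relies on bucket versioning: an overwrite creates a new version and leaves the earlier, locked version in place.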

Scenario: the low-and-slow attacker hiding in noise

An agent processes high volumes of routine requests. An attacker who has seeded a poisoned memory entry adjusts the agent’s behaviour incrementally across hundreds of sessions, each individual session producing actions that are within normal range and generate no alert. Only the cumulative sequence, spanning days, constitutes the attack. A full sequential log with context hashes for every session, combined with a distributed oversight mechanism that looks for cross-session patterns rather than per-event thresholds, is the only way to detect the attack before it completes. Per-event alerting against low thresholds will not surface it; the signal lives in the sequence.
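One common way to look at the sequence rather than the individual event is a cumulative-sum (CUSUM) detector. The sketch below is a stand-in technique chosen for illustration, not something the scenario prescribes; the metric (a per-session count of some behaviour), the baseline, and the `slack` and `limit` parameters are all assumed values.

```python
def cusum_alerts(values, baseline, slack, limit):
    """One-sided CUSUM: return indices where accumulated upward drift exceeds `limit`."""
    alerts = []
    drift = 0.0
    for index, value in enumerate(values):
        # Accumulate only the excess over baseline + slack; never go below zero.
        drift = max(0.0, drift + (value - baseline - slack))
        if drift > limit:
            alerts.append(index)
            drift = 0.0  # restart accumulation after raising an alert
    return alerts


# Five normal sessions, then ten sessions nudged slightly upward.
sessions = [10, 10, 11, 9, 10] + [12] * 10
PER_EVENT_THRESHOLD = 15  # assumed per-session alert threshold

print(all(v < PER_EVENT_THRESHOLD for v in sessions))  # no single session alerts
print(cusum_alerts(sessions, baseline=10, slack=1, limit=5))
```

In this toy data every session stays under the per-event threshold, yet the accumulated drift crosses the limit partway through the shifted run, which is the property the scenario depends on.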

How it fails

The agent self-reports its logs through a path it also has write access to; a compromised agent deletes the evidence of its compromise.
An attacker floods the log with high-volume low-severity events, burying the signal that matters beneath a noise floor that operators have learned to ignore.
A low-and-slow attack distributes its footprint below every per-event detection threshold; only cross-session sequence analysis would catch it, and that is not implemented.
Reasoning traces are never captured; the log shows a harmful tool call occurred but not the injection that caused it, making attribution impossible.
Retention periods are set by convenience rather than regulatory or forensic need; logs covering the incident window were already deleted.

Why the mapped controls work

Gateway-level tool-call interception means no tool call can occur without producing a log entry. The agent cannot bypass the logging layer because the logging layer sits between the agent and every tool. Hash-chained append-only WORM logs ensure that any deletion or modification of a past entry produces a detectable discontinuity in the chain; the record’s integrity is provable without trusting the storage operator. Agent-key signatures on entries bind each record irrevocably to the identity that produced it, satisfying the non-repudiation requirement. Full reasoning and context-hash capture provides the “why” that pure tool-call telemetry misses, and makes injection reconstruction possible after the fact. Distributed and multi-party oversight removes the single point of blindness: an attacker who suppresses alerts at one layer cannot simultaneously suppress them at all independent monitors.
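The log-before-dispatch ordering is the part of gateway interception that is easiest to get wrong, so a small sketch may help. The class name `AuditGateway`, the entry fields, and the `audit_sink` callable (standing in for a client of the append-only store) are assumptions for illustration and do not describe any particular gateway product.

```python
import hashlib
import time
from typing import Any, Callable, Dict


class AuditGateway:
    """Sits between the agent and its tools; logs every call before dispatching it."""

    def __init__(self, tools: Dict[str, Callable[..., Any]], audit_sink: Callable[[dict], None]):
        self._tools = tools
        self._audit_sink = audit_sink  # stand-in for an append-only store client

    def call(self, agent_id: str, tool: str, params: dict, context: str) -> Any:
        entry = {
            "ts": time.time(),
            "agent_id": agent_id,
            "tool": tool,
            "params": params,
            # Hash of the context window, so the "why" can be matched later.
            "context_sha256": hashlib.sha256(context.encode()).hexdigest(),
        }
        # Log first: if the sink raises, the tool is never invoked.
        self._audit_sink(entry)
        if tool not in self._tools:
            raise KeyError(f"unknown tool: {tool}")
        return self._tools[tool](**params)
```

The design choice here is to fail closed: a tool call that cannot be recorded does not happen. Whether that trade-off is acceptable depends on how the deployment weighs availability against audit completeness.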

First steps

Route all tool calls through a gateway (LiteLLM, Portkey, or a custom proxy). Before it dispatches a tool call, the gateway appends a log entry (agent identity, tool name, parameters, and a SHA-256 hash of the current context window) to an append-only store, such as an S3 bucket with Object Lock enabled. Object Lock is an S3 feature, so logs collected in AWS CloudWatch Logs would need to be exported to such a bucket to become immutable.
Create a separate write-only audit identity (a dedicated IAM role or service account) that your main agent identity has no ability to assume. The audit store only accepts writes from this identity, so the agent’s main credential cannot call the delete or overwrite API path.
Set a retention period for your audit logs that matches your regulatory or forensic need (90 days is a common minimum; financial services often require seven years). Configure an automated alert that fires when log volume deviates more than 20% from the rolling hourly baseline in either direction. A sudden drop is the signal for suppression or deletion; a sudden spike is the signal for log flooding.
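The volume alert in the last step can be sketched as a rolling-baseline check. The 20% tolerance comes from the step above; the class name `VolumeMonitor`, the 24-hour window, and the choice to check for spikes as well as drops are assumptions of this example.

```python
from collections import deque
from statistics import mean


class VolumeMonitor:
    """Compare each hour's log count against a rolling mean of recent hours."""

    def __init__(self, window: int = 24, tolerance: float = 0.20):
        self._history = deque(maxlen=window)
        self._tolerance = tolerance

    def observe(self, count: int):
        """Return 'drop', 'spike', or None for this hour's count."""
        verdict = None
        # Only judge once a full window of history is available.
        if len(self._history) == self._history.maxlen:
            baseline = mean(self._history)
            if count < baseline * (1 - self._tolerance):
                verdict = "drop"
            elif count > baseline * (1 + self._tolerance):
                verdict = "spike"
        if verdict is None:
            # Anomalous hours are kept out so they do not shift the baseline.
            self._history.append(count)
        return verdict
```

A caller would feed it one count per hour and page an operator on any non-`None` result. Excluding anomalous hours from the baseline is a deliberate choice here so that a sustained suppression does not quietly become the new normal.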

Threats it governs

When this principle is absent, these threats become reachable.

T8 · Repudiation and Untraceability: Agent actions cannot be reliably traced, attributed, or reconstructed.
T23 · Selective Log Manipulation: Attacker selectively deletes log entries for fraudulent actions while leaving the rest intact.
T44 · Insufficient Logging in MCP Server / Client: MCP request and tool-invocation logs are incomplete; forensic reconstruction not possible.

Controls that advance it

Catalogue mitigations that strengthen this principle, grouped by the defence-in-depth stage they sit in.

Prevent

Decision summaries: When an agent decision reaches a human reviewer, the reviewer must reconstruct the agent's reasoning from raw traces before they can form a judgment. OWASP T10 names this reconstruction burden as the mechanism behind reviewer fatigue and oversight failures. A decision summary addresses the problem by inserting an independent model call between the agent's output and the reviewer: that call compresses the decision, evidence chain, and risk factors into a fixed-format card, reducing the per-review cognitive load without removing the human from the decision.

Detect

Split actor: An agent that writes its own audit log can omit, alter, or suppress any record of its own actions. This is not a theoretical risk: an attacker who controls the acting identity controls the evidence. Actor/recorder separation is the structural fix. The identity that performs an action and the identity that records it are different principals, with non-overlapping permissions, so no single compromise can both execute and erase.
Identity monitoring: An AI agent operates under a non-human identity (NHI): a service principal, a task role, or a workload credential. That identity produces a stream of access events that, for a well-scoped agent, forms a narrow and predictable behavioural baseline. Identity monitoring applies User and Entity Behaviour Analytics (UEBA) to that stream, alerting when an observed access pattern deviates statistically from the baseline. Because agent behavioural distributions are tighter than those of human users, a deviation is a higher-confidence signal, and a spoofed or stolen credential used from the wrong workload origin is exactly the anomaly the technique is built to detect.
Cross-system audit: An agent that operates across HR, Finance, cloud, and SaaS systems accumulates permissions at each boundary, often without any single team seeing the combined picture. Privilege accumulates silently across those boundaries until a quarterly review finds it, by which point a compromised or misconfigured agent has had weeks of unchecked reach. Cross-system scope auditing prevents that by continuously reconciling the agent's actual entitlements against a declared baseline across every system it touches and raising a ticket the moment drift is detected.
Provenance tracking: When an agent produces a claim derived from retrieved data, that claim needs a record of where it came from: the source document, version, and retrieval time. Without that record, a downstream verifier cannot distinguish a well-grounded output from a fabricated one, a tampered one, or a poisoned one. Provenance tracking attaches source attribution to every claim, carries it through each transformation in the pipeline, and surfaces it in audit logs and user-facing interfaces.
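The reconciliation step in the cross-system audit entry above reduces, at its core, to a per-system set difference. The sketch below assumes entitlements have already been collected into plain dictionaries keyed by system name; the function name `entitlement_drift` and the sample permission strings are hypothetical.

```python
def entitlement_drift(declared: dict, actual: dict) -> dict:
    """Return, per system, the entitlements an agent holds but never declared."""
    drift = {}
    for system, held in actual.items():
        extra = set(held) - set(declared.get(system, ()))
        if extra:
            drift[system] = sorted(extra)
    return drift


# Hypothetical declared baseline and observed state for one agent.
declared = {"hr": ["read_profile"], "finance": ["read_invoice"]}
actual = {
    "hr": ["read_profile"],
    "finance": ["read_invoice", "approve_payment"],
    "cloud": ["list_buckets"],
}

for system, extras in entitlement_drift(declared, actual).items():
    print(f"drift in {system}: {extras}")  # a real pipeline would raise a ticket here
```

A system that appears in the observed state but not in the baseline at all, like `cloud` in this sample, is reported in full, since every entitlement there is undeclared.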

Respond

Legal hold: An audit trail is only useful if its records cannot be altered after the fact. Without a storage-layer enforcement mechanism, a sufficiently privileged attacker (or a compromised recorder identity) can overwrite or delete the records that document what happened. Legal hold and WORM retention solve this by placing audit records in storage that the provider itself enforces as immutable: no user, including account root, can modify or delete a locked object within the retention window. Legal hold extends that protection indefinitely for active incidents, lifted only through an out-of-band authority outside the normal operations team.

In Helmwart

Monitoring is one of the control families counted in the Defence-in-Depth audit; tamper-evidence isn’t modelled.