What changes in an agent loop
In a chatbot, sensitive data leaks through one channel (the reply), which a human may review before it is sent. In an agent, the output surface is the union of every external action the model takes: tool-call parameters, vector-store queries, memory writes, peer-agent messages, and log payloads. Most of those channels are never inspected by a human. The risk compounds when the agent holds over-broad credentials: a confused deputy with a leaked service-account key can reach data stores the operator never intended it to touch, writing exfiltrated content to a low-visibility side channel. Detecting leakage therefore requires monitoring outbound tool calls and memory writes, not just model responses. The canonical control is least-privilege credential scoping combined with egress inspection on every tool integration.
For the full definition, prevention checklist, and detection guidance, read OWASP's Sensitive Information Disclosure page →. This page only adds the agentic angle and the bridge into Helmwart.
Mitigations
An agent can invoke any tool it has access to, constrained only by its own reasoning. If that reasoning is manipulated or the agent's permissions are misconfigured, it will call tools it should not. OPA addresses this by placing a policy decision point between the agent and every tool invocation: a Rego policy evaluates the agent identity, the tool, and the parameter envelope before execution proceeds, and the agent cannot reason or argue past the result.
In most deployments, agents authenticate to one another with long-lived bearer tokens or shared secrets. If any one of those credentials is stolen, the attacker has persistent, platform-wide access until someone manually rotates it. SPIFFE replaces that model: each workload is issued a short-lived, cryptographically verifiable identity document, and every connection requires both sides to present one. No long-lived secrets traverse the network, and a compromised credential is worthless within its TTL.
An agent identity that holds broad write authority is a high-value target: compromising its credential gives an attacker persistent, authenticated access to every system that identity can reach. Multi-factor authentication addresses this by requiring a second factor at credential issuance time, so a stolen token is bounded to its issued lifetime and cannot be silently renewed. For non-human identities the second factor is workload attestation, hardware-bound key material, or certificate-backed proof rather than a phone or one-time code.
An agent that operates across HR, Finance, cloud, and SaaS systems accumulates permissions at each boundary, often without any single team seeing the combined picture. Privilege accumulates silently across those boundaries until a quarterly review finds it, by which point a compromised or misconfigured agent has had weeks of unchecked reach. Cross-system scope auditing prevents that by continuously reconciling the agent's actual entitlements against a declared baseline across every system it touches and raising a ticket the moment drift is detected.
Every dataset, document, and external system an agent can reach carries a classification label. The agent's permitted-class set and the tool's permitted-class set are intersected at the moment of every read or write. When the requested data's class falls outside that intersection, access is denied at the seam. This is the data-side complement to least-privilege: it adds a data-sensitivity constraint that role scoping alone does not provide.
An AI agent operating with broad authority can propose actions that are irreversible: deleting records, modifying IAM policies, moving funds. A single human reviewer at the approval gate is a single point of failure, one compromised account, one fatigued reviewer, or one successful social-engineering attempt is enough to commit the action. Human dual-control addresses that by requiring two distinct, independent humans to approve before the action commits.
An agent produces output continuously across multiple channels: user-facing responses, tool-call parameter envelopes, log records, and outbound HTTP requests. Any of those channels can carry sensitive content the agent has retrieved, been fed, or been tricked into including. Output egress DLP places an inspection gate at the boundary so that PII, credentials, and proprietary content are classified and either redacted or quarantined before they leave the trust boundary, regardless of how they got into the output.
An agent running with a permanent high-privilege identity gives an attacker, or a misconfigured agent, broad access for as long as that identity persists. Time-bounded privilege elevation addresses this by issuing a short-lived credential tied to a specific action window: the agent holds elevated access only for the duration it needs, and the issuing platform revokes that access automatically when the TTL expires. This is the just-in-time (JIT) access pattern from PAM practice, applied to non-human identities.
An agent that holds a persistent catalog of invokable tools can reach any of them at any point in its session. If its reasoning is manipulated or its identity is compromised, that persistent surface is fully available to an attacker. Just-in-time tool grants remove the standing surface: a policy broker issues a time-bound, task-scoped grant immediately before the tool is needed and revokes it automatically when the task completes or the window expires.
A Non-Human Identity (NHI) is the service account, machine principal, or formal agent identity under which an agentic system authenticates and acts. When an NHI is provisioned with broad scope, never rotated, and has no named owner, a stolen or leaked credential gives an attacker persistent access for as long as that credential remains valid. NHI lifecycle management treats each agent identity as a first-class governance object: provision narrowly with a declared scope and owner, rotate on a short schedule using platform-native short-lived credentials, audit every authentication and rotation event, re-attest that the identity is still needed, and decommission by deletion when the agent is retired.
An agent's authority is normally bounded only by its own reasoning. If that reasoning is manipulated, or the agent's identity is compromised, it will attempt actions the operator never intended to permit. Policy-bound autonomy addresses this by placing a declarative enforcement point between the agent and every consequential action: a policy engine evaluates the agent identity, the target tool, and the parameter envelope before execution, and the agent cannot reason or argue past the result.
An LLM produces tool-call arguments through generation, not through a type system, and generation is not reliable. The arguments may be wrong in type, out of range, or assembled in a combination that violates business rules. A pre-execution validation gate intercepts the call before it reaches the tool: a schema pass confirms each argument conforms to the declared JSON Schema, and a policy pass confirms the argument combination is permitted for this agent and this action. The tool executes only when both passes clear.
Role-Based Access Control (RBAC) assigns every agent identity a named role that sets the outer limit on what it can reach. Attribute-Based Access Control (ABAC) narrows individual decisions inside that role by evaluating contextual attributes at request time. Used together, they enforce least privilege for non-human identities: the agent can only do what its role permits, and only when the request attributes satisfy the policy.
An agent produces code, configuration files, tool-call payloads, and log records continuously and at a rate no human reviewer can match. Any of those artefacts may contain a live API key, service token, or private certificate, placed there accidentally through model context, or deliberately through prompt injection or context poisoning. Secret scanning places an inspection gate at every agent output seam: regex patterns match known token formats, entropy analysis detects arbitrary high-entropy strings, and validator calls confirm which candidates are live credentials. The CI-secret-scanning pattern is mature; the agentic specialisation is seam placement, moving the scanner from the repository gate to the agent egress point, where artefacts can be intercepted before they reach any downstream system.
An agent identity backed by a long-lived bearer token grants access for as long as that token remains valid. If the token is stolen, logged, or extracted from a running process, the attacker holds working credentials for weeks or months without any further action. Short-lived tokens address this by issuing credentials with a time-to-live measured in minutes or hours, automated and renewed by the platform rather than a human. When a token expires, access ends: the attacker must win the renewal process as well, which requires compromising a harder target than the token itself.
Each tool in an agent's catalog should expose only the methods, resources, and parameter ranges its designated role requires. Over-broad tool surfaces let individually authorised primitives compose into actions no human intended to grant; narrowing the scope at design time reduces both the attack surface and the blast radius of any compromise.
An agent acts on behalf of the user, but nothing in a standard OAuth bearer token records what the user actually approved. If the agent's planning is manipulated, it can invoke tools with parameters the user never sanctioned, while presenting credentials that look valid. Intent attestation fixes this by issuing a short-lived signed token that encodes the exact action and parameter envelope the user authorised, and requiring the resource server to verify that envelope before executing the call.