PLAYBOOK · P5 · OWASP Agentic AI v1.1
Protecting HITL & Preventing Decision Fatigue Exploits
Keep human oversight effective when agent fan-out threatens to swamp it.
Goal: Prevent attackers from overloading human decision-makers, manipulating AI intent, or bypassing security through deceptive AI behaviours.
At a glance
Defence-in-depth chain
When an attacker tries to overwhelm reviewers or exploit decision fatigue, proactive controls (a risk-prioritised review queue and adaptive workload balancing) preserve reviewer attention by routing low-risk decisions away from humans and throttling notification volume. If a manipulative or high-volume request still reaches a reviewer, reactive controls (reviewer decision summaries and plan-vs-goal validation) make the decision legible and check goal consistency before approval. Detective controls (Sigstore signing) produce cryptographic audit trails of reviewer overrides for post-incident review.
proactive Step 1: Optimize HITL workflows & reduce decision fatigue
-
Score each pending agent action by risk and use that score to rank the HITL review queue so reviewers tackle the highest-impact decisions first.
-
Automate routine low-risk approvals and escalate only high-impact decisions to a human reviewer.
-
Cap the volume of AI-generated notifications per reviewer per hour to prevent cognitive overload.
Helmwart controls: Adaptive load
-
Enforce rate limits on agent-generated notifications, approval requests, and alerts so that decision fatigue cannot be used as an attack vector.
-
Require two-agent consensus before any agent is permitted to modify its own operational goals.
-
Surface a plain-language explanation of each pending decision alongside the agent's recommendation so reviewers can act without re-reading raw logs.
Helmwart controls: Decision summaries
-
Distribute the review queue across available human reviewers so no single operator is presented with an unmanageable volume.
Helmwart controls: Adaptive load
-
Strip or sandbox all clickable links and rich content in agent output so social-engineering payloads cannot lead a reviewer to click through to an attacker-controlled destination.
Helmwart controls: Render restriction
-
Provide a documented, tested kill-switch covering single-agent, fleet, and global scopes with a named authority and drill cadence.
Helmwart controls: Kill switch
-
For irreversible high-stakes changes such as payments or production configuration writes, require confirmation through a channel independent of the agent.
Helmwart controls: OOB verify
-
For irreversible high-impact actions, require two distinct human reviewers to independently approve and sign off before the agent proceeds.
Helmwart controls: Dual control
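The queueing items in Step 1 (risk ranking, auto-approval of routine actions, and a per-reviewer notification cap) can be sketched together in Python. This is a minimal illustration under stated assumptions, not a Helmwart or OWASP implementation: the names `PendingAction` and `ReviewQueue`, the 0.2 auto-approve threshold, and the cap of three notifications per hour are all hypothetical, and the risk score is assumed to come from an upstream scorer.

```python
import heapq
import itertools
from collections import defaultdict, deque
from dataclasses import dataclass

AUTO_APPROVE_BELOW = 0.2        # hypothetical threshold for routine, low-risk actions
MAX_NOTIFICATIONS_PER_HOUR = 3  # hypothetical per-reviewer cap
WINDOW_SECONDS = 3600


@dataclass(frozen=True)
class PendingAction:
    action_id: str
    risk: float  # assumed scale: 0.0 (benign) to 1.0 (irreversible, high impact)


class ReviewQueue:
    def __init__(self):
        self._heap = []                      # entries: (-risk, tiebreak, action)
        self._tiebreak = itertools.count()   # keeps heap ordering stable on equal risk
        self._sent = defaultdict(deque)      # reviewer -> timestamps of recent notifications

    def submit(self, action):
        """Auto-approve low-risk actions; queue everything else by descending risk."""
        if action.risk < AUTO_APPROVE_BELOW:
            return "auto-approved"
        heapq.heappush(self._heap, (-action.risk, next(self._tiebreak), action))
        return "queued"

    def next_for(self, reviewer, now):
        """Return the riskiest pending action, or None if the reviewer is at
        the hourly cap or the queue is empty. `now` is a timestamp in seconds."""
        window = self._sent[reviewer]
        while window and now - window[0] >= WINDOW_SECONDS:
            window.popleft()                 # forget notifications older than one hour
        if len(window) >= MAX_NOTIFICATIONS_PER_HOUR or not self._heap:
            return None
        window.append(now)
        return heapq.heappop(self._heap)[2]
```

Because the cap is tracked per reviewer, an item withheld from one saturated reviewer stays in the queue and can be handed to another, which is one simple way to combine the cap with workload distribution.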
reactive Step 2: Identify AI-induced human manipulation
-
Validate each pending agent plan against its declared goal before approving execution to detect and block unintended behavioural shifts.
-
Monitor how frequently each agent requests goal changes and alert when the rate suggests active manipulation rather than legitimate adaptation.
Helmwart controls: Divergence monitor
-
Scan agent outputs and tool-call parameters for PII, secrets, and sensitive IP before egress so phishing payloads cannot reach the end user.
Helmwart controls: Egress DLP
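The goal-change monitoring item in Step 2 can be approximated with a sliding-window counter per agent. The sketch below is illustrative only: `GoalChangeMonitor`, its default limit of three changes per ten minutes, and the choice of a fixed threshold (rather than a learned baseline) are assumptions, not part of the source playbook.

```python
from collections import defaultdict, deque


class GoalChangeMonitor:
    """Hypothetical sliding-window monitor for goal-change requests."""

    def __init__(self, max_changes=3, window_seconds=600.0):
        self.max_changes = max_changes        # assumed tolerance per window
        self.window_seconds = window_seconds  # assumed window length
        self._events = defaultdict(deque)     # agent_id -> request timestamps

    def record(self, agent_id, timestamp):
        """Record one goal-change request and return True when the agent
        has exceeded the allowed number of requests inside the window."""
        events = self._events[agent_id]
        events.append(timestamp)
        # Drop requests that have aged out of the window.
        while events and timestamp - events[0] > self.window_seconds:
            events.popleft()
        return len(events) > self.max_changes
```

A caller would invoke `record` each time an agent asks to change its goal and raise an alert, or pause the agent for review, when it returns `True`.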
detective Step 3: Strengthen AI decision traceability & logging
-
Write every HITL decision and agent recommendation to a cryptographically signed, append-only log to prevent post-hoc tampering.
-
Run real-time anomaly detection across the agent decision stream and escalate sessions that deviate from expected patterns.
Helmwart controls: Anomaly isolation
-
Log every human override of an agent recommendation and surface reviewer-pattern analytics to detect bias or systematic misalignment.
-
Flag decision reversals in high-risk workflows where a previously rejected AI output was later approved under suspicious conditions.
Helmwart controls: Cross-system audit
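The tamper-evident logging item in Step 3 can be illustrated with a hash-chained log in which each entry's authentication code covers the previous entry. This sketch deliberately substitutes a standard-library HMAC for the Sigstore signing named in the playbook, so it shows the chaining idea rather than the actual signing mechanism. `DecisionLog`, the record fields, and the in-memory list are hypothetical; a real deployment would need durable append-only storage and proper key management.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # placeholder "previous MAC" for the first entry


def _mac(key, prev_mac, payload):
    """HMAC-SHA256 over the previous entry's MAC plus the canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True).encode()
    return hmac.new(key, prev_mac.encode() + body, hashlib.sha256).hexdigest()


class DecisionLog:
    """Hypothetical hash-chained log of HITL decisions (HMAC stand-in for signing)."""

    def __init__(self, key: bytes):
        self._key = key
        self._entries = []

    def append(self, payload: dict) -> str:
        """Add one decision record, chained to the entry before it."""
        prev = self._entries[-1]["mac"] if self._entries else GENESIS
        mac = _mac(self._key, prev, payload)
        self._entries.append({"payload": payload, "prev": prev, "mac": mac})
        return mac

    def verify(self) -> bool:
        """Recompute the chain; any edited, reordered, or removed entry breaks it."""
        prev = GENESIS
        for entry in self._entries:
            expected = _mac(self._key, prev, entry["payload"])
            if entry["prev"] != prev or not hmac.compare_digest(entry["mac"], expected):
                return False
            prev = entry["mac"]
        return True
```

Recording both the agent's recommendation and the reviewer's decision in each payload lets later analytics find overrides and reversals, while `verify` shows whether any record was altered after the fact.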
Source
OWASP Agentic AI: Threats and Mitigations v1.1 (Dec 2025), §Mitigation Strategies. Action text is taken verbatim or paraphrased from the canonical document; the Helmwart additions are the per-action mappings onto deployable mitigation entries.