EVIDENCE TRAIL
Reviewer decision summaries — independent rationale at HITL gate
Verbatim excerpts from the upstream sources cited on the mitigation page, with what each source does and does not prove. The core upstream mandate appears in OWASP Agentic AI v1.1 Playbook 5, which names "AI-assisted explanation summaries for human reviewers" as a proactive HITL control verbatim. MITRE ATLAS AML.M0029 independently endorses "dedicated audit agents" to assist human approval. Note: the MDX citation of Madaan et al. 2023 as evidence for "independent critique" overstates that paper's scope — Self-Refine uses single-model self-feedback, not separate model instances.
Last cross-checked against upstream sources: · 9 sources
References
Each entry shows what the source supports and what it does not prove.
OWASP Agentic AI — Threats & Mitigations v1.1
§T10 Overwhelming Human-in-the-Loop — Description
"Overwhelming Human-in-the-Loop (HITL) occurs when attackers exploit human oversight dependencies in multi-agent AI systems, overwhelming users with excessive intervention requests, decision fatigue, or cognitive overload. This vulnerability arises in scalable AI architectures, where human capacity cannot keep up with multi-agent operations, leading to rushed approvals, reduced scrutiny, and systemic decision failures."
Supports: Defines the threat this control directly addresses: decision fatigue caused by cognitive overload in HITL review queues. Establishes "rushed approvals and reduced scrutiny" as the harm that reviewer decision summaries are designed to prevent.
Does not prove: Does not itself prescribe structured review cards or independent summarisation as the remedy — that prescription appears in the Playbook 5 section of the same document.
OWASP Agentic AI — Threats & Mitigations v1.1
§Playbook 5: Protecting HITL & Preventing Decision Fatigue Exploits — Step 1 Optimize HITL Workflows & Reduce Decision Fatigue (Proactive)
"Implement AI-assisted explanation summaries for human reviewers. Provide clear, concise AI decision explanations to help reviewers make faster, more informed decisions."
Supports: Verbatim upstream mandate for the pattern this control implements. Names "AI-assisted explanation summaries for human reviewers" as a proactive HITL control, aligning directly with the independent summarisation pass described in m-decision-summary.
Does not prove: Does not specify that the summariser must be a model separate from the agent under review, nor does it define the structured review-card schema. Those design constraints are Helmwart additions not present in this source.
OWASP Agentic AI — Threats & Mitigations v1.1
§T10 Overwhelming Human in the Loop — Mitigation (threat table)
"Develop advanced human-AI interaction frameworks, and adaptive trust mechanisms. These are dynamic AI governance models that employ dynamic intervention thresholds to adjust the level of human oversight and automation based on risk, confidence, and context."
Supports: Names "dynamic intervention thresholds … based on risk, confidence, and context" as the lever that should toggle automation vs. human oversight — the same threshold logic that determines which items reach the HITL gate where decision summaries are shown.
Does not prove: T10's table-level mitigation language addresses the gate-routing question, not the summary format shown to the reviewer. Adjacent rationale, not identical.
OWASP Top 10 for Agentic Applications 2026
§ASI09 Human-Agent Trust Exploitation — Prevention and Mitigation Guideline 5 "Adaptive Trust Calibration"
"Adaptive Trust Calibration: Continuously adjust the level of agent autonomy and required human oversight based on contextual risk scoring. Implement confidence weighted cues (e.g., "low-certainty" or "unverified source") that visually prompt users to question high-impact actions, reducing automation bias and blind approval."
Supports: Names automation bias and blind approval as the harms that structured reviewer aids mitigate. Establishes confidence-weighted cues as the mechanism that prompts independent reviewer scrutiny — the same goal as a well-formed review card.
Does not prove: The cue described is a UI badge on a recommendation, not an independently generated rationale summary. Helmwart's control goes further by generating a separate-model summary rather than annotating the agent's own output.
OWASP Top 10 for Agentic Applications 2026
§ASI09 Human-Agent Trust Exploitation — Prevention and Mitigation Guideline 4
"Allow reporting of suspicious interactions: In user-interactive systems, provide plain-language risk summary (not model-generated rationales) and a clear option for users to flag suspicious or manipulative agent behavior, triggering automated review or a temporary lockdown of agent capabilities."
Supports: Distinguishes between a plain-language risk summary and model-generated rationales, acknowledging that the agent's own rationale cannot be trusted in adversarial conditions. This is the upstream justification for using an independently-prompted summariser rather than the agent's self-report.
Does not prove: The guideline recommends against model-generated rationales for user-facing suspicious-interaction reports. Helmwart's control applies the independent-generation principle to all HITL review items, a generalisation beyond this specific adversarial-interaction context.
MITRE ATLAS AML.M0029 — Human In-the-Loop for AI Agent Actions
AML.M0029 — description
"Systems should require the user or another human stakeholder to approve AI agent actions before the agent takes them. The human approver may be technical staff or business unit SMEs depending on the use case. Separate tools, such as dedicated audit agents, may assist human approval, but final adjudication should be conducted by a human decision-maker."
Supports: Explicitly names "dedicated audit agents" as legitimate tools to assist human approval — the closest MITRE ATLAS endorsement for an independent summariser model. Establishes that final adjudication must remain with the human, which is the core constraint this control enforces.
Does not prove: Does not specify that the audit agent must be a separately-prompted model or use a structured review-card schema. Those constraints are Helmwart-specific.
MITRE ATLAS AML.M0021 — Generative AI Guidelines
AML.M0021 — description
"Guidelines are safety controls that are placed between user-provided input and a generative AI model to help direct the model to produce desired outputs and prevent undesired outputs. Guidelines can be implemented as instructions appended to all user prompts or as part of the instructions in the system prompt. They can define the goal(s), role, and voice of the system, as well as outline safety and security parameters."
Supports: Frames system-prompt guidelines as safety controls that shape model output — the mechanism by which the independent summariser's behaviour is constrained. The summary-prompt engineering described in m-decision-summary is an instance of this pattern.
Does not prove: Describes input-side guidelines for a single model, not the coordination of two models (agent + summariser). Does not address the review-card schema or the separation-of-models requirement.
NIST AI 600-1 — Generative AI Profile (NIST AI RMF)
MANAGE 4.2 — Suggested Action MG-4.2-003
"Use visualizations or other methods to represent GAI model behavior to ease non-technical stakeholders understanding of GAI system functionality."
Supports: Recommends tools that translate model behaviour into human-readable form to support non-expert review — the rationale class from which decision summaries emerge. Establishes a NIST-level expectation that organisations will deploy such aids.
Does not prove: The MDX cites MANAGE-4.2 as naming "tools that assist human reviewers" specifically; the actual MANAGE-4.2 heading is about continual improvements via stakeholder engagement, and MG-4.2-003 addresses stakeholder comprehension broadly rather than HITL reviewer tooling specifically. The connection is conceptually valid but the section attribution is approximate.
Madaan et al. 2023 — Self-Refine: Iterative Refinement with Self-Feedback
Abstract
"Like humans, large language models (LLMs) do not always generate the best output on their first try. Motivated by how humans refine their written text, we introduce Self-Refine, an approach for improving initial outputs from LLMs through iterative feedback and refinement."
Supports: Demonstrates that critique and feedback passes improve LLM output quality — the academic foundation for using a second model pass (the summariser) to produce a higher-quality, more accurate representation of the agent's decision than the agent's raw output alone.
Does not prove: Self-Refine uses the same model as generator and critic; it does not use a separately-prompted independent model. The MDX's claim that this paper provides evidence for "independent critique" is an overstatement — the paper demonstrates iterative refinement with self-feedback, not independence across model instances. The independence constraint in m-decision-summary is not supported by this paper directly.