MITIGATION · m-decision-summary

Reviewer decision summaries — independent rationale at HITL gate

When an agent decision reaches a human reviewer, the reviewer must reconstruct the agent's reasoning from raw traces before they can form a judgment. OWASP T10 names this reconstruction burden as the mechanism behind reviewer fatigue and oversight failures. A decision summary addresses the problem by inserting an independent model call between the agent's output and the reviewer: that call compresses the decision, evidence chain, and risk factors into a fixed-format card, reducing the per-review cognitive load without removing the human from the decision.

Last reviewed 2026-05-12 · Status: published

At a glance

MATURITY: Tier 2. Available off-the-shelf or as a documented pattern, but newer or less broadly proven. Expect integration work and some operational nuance.
PLACES ON: node (restricted to node kinds: hitl-gate)
COVERAGE: 1 threat (T10)
TRADE-OFFS: latency low · cost medium · UX friction low · dev effort medium

TL;DR

  • Before any agent decision reaches the human reviewer, generate a structured review card that includes the proposed action and its irreversibility class, the evidence chain with source citations, the agent's stated confidence and key uncertainties, the policy gates already passed, and any upstream anomaly signal that fired.
  • The card must be generated by an independent critic model call (different system prompt, no shared context with the agent under review) so that the summary is a genuine compression of the evidence, not a restatement of the agent's own reasoning.
  • The reviewer sees the card alongside the agent's raw output, never the card alone. The card reduces reconstruction work; it does not replace the underlying evidence record.
  • Track reviewer disagreement-with-summary rate as a coverage signal: a rate above roughly 5 percent indicates the card schema or critic prompt is omitting material content.

How it behaves

  • An agent decision arrives at the HITL queue (raw decision, evidence chain, and proposed action).
  • An independent critic model (different system prompt, no shared context) generates a structured review card before the item reaches the reviewer.
  • The reviewer sees the decision, evidence summary, risk factors, and recommended action alongside the agent's raw output.
  • If summary generation fails, the item is held for re-summary; repeated failure escalates the gating itself.
The card is a decision aid, not a lock. Pair with m-human-dual-control so the human review step is structurally required, not optional.
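
The flow above can be sketched as a small routing function. This is a minimal sketch under stated assumptions: the names `routeToReviewer`, `CriticFn`, and `ReviewCardLite` are hypothetical, the critic is synchronous for brevity (a real critic call would be asynchronous), and the limit of two attempts before escalation is an assumed value, since the text only says that repeated failure escalates.

```typescript
// Hypothetical types; field names are illustrative, not a published schema.
interface AgentDecision {
  id: string;
  proposedAction: string;
  evidence: string[];
  rawOutput: string;
}

interface ReviewCardLite {
  decisionId: string;
  summary: string;
  riskFactors: string[];
  recommendedAction: string;
}

// The critic is injected so that it can be a separate model call with its own
// system prompt. It returns null when no valid card could be produced.
type CriticFn = (decision: AgentDecision) => ReviewCardLite | null;

type Routed =
  | { status: "ready"; card: ReviewCardLite; rawOutput: string }
  | { status: "held"; attempts: number }
  | { status: "escalated"; attempts: number };

// Assumed limit; the source does not say how many failures trigger escalation.
const MAX_ATTEMPTS = 2;

function routeToReviewer(
  decision: AgentDecision,
  critic: CriticFn,
  priorAttempts = 0,
): Routed {
  const card = critic(decision);
  if (card !== null) {
    // The reviewer always receives the card together with the raw output.
    return { status: "ready", card, rawOutput: decision.rawOutput };
  }
  const attempts = priorAttempts + 1;
  return attempts >= MAX_ATTEMPTS
    ? { status: "escalated", attempts }
    : { status: "held", attempts };
}
```

Returning the raw output in the same object as the card makes it harder for a UI to render the card alone by accident.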

What it is

A human reviewer at a HITL gate is asked to evaluate a decision the agent made: a proposed action, the evidence the agent cited, and the risk factors in play. To do that evaluation, the reviewer must first reconstruct the agent's reasoning from a raw execution trace. That reconstruction is itself work, and when the review queue is large, it is the work that breaks down first. Reviewers either approve without finishing the trace, or they miss the signal buried three steps into the evidence chain. OWASP T10 names this pattern as the primary mechanism behind oversight failures at scale.

A decision summary places an independent model call between the agent's output and the reviewer. The summariser uses a different system prompt and has no access to the agent's working context, so its output is a genuine compression of the evidence rather than a restatement of the agent's own reasoning. A well-formed summary card includes: the proposed action and its irreversibility class, the evidence chain with source citations, the agent's stated confidence and key uncertainties, the policy gates already passed, and any upstream anomaly signal that fired. The reviewer sees the card alongside the raw output, not instead of it.

The card is a cognitive-load reduction, not a replacement for the evidence record. The enforcement gate is a separate structural control; see m-human-dual-control.

Detection signals

  • Time-per-review. A large sustained drop indicates reviewers are reading the card and skipping the underlying trace, which removes the independent check against the raw evidence that high-irreversibility decisions depend on.
  • Reviewer disagreement-with-summary rate. A rate above roughly 5 percent signals that the critic prompt or card schema is omitting material content and needs re-tuning.
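
Both signals can be computed from per-review records. The sketch below is illustrative only: the `ReviewRecord` shape and function names are hypothetical, the 5 percent threshold comes from the text, and the 50 percent time-drop threshold is an assumption, since the text does not quantify "large".

```typescript
// Hypothetical record captured once per completed review.
interface ReviewRecord {
  reviewSeconds: number;
  // True when the reviewer flagged the card as omitting or misstating material content.
  disagreedWithSummary: boolean;
}

function disagreementRate(records: ReviewRecord[]): number {
  if (records.length === 0) return 0;
  const disagreed = records.filter((r) => r.disagreedWithSummary).length;
  return disagreed / records.length;
}

function meanReviewSeconds(records: ReviewRecord[]): number {
  if (records.length === 0) return 0;
  return records.reduce((sum, r) => sum + r.reviewSeconds, 0) / records.length;
}

const DISAGREEMENT_THRESHOLD = 0.05; // "roughly 5 percent" from the text
const TIME_DROP_THRESHOLD = 0.5; // assumed: flag a 50 percent drop against baseline

function detectionSignals(baseline: ReviewRecord[], current: ReviewRecord[]) {
  const rate = disagreementRate(current);
  const base = meanReviewSeconds(baseline);
  const now = meanReviewSeconds(current);
  // Fractional drop in mean time-per-review relative to the baseline window.
  const drop = base > 0 ? (base - now) / base : 0;
  return {
    disagreementRate: rate,
    retuneCritic: rate > DISAGREEMENT_THRESHOLD,
    timeDropFraction: drop,
    possibleCardOnlyReading: drop >= TIME_DROP_THRESHOLD,
  };
}
```

The baseline window would normally be taken from before the card was introduced, or from a period known to have healthy review behaviour.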

Threats it covers

  • T10 (Overwhelming HITL). Why it helps: Overwhelming HITL occurs when the volume or complexity of agent decisions makes human review practically impossible, causing reviewers to approve without reading or to miss high-risk decisions under cognitive load. An independent decision summary cuts the per-review reconstruction work to a fixed cost: the reviewer reads a structured card rather than traversing a raw reasoning trace, keeping review thoroughness viable at scale.

Principle coverage

Defence-in-Depth stage: Prevent. It advances:

  • Human Oversight (HITL / HOTL): Decision summaries make human oversight more reliable in practice. Because the card reduces the per-review reconstruction burden to a fixed format, reviewers can maintain review thoroughness as queue volume grows, rather than approving under cognitive load.
  • Observability / Non-repudiation: A structured review card generated from the agent's decision trace is itself an observable artefact. It makes the reasoning behind each queued decision inspectable by both the immediate reviewer and any auditor examining the decision record after the fact.
  • Transparency / Explainability: Decision summaries surface the agent's reasoning, evidence chain, and stated confidence in a form reviewers and auditors can read directly, making the basis for each decision transparent rather than buried in a raw execution trace.

Design & governance principles (open design, economy of mechanism, accountability, …) are architectural, not advanced by a single placed control.

Implementation options

There is no off-the-shelf product that ships an independent HITL decision summariser as a managed service. The options below cover the independent-critic model pattern (self-build), managed annotation queues that host the resulting review card, and LLM-as-judge tooling whose rationale output maps onto the summary card schema. Deploy the independent-critic pattern first; the annotation queue and judge options handle where the card lands and how it is scored.

Independent critic via Anthropic API: Call a separately-prompted model with its own system prompt and no shared context with the agent under review, to compress the agent's decision, evidence chain, and proposed action into a structured review card before the item enters the HITL queue.

Why choose it: The canonical implementation of the control. Independence is structural: the summariser must be a separate model call with its own system prompt and no access to the agent's working context. Enforce this by using a separate API key or project for the summariser, keeping the two calls auditably distinct. Use claude-haiku-4-5 for low-latency summary cards (roughly 300 to 500 ms); escalate to claude-sonnet-4-5 for high-irreversibility decisions. Feed the agent's raw decision JSON as the user turn; constrain the output to a typed JSON schema via tool_use to guarantee parseable card structure.
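
The request described above might be assembled as follows. This is a sketch rather than a definitive integration: it only builds the JSON body for the Anthropic Messages API and sends nothing. The tool name `emit_review_card`, the system prompt wording, the property names in `input_schema`, the `max_tokens` value, the irreversibility classes, and the rule that only the `irreversible` class escalates to the larger model are all assumptions made for illustration.

```typescript
type Irreversibility = "reversible" | "costly" | "irreversible"; // assumed classes

// The critic gets its own system prompt and never sees the agent's context.
const CRITIC_SYSTEM_PROMPT =
  "You are an independent reviewer aid. Summarise only what is in the user message. " +
  "Do not assume the proposed action is correct.";

// Hypothetical tool definition; forcing this tool constrains output to the card schema.
const REVIEW_CARD_TOOL = {
  name: "emit_review_card",
  description: "Return the structured review card for a queued agent decision.",
  input_schema: {
    type: "object",
    properties: {
      proposed_action: { type: "string" },
      irreversibility: { type: "string" },
      evidence: { type: "array", items: { type: "string" } },
      stated_confidence: { type: "string" },
      key_uncertainties: { type: "array", items: { type: "string" } },
      gates_passed: { type: "array", items: { type: "string" } },
      anomaly_signals: { type: "array", items: { type: "string" } },
    },
    required: ["proposed_action", "irreversibility", "evidence"],
  },
};

function buildCriticRequest(agentDecisionJson: string, irreversibility: Irreversibility) {
  return {
    // Assumed routing rule: larger model only for the highest irreversibility class.
    model: irreversibility === "irreversible" ? "claude-sonnet-4-5" : "claude-haiku-4-5",
    max_tokens: 1024,
    system: CRITIC_SYSTEM_PROMPT,
    tools: [REVIEW_CARD_TOOL],
    // Forcing the tool means the response carries a tool_use block, not free text.
    tool_choice: { type: "tool", name: REVIEW_CARD_TOOL.name },
    // The agent's raw decision JSON is the only user turn; no agent history is shared.
    messages: [{ role: "user", content: agentDecisionJson }],
  };
}
```

Sending this body with an API key that belongs to a separate project from the agent's key keeps the two calls distinct in audit logs, as the text recommends.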

LangSmith annotation queues: Push the generated review card as a run into a LangSmith annotation queue; assign a named reviewer pool; capture structured feedback (rubric score and freeform rationale notes) per item via the SDK or the LangSmith UI.

Why choose it: Best as the queue layer that receives and displays the summary card produced by the independent critic. LangSmith annotation queues present one run at a time in a focused view, allow rubric feedback keys (categorical or continuous) and free-text notes, and expose a three-layer SDK for programmatic queue population. Pairwise Annotation Queues support side-by-side display when reviewers must compare two agent decisions. The Python and TypeScript SDKs let you push cards from the same pipeline that calls the independent critic, keeping queue population automatic. Fatigue routing is not built in; pair with m-adaptive-workload for pool-level assignment.

DeepEval GEval: Define a custom GEval metric with evaluation criteria that match your review-card rubric; run it against agent outputs using a separate evaluator model; surface the per-metric score and reason string to reviewers as a pre-generated summary.

Why choose it: Best when the independent critic also needs to function as a quantitative scorer. GEval produces a numeric score and a reason string per criterion, which map directly onto the confidence and uncertainties fields of a summary card. The evaluator model is configurable independently of the model under review (OpenAI, Anthropic, Ollama, Azure, Gemini); per-span scores and metric reasons are accessible in the trace-tree interface and via the API. Use GEval as the critic layer when you need calibrated scores alongside qualitative summaries, and feed both into an annotation queue for human review. This is not a queue or display layer; pair with LangSmith or a custom UI for reviewer assignment.

Argilla: Self-hosted annotation platform that accepts records with model-generated suggestions pre-attached; reviewers see the AI suggestion alongside each record and accept, correct, or override it.

Why choose it: Best when you need a fully self-hosted, open-source review queue with no vendor dependency and full data sovereignty. Argilla's suggestion system (rg.Suggestion with confidence score) lets you attach the independent critic's summary as a pre-filled suggestion on each record before it enters the review queue; reviewers see the AI-generated card and annotate on top of it. The RankingQuestion type supports comparative ordering of multiple agent outputs when reviewers must choose between decision variants. Fatigue routing and mandatory-break enforcement are not native; implement them in the layer that pushes records into Argilla datasets. Requires self-hosted deployment via Docker or Kubernetes.

Structured ReviewCard schema (self-build): Define a typed ReviewCard interface covering action, irreversibility, evidence chain, confidence, uncertainties, gates passed, and anomaly flags; render it as a fixed-format reviewer checklist in your own HITL UI.

Why choose it: The only option that gives full control over card schema, display layout, and risk-factor breakdown, and the only option that integrates the summary card directly into an existing HITL workflow UI without a third-party annotation platform. Build the card schema as a TypeScript interface; generate it from the independent critic's tool_use response; render it as a structured checklist with per-field reviewer confirmation (approve, override, escalate). The self-build cost is concentrated in two places: the card schema design (roughly one sprint) and the divergence-audit loop that flags when the critic's summary misses material content present in the raw trace (ongoing). Best for teams that already have a custom HITL UI and do not want to route decision data through a vendor platform.
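
A possible shape for the interface and the per-field confirmation is sketched below. Every name here is hypothetical: the text lists which fields the card covers but does not publish a schema, so the property names, the irreversibility classes, the `schemaVersion` field, and the rule for combining per-field verdicts are assumptions.

```typescript
type IrreversibilityClass = "reversible" | "hard-to-reverse" | "irreversible"; // assumed

interface EvidenceItem {
  claim: string;
  source: string; // citation back into the raw trace or an external document
}

interface ReviewCard {
  schemaVersion: string;
  action: string;
  irreversibility: IrreversibilityClass;
  evidenceChain: EvidenceItem[];
  confidence: string; // the agent's stated confidence, as reported
  uncertainties: string[];
  gatesPassed: string[];
  anomalyFlags: string[];
}

type ReviewerVerdict = "approve" | "override" | "escalate";
type ReviewedField = Exclude<keyof ReviewCard, "schemaVersion">;
type FieldConfirmations = Partial<Record<ReviewedField, ReviewerVerdict>>;

const REVIEWED_FIELDS: ReviewedField[] = [
  "action", "irreversibility", "evidenceChain", "confidence",
  "uncertainties", "gatesPassed", "anomalyFlags",
];

// Structural checks on a card before it is shown to a reviewer.
function cardProblems(card: ReviewCard): string[] {
  const problems: string[] = [];
  if (card.action.trim() === "") problems.push("action is empty");
  if (card.evidenceChain.length === 0) problems.push("evidence chain is empty");
  card.evidenceChain.forEach((e, i) => {
    if (e.source.trim() === "") problems.push(`evidence[${i}] has no source citation`);
  });
  return problems;
}

// Assumed combination rule: any escalate wins, then any override, else approve.
// A checklist with unconfirmed fields is reported as incomplete.
function overallVerdict(c: FieldConfirmations): ReviewerVerdict | "incomplete" {
  let result: ReviewerVerdict = "approve";
  for (const field of REVIEWED_FIELDS) {
    const verdict = c[field];
    if (verdict === undefined) return "incomplete";
    if (verdict === "escalate") return "escalate";
    if (verdict === "override") result = "override";
  }
  return result;
}
```

Treating an unconfirmed field as `incomplete` rather than as an implicit approval keeps the checklist from being closed with a single click.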

Trade-offs

  • The independent-critic call adds roughly 300 to 500 ms per HITL item at Haiku-class latency, and 800 to 1200 ms at Sonnet-class; run the critic call asynchronously and pre-generate the card before the item is dequeued to eliminate reviewer-visible delay.
  • Cost is one additional model inference per HITL decision. At Haiku pricing this is negligible for queues under 10,000 items per day; at Sonnet pricing for high-irreversibility decisions, budget the per-item cost explicitly before enabling for high-volume queues.
  • Integrating the critic call into an existing HITL pipeline requires a defined ReviewCard schema, a queue that can display structured cards, and an audit path that records the card alongside the agent trace. These are build costs, not ongoing running costs.
  • Reviewer trust calibration requires ongoing attention: surface explicit language in the review UI stating that the summary was generated by a separate model and that reviewers should verify against the full trace for high-irreversibility decisions.
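
To budget the per-item cost explicitly, the arithmetic can be written down as a small function. This is a sketch with hypothetical names; the token counts and prices passed in are placeholders to be replaced with measured values and the provider's current price list, not real pricing.

```typescript
interface CriticCostInputs {
  itemsPerDay: number;
  inputTokensPerItem: number; // measured size of the agent's decision JSON plus prompt
  outputTokensPerItem: number; // measured size of the generated card
  usdPerMillionInputTokens: number; // placeholder; read from the current price list
  usdPerMillionOutputTokens: number; // placeholder; read from the current price list
}

// One additional inference per HITL item: cost scales linearly with queue volume.
function dailyCriticCostUsd(c: CriticCostInputs): number {
  const inputCost = ((c.itemsPerDay * c.inputTokensPerItem) / 1e6) * c.usdPerMillionInputTokens;
  const outputCost = ((c.itemsPerDay * c.outputTokensPerItem) / 1e6) * c.usdPerMillionOutputTokens;
  return inputCost + outputCost;
}
```

For example, with placeholder prices of 1 and 5 USD per million input and output tokens, 10,000 items per day at 2,000 input and 500 output tokens each comes to 20 + 25 = 45 USD per day.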

When NOT to use

  • Do not deploy in fully automated pipelines with no human reviewer; the critic call adds latency and cost with no benefit when no one reads the card.
  • Do not use for low-complexity, uniform action types where the reviewer can absorb the raw output faster than a card; templated, repetitive agent actions with a narrow decision surface do not benefit from independent summarisation.
  • Do not substitute the summary card for the full trace on high-irreversibility decisions; the card reduces reconstruction work but does not replace the evidence record.

Limitations

  • The independent critic cannot detect information the agent suppressed before its output was generated; the critic summarises what the agent chose to surface, not what the agent internally computed. Pair with m-actor-recorder-split to ensure the raw execution trace reaches the reviewer independently of the agent's own output.
  • Summary quality degrades when the agent's decision involves novel action types or evidence formats not well-represented in the critic's training distribution; monitor the disagreement-with-summary rate as a proxy for coverage drift.
  • LangSmith annotation queues and Argilla are annotation platforms, not HITL enforcement gates; a reviewer can close a queue item without acting on the summary. The enforcement gate is a separate structural control (m-human-dual-control); the summary card is a decision aid, not a lock.
  • A self-build card schema requires ongoing maintenance as the agent's action schema evolves; a card schema that does not cover a new action type silently produces incomplete summaries. Version the ReviewCard interface alongside the agent's tool schema.
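
One way to catch the silent-incompleteness case from the last limitation is a coverage check run whenever the agent's tool schema changes. The sketch below assumes a hypothetical `CardSchemaVersion` record that lists the action types a given card schema version was designed for; nothing in the text prescribes this shape.

```typescript
// Hypothetical registry entry kept alongside the ReviewCard interface.
interface CardSchemaVersion {
  version: string;
  coveredActionTypes: string[];
}

// Returns the agent action types that the card schema does not yet cover.
function uncoveredActionTypes(agentActionTypes: string[], schema: CardSchemaVersion): string[] {
  return agentActionTypes.filter((t) => schema.coveredActionTypes.indexOf(t) === -1);
}

// Intended for a CI step: throw when a new action type has no card coverage.
function assertCardCoverage(agentActionTypes: string[], schema: CardSchemaVersion): void {
  const missing = uncoveredActionTypes(agentActionTypes, schema);
  if (missing.length > 0) {
    throw new Error(
      `ReviewCard schema ${schema.version} does not cover: ${missing.join(", ")}`,
    );
  }
}
```

Running the check in the same pipeline that publishes the agent's tool schema turns a silent omission into a failed build.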

Maturity tier reasoning

  • Tier 2 fits because every component is production-available and actively maintained: the Anthropic API for the critic call, LangSmith annotation queues, DeepEval GEval, and Argilla. Composing them is a documented pattern, not research-stage tooling.
  • What keeps this out of Tier 1 is the absence of a standardised ReviewCard schema and a production benchmark for summary quality in agentic HITL settings; every deployment defines its own card structure and omission-detection threshold.
  • The Trust and Safety pattern of independent moderation summary before human review is Tier 1 mature in content-moderation platforms; its formal application to agentic AI HITL gates is established practice without a published standard.

Last verified against upstream docs: 2026-05-30.