Human oversight and escalation

When must a human approve?

Route by consequence and reversibility, not by confidence alone. Confidence, moderation, and novelty signals may influence the route; they do not prove an action is safe.

No HITL

Autonomous action with sampled oversight

Bounded and reversible. Draft a reply, classify a ticket, or issue a refund below an approved low-value limit.

1 approver

Single-review HITL

Material but reversible. Send a customer response, change workflow state, or approve a moderate-value refund.

2 approvers

Dual-control HITL

High-impact or irreversible. Release funds, delete records, grant privileges, or transfer regulated data.

Stop

Quarantine or refuse

Unsafe or unevaluable. Flagged output, missing evidence, policy lookup failure, or a saturated review queue. No action commits.
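The four routes above can be sketched as a single routing function. This is a minimal illustration under stated assumptions, not the program's actual implementation: the `Proposal` fields, the three-level `impact` scale, the `min_confidence` default, and the choice to refuse only review-bound actions when the queue is saturated are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    AUTONOMOUS = "bounded autonomous action"
    SINGLE_REVIEW = "single-review HITL"
    DUAL_CONTROL = "dual-control HITL"
    REFUSE = "quarantine or refuse"


@dataclass(frozen=True)
class Proposal:
    # All field names are hypothetical, chosen for illustration only.
    impact: str               # "low", "moderate", or "high"
    reversible: bool
    evidence_complete: bool
    policy_lookup_ok: bool
    moderation_flagged: bool
    confidence: float         # 0.0 to 1.0; can tighten a route, never loosen it


def choose_route(p: Proposal, queue_saturated: bool = False,
                 min_confidence: float = 0.8) -> Route:
    # Unsafe or unevaluable proposals stop before any consequence-based routing.
    if p.moderation_flagged or not p.evidence_complete or not p.policy_lookup_ok:
        return Route.REFUSE

    # Consequence and reversibility pick the base route.
    if p.impact == "high" or not p.reversible:
        route = Route.DUAL_CONTROL
    elif p.impact == "moderate":
        route = Route.SINGLE_REVIEW
    elif p.impact == "low":
        route = Route.AUTONOMOUS
    else:
        return Route.REFUSE  # unknown impact level: fail closed

    # Low confidence may escalate; high confidence never de-escalates.
    if route is Route.AUTONOMOUS and p.confidence < min_confidence:
        route = Route.SINGLE_REVIEW

    # Assumption: a saturated queue refuses review-bound actions
    # instead of downgrading them to autonomy.
    if route is not Route.AUTONOMOUS and queue_saturated:
        return Route.REFUSE
    return route
```

Note that `confidence` appears only in the escalation branch: a high-impact proposal with confidence 0.99 still routes to `Route.DUAL_CONTROL`.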

Flow diagram

Each proposed action is screened and assigned a route. Low-risk bounded actions may proceed under sampled oversight; medium- and high-impact actions enter true HITL before commit. Unsafe actions exit as refusals or quarantine.

The escalation flow — step detail

Every proposed action is routed before execution. Only routes labelled HITL require a human approval before the action can commit.

1. Agent proposes an action without executing it yet. The proposal includes the action, target resource, requested authority, and supporting evidence.
2. Routing signals are collected: confidence estimates, evidence completeness, policy lookup status, novelty, and action consequence. Low confidence or missing evidence can force refusal or review, but high confidence never authorises a high-impact action by itself. Implementation: fail-closed refusal.
3. Policy and moderation checks run: block prohibited output, policy violations, or unsafe content before routing an executable action. Implementation: output moderation gates.
4. Choose an execution route: use consequence, reversibility, authority, data sensitivity, and novelty to route the proposal to bounded autonomous action, single-review HITL, dual-control HITL, or refusal. Implementation: risk-prioritised review queue.
5. For HITL routes, create the review package before review: present the action, evidence chain, relevant policy result, confidence signal, and risk reason in a structured card. Implementation: reviewer decision summaries.
6. The assigned route determines human involvement:
   - Bounded autonomous action: no HITL event; action is logged and sampled for oversight.
   - Single-review HITL: one human approves a material but reversible action.
   - Dual-control HITL: two humans approve high-impact or irreversible action. Implementation: dual-control approval.
7. Assign HITL work to an available reviewer: track queue depth and fatigue indicators; do not silently downgrade a required review when human capacity is unavailable. Implementation: adaptive workload balancing.
8. Allowed actions are signed and logged: record the autonomous route or human approval, actor, policy version, and time. Implementation: Sigstore keyless signing + separation of actor and recorder.
9. User-facing actions disclose the AI role where applicable: labelling is relevant for communications or interactions, not every internal workflow transition. Implementation: AI-source disclosure UI.
10. Action commits or refuses: reviewer decisions, sampled autonomous outcomes, and reversals feed calibration without bypassing required approvals. Implementation: HITL feedback-loop calibration.
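The approval requirement in step 6 can be enforced as a commit-time gate. The following is a hedged sketch, assuming route names are passed as plain strings and that approvers and the proposing agent have stable identifiers; `REQUIRED_APPROVALS` and `may_commit` are hypothetical names, not part of any linked control.

```python
# Hypothetical mapping from route to the number of human approvals required.
REQUIRED_APPROVALS = {"autonomous": 0, "single_review": 1, "dual_control": 2}


def may_commit(route: str, approver_ids: list[str], proposer_id: str) -> bool:
    """Return True only if the route's approval requirement is met."""
    required = REQUIRED_APPROVALS.get(route)
    if required is None:
        # Refused, quarantined, or unknown routes never commit.
        return False
    # Count distinct approvers and ignore any approval by the proposer itself,
    # so one person approving twice does not satisfy dual control.
    distinct = {a for a in approver_ids if a != proposer_id}
    return len(distinct) >= required
```

For example, `may_commit("dual_control", ["alice", "alice"], "agent-1")` returns `False`, while the same call with `["alice", "bob"]` returns `True`.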

HITL flow controls at a glance

m-fail-closed

Routing signal failure

Refuse when required evidence or a reliable policy decision is unavailable.

m-risk-prioritized-queue

Queue routing

Match review requirements to consequence and reversibility; keep bounded autonomy distinct from HITL.

m-output-moderation

Moderation gate

Classify proposed output independently; quarantine flagged decisions before release.

m-human-dual-control

High-risk review

Require two-person approval for irreversible or high-consequence actions.

m-decision-summary

Review package

Give reviewers the proposed action, evidence, policy result, and routing rationale before approval.

m-adaptive-workload

Fatigue routing

Track reviewer fatigue indicators; route to fresh reviewers; mandate breaks when thresholds trip.

m-ai-disclosure-ui

Conditional disclosure

Label the AI role when the resulting action or communication is user-facing.

m-sigstore

Signed approval

Bind approvals and release records to verifiable signatures.

m-actor-recorder-split

Independent log

Separate the actor from the recorder so decision history is harder to alter.

m-hitl-feedback-loop

Calibration

Capture overrides and reversals to adjust thresholds and policies.

Design principles

Decision fatigue is the failure mode. The T10 Overwhelming HITL scenarios are about reviewer saturation, not reviewer capability. Adaptive workload balancing and risk-prioritised queues together address it.
Legibility is the prerequisite for accountability. A reviewer who can't read the agent's reasoning isn't really in the loop. Reviewer decision summaries produce a structured surface; the reviewer reads that, not raw traces.
The route boundary makes HITL credible. Not every action needs human approval, but actions that require approval must not silently fall back to autonomy. Define bounded auto-action separately from single-review and dual-control routes. See the risk-prioritised review queue.
Cryptographic integrity strengthens procedural accountability. Sign reviewer approvals and use append-only logs with a separate recorder identity. See Sigstore keyless signing and separation of actor and recorder.
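The append-only property in the last principle can be illustrated with a hash chain, where each entry commits to the one before it. This is a simplified stand-in built on the standard library, not Sigstore and not a signature scheme: it shows tamper evidence only. The record fields and function names are hypothetical, and in practice the recorder would run under an identity separate from the actor.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def _entry_hash(prev_hash: str, record: dict) -> str:
    # Canonical JSON (sorted keys) keeps the hash stable across key order.
    body = json.dumps(record, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_record(log: list[dict], record: dict) -> dict:
    """Append a record that commits to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {
        "record": record,
        "prev_hash": prev_hash,
        "hash": _entry_hash(prev_hash, record),
    }
    log.append(entry)
    return entry


def verify_chain(log: list[dict]) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _entry_hash(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True
```

A hash chain detects later edits but does not prove who wrote an entry; that is the gap signatures bound to a verifiable identity are meant to close.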

When HITL is unavailable

Off-hours, surges, and vendor outages mean HITL queues do not always have a reviewer. The program must declare in advance what happens when the queue saturates:

Fail closed by default. The action does not commit; the agent returns a refusal and records the saturation reason in the audit trail, following the fail-closed refusal pattern.
Queue-depth alarms feed back into the agent's rate limits. If reviewers are overloaded, the agent should slow itself. See rate limits and quotas.
Capacity limits are explicit. Declare reviewer-hours per shift, max queue depth per tier, and the SLA at which the program degrades. These belong in the runbook, not the source.
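The three rules above can be sketched together: a hard queue-depth ceiling that fails closed, and a rate limit for review-bound proposals that tapers as the queue fills. The `QueueLimits` fields, the linear taper, and the example numbers are assumptions for illustration; real values would come from the runbook.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueueLimits:
    # Hypothetical per-tier limits; actual values belong in the runbook.
    max_depth: int       # at or above this depth, new review work is refused
    slowdown_depth: int  # above this depth, the agent starts throttling


def admit_for_review(depth: int, limits: QueueLimits) -> tuple[bool, str]:
    """Fail closed: refuse with a reason when the queue is saturated."""
    if depth >= limits.max_depth:
        return False, f"review queue saturated: depth {depth} >= max {limits.max_depth}"
    return True, "queued for review"


def agent_rate_limit(base_per_minute: int, depth: int, limits: QueueLimits) -> int:
    """Scale the agent's rate of review-bound proposals down as the queue fills."""
    if depth >= limits.max_depth:
        return 0  # saturated: stop generating review-bound work
    if depth <= limits.slowdown_depth:
        return base_per_minute
    span = limits.max_depth - limits.slowdown_depth
    remaining = limits.max_depth - depth
    # Linear taper between slowdown_depth and max_depth, never below 1.
    return max(1, base_per_minute * remaining // span)
```

With `QueueLimits(max_depth=50, slowdown_depth=20)` and a base rate of 60 per minute, a queue depth of 35 halves the rate to 30. The refusal reason returned by `admit_for_review` is the string that would be written to the audit trail.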

Feedback loop into agent calibration

Reviewer overrides and decision reversals are captured as signals in reviewer decision summaries and risk-prioritised queues. HITL feedback-loop calibration closes the loop: override events are batched, analysed for systematic patterns, and fed back into agent calibration in the form of prompt updates, tool-scope policy changes, and divergence-monitor threshold tuning. Each calibration cycle requires human sign-off on the pattern report before any agent change is deployed.
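One way to picture the batching and sign-off steps is a per-action-type override rate with a minimum sample size, followed by a deployment gate. This is a sketch under assumptions: the event shape (`action_type`, `overridden`), the `min_events` and `threshold` defaults, and both function names are hypothetical.

```python
from collections import Counter
from typing import Optional


def override_report(events: list[dict], min_events: int = 20,
                    threshold: float = 0.25) -> list[dict]:
    """Flag action types whose reviewer override rate suggests a systematic pattern."""
    totals = Counter(e["action_type"] for e in events)
    overrides = Counter(e["action_type"] for e in events if e["overridden"])
    report = []
    for action_type, total in sorted(totals.items()):
        if total < min_events:
            continue  # too few events to call it a pattern
        rate = overrides[action_type] / total
        if rate >= threshold:
            report.append(
                {"action_type": action_type, "events": total, "override_rate": rate}
            )
    return report


def may_deploy(report: list[dict], signed_off_by: Optional[str]) -> bool:
    """A calibration change deploys only with a non-empty report and human sign-off."""
    return bool(report) and signed_off_by is not None
```

The gate mirrors the rule in the text: producing the pattern report is automated, but `may_deploy` stays `False` until a named human has signed off on it.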

Policy framing

This HITL program is an engineering interpretation related to the ACM Europe Technology Policy Committee's May 2025 policy brief (see Governance primer) and its proposal for alignment oversight. It is not the text of Article 14 or of the brief. Helmwart applies the concept by making agent actions legible to reviewers; the linked flow controls, logging, and calibration measures provide that evidence.
