MITIGATION · m-codegen-review-gate
Code-generation review gate — human approval before AI-generated code executes or merges
Without a gate, an AI coding agent can produce code that is executed or merged to a production branch without a human ever reading it. If the agent has been manipulated, its generated code can contain hidden payloads, backdoors, or privilege-escalating logic. A code-generation review gate prevents that: every change attributable to an AI agent must pass automated static analysis and receive explicit human approval before it can merge or execute, and the agent identity that authored the change is structurally barred from also approving it.
At a glance
TL;DR
- AI-generated code cannot merge or execute until a human reviewer explicitly approves it; the agent identity that authored the change is structurally barred from holding the reviewer role.
- Every code change attributable to an AI agent is tagged with provenance in commit metadata and PR labels so reviewers can apply appropriate scrutiny and CI policy can enforce the gate selectively.
- Automated static analysis runs on the diff before human review begins, flagging known-dangerous patterns and reducing cognitive load on the reviewer without substituting for it.
- Privilege separation is enforced at the platform layer via branch-protection rules, not by convention; the author role and the reviewer role must be held by different identities.
How it behaves
What it is
A code-generation review gate is a structural enforcement point that sits between an AI agent's output and any action that executes or merges that output. It has two components that work together: an automated static analysis pass that runs on every AI-attributed diff before a human sees it, and a privilege-separation requirement that prevents the agent identity from approving its own changes.
AI provenance tagging is the prerequisite that makes the gate possible. Every code change attributable to an AI agent is marked in commit metadata (Co-authored-by: trailers), PR labels, and the audit trail. Without reliable attribution, reviewers cannot consistently apply heightened scrutiny to AI-generated diffs, and CI policy cannot enforce the gate selectively. GitHub Copilot, Claude Code, and Aider each emit markers in commit history; there is no unified cross-tool standard yet.
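As a minimal sketch of how a CI step might read provenance from commit metadata, the function below scans a commit message for Co-authored-by: trailers and matches them against a list of AI markers. The marker list `AI_AUTHOR_MARKERS` and the function names are assumptions for illustration; each tool writes its own marker, so the list would need to reflect what your tools actually emit.

```python
import re

# Hypothetical marker list: substrings that identify an AI co-author.
# Adjust to the identities your tools actually write into commit trailers.
AI_AUTHOR_MARKERS = ("copilot", "claude", "aider")

# Matches lines of the form "Co-authored-by: Name <email>" (case-insensitive).
TRAILER_RE = re.compile(
    r"^co-authored-by:\s*(?P<name>.+?)\s*<(?P<email>[^>]*)>\s*$",
    re.IGNORECASE,
)


def ai_coauthors(commit_message: str) -> list[str]:
    """Return the names from Co-authored-by lines that match an AI marker.

    Simplification: every line of the message is scanned, not only the
    trailing trailer block.
    """
    found = []
    for line in commit_message.splitlines():
        match = TRAILER_RE.match(line.strip())
        if not match:
            continue
        identity = f"{match['name']} {match['email']}".lower()
        if any(marker in identity for marker in AI_AUTHOR_MARKERS):
            found.append(match["name"])
    return found


def is_ai_attributed(commit_message: str) -> bool:
    """True if at least one Co-authored-by line matches an AI marker."""
    return bool(ai_coauthors(commit_message))
```

A check like this only detects attribution that is present; it cannot tell that a trailer was stripped, which is why provenance also needs enforcement at the CI layer.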
The second component, privilege separation, is the structural guarantee. The agent identity that authored a change cannot also approve its merge, even if that identity holds repository write scope. The agent operates in the author role only; a separate human identity holds the reviewer role. This separation is enforced at the platform layer via branch-protection rules, not by convention or by the agent's own restraint.
The two failure modes the gate is designed to prevent:
- Malicious payload in generated code: Terraform with hidden secret-extraction commands, an embedded backdoor in a workflow definition, or a provisioner "local-exec" block that exfiltrates credentials. Automated static analysis flags known-dangerous patterns before human review begins; the human reviewer reads the diff and catches semantic-level injection that static analysis cannot.
- Agent self-approval: an agent identity that holds both write and review scope can approve its own output with no human in the path. Branch-protection rules requiring approval from an identity other than the committer close this gap structurally.
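The two checks described above can be sketched as a single decision function. This is an illustrative model only, not how any platform implements its rules: the `PullRequest` and `Review` types, their fields, and the `is_human` flag are all hypothetical, and in practice the platform's branch-protection rules would be the enforcement point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Review:
    reviewer: str
    state: str      # e.g. "APPROVED", "CHANGES_REQUESTED", "COMMENTED"
    is_human: bool  # hypothetical flag: False for bot or agent identities


@dataclass(frozen=True)
class PullRequest:
    author: str
    last_pusher: str
    ai_attributed: bool
    sast_passed: bool
    reviews: tuple[Review, ...] = ()


def gate_decision(pr: PullRequest) -> tuple[bool, str]:
    """Return (allowed, reason) for merging or executing the change."""
    if not pr.ai_attributed:
        # The gate is selective; other changes follow the normal policy.
        return True, "not AI-attributed; standard policy applies"
    if not pr.sast_passed:
        return False, "static analysis has not passed"
    # Privilege separation: neither the author nor the last pusher may
    # supply the approval that opens the gate.
    excluded = {pr.author, pr.last_pusher}
    for review in pr.reviews:
        if (
            review.state == "APPROVED"
            and review.is_human
            and review.reviewer not in excluded
        ):
            return True, f"approved by {review.reviewer}"
    return False, "no approval from a human other than the author or last pusher"
```

Under this model an agent that approves its own pull request is rejected even if its identity holds review scope, because its name is in the excluded set.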
Pair with m-gvisor for runtime sandboxing of any AI-generated code that does execute after merge, and with m-rbac-abac to ensure AI agent identities cannot grant themselves merge scope.
Detection signals
- AI-attributed commits that reach main without a recorded human approval: indicates a bypassed or unconfigured gate.
- SAST finding rate on AI-attributed PRs relative to the human-authored baseline: a sustained increase indicates the model's output has shifted toward riskier patterns.
Threats it covers
- WHY IT HELPS: OWASP T11 Unexpected RCE and Code Attacks covers scenarios where an agent generates code that executes as an attacker payload: hidden shell commands in Terraform provisioners, embedded backdoors in workflow definitions, or exfiltration logic in an otherwise routine diff. The review gate intercepts that code before execution by requiring a human to read the diff and automated static analysis to pass, so the payload must survive both checks to reach the runtime.
- WHY IT HELPS: OWASP T20 Framework Vulnerability Code Injection is the scenario where an agent is caused to generate code that calls a known-vulnerable framework API. SAST scanning of AI-attributed diffs detects known-vulnerable patterns before merge; human review catches semantic-level injection that SAST cannot, such as code that looks structurally correct but exfiltrates data through a misconfigured output.
Principle coverage
Defence-in-Depth stage: Prevent. It advances:
- Human Oversight (HITL / HOTL): The review gate places a human approval step between every AI-generated code change and the point of execution or merge, and bars the agent identity from holding the reviewer role, so human oversight is structural rather than advisory.
Design & governance principles (open design, economy of mechanism, accountability, …) are architectural, not advanced by a single placed control.
Implementation options
The five verified implementation patterns below span managed coding agents, local AI coding tools, and platform-level enforcement. Use GitHub branch protection as the structural floor for any deployment where code merges to a shared branch; layer one of the coding-tool options on top to surface AI provenance to reviewers.
GitHub Copilot: The coding agent works on a branch and opens a pull request when its changes are ready. It does not merge autonomously; all Copilot-generated PRs go through the standard review workflow.
Why choose it: Best when your team already uses GitHub for code review and wants AI coding assistance integrated into the PR workflow without bypassing existing approval gates. The PR is the gate: the agent cannot merge without human approval. Pair with GitHub branch protection ("Require pull request reviews" and "Require approval of the most recent reviewable push") to prevent the agent identity from self-approving.
Claude Code: Read-only by default. File edits and bash command execution each require explicit human approval before they run. Accept Edits mode auto-approves scoped file changes but still prompts for arbitrary bash commands.
Why choose it: Best when developers want a local AI coding assistant where every consequential action is gated. The permission architecture makes the human the approval path by default; the agent cannot execute code or modify files outside the working directory without explicit consent. Audit all operations via OpenTelemetry metrics. Apply managed settings to enforce the permission policy across a team.
Aider: In its default interactive mode, presents generated diffs and waits for confirmation before writing changes. The --yes-always flag bypasses all confirmations and must never be set in a shared or CI context.
Why choose it: Best for individual developers who want a terminal-based AI coding loop with a visible diff gate on every proposed change. The gate is in the interactive session, not the repository; pair with GitHub branch protection to enforce the gate upstream of the merge point, where CI and organizational policy apply regardless of how the local session was configured.
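One way to keep the --yes-always flag out of shared contexts is a lint step over CI configuration and scripts. The following is a minimal sketch under the assumption that the files are plain text in which `#` starts a comment (as in YAML and shell); the function name is hypothetical.

```python
def find_yes_always(config_text: str) -> list[int]:
    """Return 1-based line numbers where --yes-always appears outside comments.

    Simplification: a line is treated as a comment only if its first
    non-blank character is '#'.
    """
    hits = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        if line.lstrip().startswith("#"):
            continue
        if "--yes-always" in line:
            hits.append(number)
    return hits
```

A CI job could fail whenever this returns a non-empty list for any tracked workflow or script file.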
GitHub Branch Protection: Enable "Require pull request reviews before merging" and "Require approval of the most recent reviewable push" on any branch receiving AI-generated commits. The second rule requires the most recent reviewable push to be approved by someone other than the person who pushed it, so a push made after an approval needs fresh review.
Why choose it: Best as the platform-level enforcement layer for any AI coding tool in a GitHub-hosted repository. This control enforces privilege separation at the repository host, not at the coding tool, and holds regardless of how the code was generated or what local tool flags were set. Apply via branch ruleset (preferred over legacy branch protection for organisation-wide policy) to cover all branches and all contributors including bot identities.
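As a rough sketch of applying these rules programmatically, the snippet below builds a request body for GitHub's REST endpoint that updates legacy branch protection. The field names (`require_last_push_approval`, `dismiss_stale_reviews`, `required_approving_review_count`) follow that endpoint as understood at the time of writing and should be checked against the current API reference. The status-check name `sast` and both helper names are hypothetical. Sending the request (authentication, HTTP client) is deliberately left out, and organisation-wide rulesets use a different schema that is not shown here.

```python
import json


def branch_protection_payload(required_checks: list[str], approvals: int = 1) -> dict:
    """Build a request body for updating branch protection on one branch."""
    return {
        # Required status checks, e.g. the SAST job, must pass before merge.
        "required_status_checks": {"strict": True, "contexts": required_checks},
        # Apply the rules to administrators as well.
        "enforce_admins": True,
        "required_pull_request_reviews": {
            # Dismiss existing approvals when new commits are pushed.
            "dismiss_stale_reviews": True,
            "require_code_owner_reviews": False,
            "required_approving_review_count": approvals,
            # The last push must be approved by someone other than its pusher.
            "require_last_push_approval": True,
        },
        # No additional push restrictions in this sketch.
        "restrictions": None,
    }


def protection_endpoint(owner: str, repo: str, branch: str) -> str:
    """URL that the payload would be sent to with an HTTP PUT."""
    return f"https://api.github.com/repos/{owner}/{repo}/branches/{branch}/protection"


if __name__ == "__main__":
    # "sast" is a placeholder name for the static-analysis status check.
    print(protection_endpoint("example-org", "example-repo", "main"))
    print(json.dumps(branch_protection_payload(["sast"]), indent=2))
```

Keeping the payload in version control lets reviewers see changes to the gate itself in a diff, which matters because weakening these settings silently disables the control.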
Anthropic Computer Use API: Anthropic guidance explicitly advises asking a human to confirm decisions that might result in meaningful real-world consequences. Classifiers on computer use prompts automatically steer the model to request user confirmation when potential prompt injections are detected in screenshots.
Why choose it: Best as the application-layer HITL pattern for agents that use the computer use API to write or execute code on a desktop. The classifier defence provides a second automated layer that surfaces suspicious actions for human confirmation without requiring application code changes. An opt-out is available for headless use cases; removing the classifier removes a meaningful safety layer and should require documented risk acceptance.
Trade-offs
- The review gate introduces latency proportional to reviewer availability. For automated pipelines this is the dominant adoption cost; budget reviewer time as a first-class engineering resource before deploying AI coding agents at scale.
- Provenance tagging is not yet standardised across tools. GitHub Copilot, Claude Code, and Aider each emit different markers; a CI check that enforces tagging is the most reliable way to make provenance mandatory rather than conventional.
- SAST on AI-generated diffs adds per-PR CI time, typically 30 to 120 seconds for a large diff. Run SAST as a required status check so it cannot be skipped, but execute it in parallel with the review queue rather than sequentially to keep total gate time low.
- Reviewer calibration degrades when AI-generated PRs are approved routinely without close reading. Spot-check a sample of approved AI-attributed PRs post-merge and feed findings back into reviewer guidance.
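The post-merge spot-check in the last trade-off can be as simple as a reproducible random sample of approved AI-attributed PRs. The sketch below is one possible approach; the function name, the 10% default, and the use of PR numbers as identifiers are assumptions, not a recommended sampling rate.

```python
import random


def spot_check_sample(pr_numbers, fraction: float = 0.1, seed=None) -> list[int]:
    """Pick a random subset of merged AI-attributed PRs for post-merge audit.

    At least one PR is returned whenever the input is non-empty. Passing a
    seed makes the selection reproducible for audit purposes.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    population = sorted(set(pr_numbers))
    if not population:
        return []
    size = max(1, round(len(population) * fraction))
    rng = random.Random(seed)
    return sorted(rng.sample(population, size))
```

Recording the seed alongside the audit findings lets a later reviewer confirm that the sample was not hand-picked.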
When NOT to use
- Do not apply this gate to read-only agents that query but never produce code; there is no change to review.
- Do not use a single shared reviewer identity for all AI-generated PRs in a large organisation; a compromised reviewer account becomes the single point of failure for the entire gate.
- Do not treat SAST as a substitute for human review. SAST catches known-dangerous patterns; semantic-level injection that looks structurally correct but exfiltrates data through a misconfigured output requires human judgment.
Limitations
- The gate operates on the proposed diff, not on the runtime behaviour of the code after merge. A change that passes review can still behave differently in production; pair with m-gvisor for runtime sandboxing of executed AI-generated code.
- AI provenance is only as reliable as the commit metadata. An operator who strips Co-authored-by trailers defeats attribution-based review policies; enforce provenance via CI policy that rejects commits without expected attribution, not by trusting the agent to self-report.
- Branch protection applies at merge time and does not prevent an agent from pushing to a feature branch with force-push or from opening multiple PRs to dilute reviewer attention. Scope agent repository permissions to the minimum needed: write to a designated branch, no admin rights.
- Human reviewers exhibit attention fatigue on repetitive AI-generated diffs. If the majority of PRs in a repository are AI-attributed, the practical effectiveness of the gate degrades over weeks. Pair with m-adaptive-workload for high-volume review queues.
Maturity tier reasoning
- Tier 2 fits because all five implementation options are production-available and documented by their maintainers: GitHub Copilot coding agent, Claude Code permission architecture, Aider interactive confirmation, GitHub branch protection, and Anthropic Computer Use HITL guidance.
- What keeps this out of Tier 1 is the absence of a cross-tool standard for AI provenance tagging, a unified CI policy schema for enforcing review gates on AI-attributed commits, and any industry benchmark for reviewer calibration on AI-generated code.
- GitHub branch protection ("Require approval of the most recent reviewable push") is the most mature primitive here; it is a GA feature with a documented API and broad adoption. The agentic composition on top of it is applied practice, not research.
Last verified against upstream docs: 2026-05-30.