MITIGATION · m-kill-switch
Kill switch: human authority to halt one agent, a class, or the entire deployment
Agentic systems can act faster than a human can intervene through normal channels. A kill switch is the operational guarantee that a named human role can stop agent activity at any scope (single instance, class, or global) through a documented runbook, without requiring a code change or redeployment, and with every invocation written to an audit trail.
At a glance
TL;DR
- A named human role must be able to halt one agent, one class of agents, or every agent in the deployment, through a documented runbook rather than ad-hoc access.
- The switch is not an algorithm; it is the operational guarantee that the invocation path exists, is drilled, and records every invocation to an audit trail regardless of who triggered it.
- Implementations compose existing platform primitives (pod termination, serverless concurrency controls, IAM deny-all, feature-flag emergency-off) under a single authority surface. No new infrastructure is required.
- A kill switch that has never been drilled is not a working control. Periodic fire drills are what separate a real switch from a runbook no one has tested.
How it behaves
What it is
A kill switch is not an algorithm. It is an operational guarantee: a named human role can halt agent activity at three scopes through a documented runbook, and the agent infrastructure honours the invocation within a published time-to-halt SLO.
The three scopes are:
- Single-agent: terminate one running agent instance, or one tenant's agent fleet, immediately.
- Class-wide: halt every agent of a named class across the deployment, for example every refund-processing agent or every code-generation agent.
- Global: halt every agent in the deployment, used rarely and audited heavily.
Implementations compose existing platform primitives (pod termination, function concurrency controls, credential revocation, feature-flag emergency-off) under a single authority surface. No new infrastructure is required, but the authority and runbook layer is per-deployment work: naming who can invoke, ensuring they can reach the runbook during an authentication outage, and drilling the path on a regular schedule to confirm it still works.
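As a minimal sketch of what "a single authority surface" could look like, the class below checks the invoker's role, writes an audit record for every attempt, and then dispatches to whichever platform primitive is registered for the requested scope. The class name, field names, and the role name used in the usage note are hypothetical; a real deployment would back the audit log with durable, append-only storage.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Callable, Dict, FrozenSet, List


class Scope(Enum):
    SINGLE_AGENT = "single-agent"
    CLASS_WIDE = "class-wide"
    GLOBAL = "global"


@dataclass
class KillSwitch:
    """One authority surface in front of whatever halt primitives the platform offers."""
    authorized_roles: FrozenSet[str]                  # roles allowed to invoke (hypothetical names)
    primitives: Dict[Scope, Callable[[str], None]]    # scope -> function that halts the target
    audit_log: List[dict] = field(default_factory=list)

    def invoke(self, invoker: str, role: str, scope: Scope,
               target: str, drill: bool = False) -> bool:
        allowed = role in self.authorized_roles
        # Record every attempt, allowed or not, before acting on it.
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "invoker": invoker,
            "role": role,
            "scope": scope.value,
            "target": target,
            "drill": drill,
            "allowed": allowed,
        })
        if not allowed:
            return False
        self.primitives[scope](target)
        return True
```

A caller would register one halt function per scope, for example a Lambda concurrency call for SINGLE_AGENT and a label-selected scale-to-zero for CLASS_WIDE, and then call `switch.invoke("alice", "agent-oncall", Scope.CLASS_WIDE, "refund-processing")`.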
Detection signals
- Kill-switch invocations per environment per quarter. Any invocation outside a scheduled drill is a recorded security event requiring a post-halt root-cause assessment.
- Invoker identity on each invocation. An invocation from an account outside the documented kill-switch role is an authority-misuse signal.
- Time from invocation to confirmed process termination. A value exceeding the published SLO means the halt path has degraded and requires remediation before the next event.
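The three signals above can be evaluated from a single invocation record. The sketch below assumes a hypothetical record shape (role, drill flag, two timestamps) and a caller-supplied SLO; none of these names come from a specific product.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Invocation:
    invoker_role: str     # role the invoking account held
    drill: bool           # True if this was a scheduled drill
    invoked_at: float     # epoch seconds when the switch was invoked
    confirmed_at: float   # epoch seconds when process termination was confirmed


def signals(inv: Invocation, kill_switch_role: str, slo_seconds: float) -> List[str]:
    """Return the detection signals raised by one kill-switch invocation."""
    found = []
    if not inv.drill:
        found.append("security-event: invocation outside a scheduled drill")
    if inv.invoker_role != kill_switch_role:
        found.append("authority-misuse: invoker outside the kill-switch role")
    if inv.confirmed_at - inv.invoked_at > slo_seconds:
        found.append("slo-breach: time-to-halt exceeded the published SLO")
    return found
```

A drill run by the documented role that halts within the SLO returns an empty list; anything else returns one entry per signal that fired.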
Threats it covers
- WHY IT HELPS: Resource Overload is the deliberate exhaustion of compute, budget, or API quota through runaway agent behaviour. A kill switch is the backstop when softer controls (rate limits, budget quotas) have not stopped the drain. It halts the consuming agents before the damage extends to the named scenarios of service disruption, cost manipulation, and cascading failure.
- WHY IT HELPS: Misaligned and Deceptive Behaviors describes an agent that pursues goals other than the ones it was given, often in ways that are difficult to detect until it has acted. When behavioural monitoring fires a signal, the human-override path must exist before any corrective investigation is possible. The kill switch is that path, providing authority to stop the agent mid-execution and preserve the state for forensics.
- WHY IT HELPS: RCE and Code Attacks describes an agent that has been manipulated into executing attacker-controlled code. Sandbox containment limits the blast radius, but if the agent is actively executing, halting it is the next required step. The kill switch provides hard-stop authority independent of whether the sandbox boundary held.
- WHY IT HELPS: An agent interacting with a blockchain can enter a self-reinforcing transaction loop, submitting transactions faster than any on-chain mechanism can interject. The kill switch stops the agent process before the loop compounds. It provides that authority without requiring a code deploy or an on-chain intervention, which may be unavailable or too slow.
- WHY IT HELPS: Unintended MCP resource consumption can exhaust per-agent or per-tenant tool-call quotas before any automatic limit fires. A fleet-level kill-switch invocation halts all tool invocations from the consuming agent fleet, bounding the consumption while the root cause is investigated.
Principle coverage
Defence-in-Depth stage: Respond. It advances:
- Defence-in-Depth: The kill switch is the last independent layer, the human-operable halt that remains available after rate limits, isolation, and monitoring have all been bypassed.
- Assume Breach: A drilled halt path is what you reach for once an agent is confirmed compromised and still acting, which is the situation Assume Breach plans for.
- Resilience & Recovery: Halting agent activity is the first recovery step: it stops further damage before rollback and forensics begin.
- Containment (blast radius): Scoped halts (single agent, class, or global) bound the blast radius to exactly the affected set.
- Safe Interruptibility / Corrigibility: Grace-period termination lets the agent finish an atomic operation before shutdown, so stopping it does not itself cause corruption.
- Kill-switch / Circuit-breaker: This control is the direct implementation of the principle: a named human authority that can stop agent activity at any scope through a drilled, audited path.
- Safety / Harm-limitation: An authority that can halt a runaway agent before it completes a harmful action is the backstop when softer controls fail.
- Contestability / Redress: The halt gives a human the standing to stop an agent mid-course, not only to audit the decision after the fact.
Design & governance principles (open design, economy of mechanism, accountability, …) are architectural, not advanced by a single placed control.
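The Safe Interruptibility point depends on the agent cooperating with the halt signal. The sketch below is one way an agent process might do that, assuming its work can be split into atomic units: the SIGTERM handler only sets a flag, and the loop checks the flag between units so the unit in progress completes. The class and method names are hypothetical.

```python
import signal
import threading
from typing import Callable, List


class GracefulAgent:
    """Finishes the current atomic unit of work, then stops when SIGTERM arrives."""

    def __init__(self) -> None:
        self.stop_requested = threading.Event()
        self.completed: List[str] = []

    def handle_sigterm(self, signum, frame) -> None:
        # Do no work in the handler; just record that a halt was requested.
        self.stop_requested.set()

    def install(self) -> None:
        # Must be called from the main thread of the agent process.
        signal.signal(signal.SIGTERM, self.handle_sigterm)

    def run(self, tasks: List[Callable[[], str]]) -> None:
        for task in tasks:
            if self.stop_requested.is_set():
                break                        # halt requested: start nothing new
            self.completed.append(task())    # one atomic unit runs to completion
```

Each unit has to finish well inside the platform's grace period; a unit that outlives it is killed mid-operation regardless of this handler.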
Implementation options
Four verified options, each composing a different platform primitive. Choose the one that matches where your agent workload runs. Most deployments need two: one for the process layer and one for the credential layer.
AWS Lambda reserved concurrency at zero
Stops all new invocations of a Lambda function immediately by setting its reserved concurrency to zero. In-flight executions finish; no new executions start. Effective within seconds via a single API call, and fully reversible by setting a non-zero value. Use when the agent workload runs on AWS Lambda and you need a fast, code-free halt.
Why choose it: A single PutFunctionConcurrency call, effective within seconds, requires no code change or redeployment and is reversed the same way. Pair it with an EventBridge rule that matches the CloudTrail record of PutFunctionConcurrency on any agent function and publishes an SNS alert when reserved concurrency is set to zero, so every invocation is recorded even if the runbook is bypassed.
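A minimal sketch of the halt and restore calls, assuming a boto3 Lambda client is passed in (for example `boto3.client("lambda")`). The function names are hypothetical. Restoring by deleting the reservation is an addition to the text above, which restores by setting a non-zero value; both paths are shown.

```python
from typing import Any, Optional


def halt_lambda(client: Any, function_name: str) -> None:
    """Set reserved concurrency to zero so every new invocation is throttled."""
    client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=0,
    )


def restore_lambda(client: Any, function_name: str,
                   reserved: Optional[int] = None) -> None:
    """Lift the halt: restore a reservation, or remove it if none is given."""
    if reserved is None:
        # Returns the function to the account's unreserved concurrency pool.
        client.delete_function_concurrency(FunctionName=function_name)
    else:
        client.put_function_concurrency(
            FunctionName=function_name,
            ReservedConcurrentExecutions=reserved,
        )
```

Passing the client in keeps the halt path testable during drills with a stub, without touching production functions.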
Kubernetes scale-to-zero
Scales a Deployment to zero replicas; the controller deletes all running Pods. Each Pod receives SIGTERM and a configurable grace period (default 30 s) to finish in-flight work before SIGKILL. A label selector lets one command scale every Deployment in a namespace to zero for class-wide or global halts. Use when the agent workload runs on Kubernetes.
Why choose it: Grace-period termination is a first-class feature: the agent can complete an atomic operation before shutdown, preventing mid-write corruption. In RBAC, scale is a subresource rather than a verb, so grant update and patch on deployments/scale for agent Deployments only to the named on-call role. Every invocation is then access-controlled and, with API server audit logging enabled, audited.
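A minimal sketch of the label-selected halt, written as a function that builds the kubectl argument list and leaves execution to the caller. The namespace and label values in the usage note are hypothetical; the kubectl flags used (`--namespace`, `--selector`, `--replicas`) are standard `kubectl scale` options.

```python
from typing import List


def scale_to_zero_command(namespace: str, selector: str) -> List[str]:
    """Build the kubectl call that scales every matching Deployment to zero."""
    if not selector:
        # Refuse an empty selector so a typo cannot widen the halt scope.
        raise ValueError("a label selector is required")
    return [
        "kubectl", "scale", "deployment",
        "--namespace", namespace,
        "--selector", selector,
        "--replicas=0",
    ]
```

A runbook step might run it with `subprocess.run(scale_to_zero_command("agents", "agent-class=refund-processing"), check=True)`, using credentials bound to the on-call role.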
AWS IAM inline deny-all policy
Attaches an inline IAM policy denying all actions to an agent role or group, blocking every API call that identity can make. An explicit Deny overrides any Allow, anywhere. The policy applies to new requests once the change propagates, typically within seconds (IAM is eventually consistent); requests already authorized and in flight finish. Use as the credential-layer complement to a process-layer halt, especially when a compromised agent may hold credentials outside the process.
Why choose it: Process termination does not revoke credentials that have already left the process (written to disk or exfiltrated via a tool call); the deny-all closes that gap. Target the agent's execution role for single-agent scope. IAM groups contain users, not roles, so for class-wide or global scope apply the same inline policy to every role in the class, or to the group only where agents run as IAM users. The deny is lifted by deleting the inline policy.
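A minimal sketch of the deny and lift operations for a single role, assuming a boto3 IAM client is passed in (for example `boto3.client("iam")`). The policy name and function names are hypothetical; the policy document is a plain explicit Deny on every action and resource.

```python
import json
from typing import Any

# Explicit Deny on everything; an explicit Deny overrides any Allow.
DENY_ALL_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Deny", "Action": "*", "Resource": "*"}],
}

POLICY_NAME = "kill-switch-deny-all"  # hypothetical name; pick one and document it


def deny_all(iam: Any, role_name: str) -> None:
    """Attach the inline deny-all policy to one agent role."""
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName=POLICY_NAME,
        PolicyDocument=json.dumps(DENY_ALL_POLICY),
    )


def lift_deny(iam: Any, role_name: str) -> None:
    """Remove the inline deny-all policy to restore the role's permissions."""
    iam.delete_role_policy(RoleName=role_name, PolicyName=POLICY_NAME)
```

For a class-wide halt, the runbook would call `deny_all` once per role in the class, which is why the role inventory has to be documented ahead of time.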
Feature-flag emergency-off
Toggles an agent capability off for all users instantly from a dashboard or API call, with no code change or redeployment. SDK clients receive the new state over streaming connections, typically sub-second. Use when the agent capability is gated behind a feature flag at the application layer, a common pattern for gradual capability rollouts.
Why choose it: The fastest way to disable a capability without an infrastructure operation. It does not terminate the process or revoke credentials; it only prevents the guarded code path from executing. Use it as the first-line switch for capability-level halts, paired with process-layer termination for confirmed runaway behaviour.
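A minimal sketch of the guarded code path, using a hypothetical in-memory `FlagStore` as a stand-in for a vendor SDK client. It is not any specific product's API. The guard fails closed: a flag that is missing or unreadable is treated as off.

```python
from typing import Callable, Dict, Optional, TypeVar

T = TypeVar("T")


class FlagStore:
    """Hypothetical in-memory stand-in for a feature-flag SDK client."""

    def __init__(self) -> None:
        self._flags: Dict[str, bool] = {}

    def set(self, key: str, enabled: bool) -> None:
        self._flags[key] = enabled

    def is_enabled(self, key: str, default: bool = False) -> bool:
        return self._flags.get(key, default)


def run_capability(flags: FlagStore, key: str, action: Callable[[], T]) -> Optional[T]:
    """Run the capability only if its flag is on at the moment of the check."""
    if not flags.is_enabled(key, default=False):
        return None      # emergency-off: the guarded code path does not execute
    return action()      # once past the check, the flag no longer affects this call
```

The last comment is the reason this option needs a process-layer halt beside it: turning the flag off stops the next call, not one already running.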
Trade-offs
- Lambda reserved concurrency at zero is the lowest-effort serverless option: one call, seconds to effect, fully reversible. In-flight executions that finish after the call are the residual risk; design agents to be idempotent so a post-halt completion causes no duplicate effects.
- Kubernetes scale-to-zero suits container workloads, but the default 30 s grace period sets a floor on the halt target unless --grace-period=0 is used, which skips SIGTERM and may leave writes mid-operation.
- IAM deny-all is the only option that revokes credentials already outside the process. It requires knowing the exact role or group to target; a poorly documented IAM structure slows the halt.
- A feature-flag toggle is the fastest and least disruptive option, but it applies only to capabilities gated by that flag. An agent already past the flag check and mid-task is unaffected.
- Dev effort is low for any single primitive. The real effort is the authority and runbook layer: naming who can invoke, ensuring they can reach the runbook during an authentication outage, and drilling the path on a schedule.
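The idempotency advice in the Lambda trade-off can be sketched with an idempotency key: each side effect is recorded under a key, and a repeat with the same key returns the stored result instead of acting again. The class below is a hypothetical in-memory illustration; a real agent would need a durable store shared across retries and restarts.

```python
from typing import Any, Callable, Dict


class IdempotentExecutor:
    """Runs each keyed action at most once and replays the stored result after that."""

    def __init__(self) -> None:
        self._results: Dict[str, Any] = {}

    def execute(self, key: str, action: Callable[[], Any]) -> Any:
        if key in self._results:
            # Already done, for example by an execution that finished after the halt.
            return self._results[key]
        result = action()
        self._results[key] = result
        return result
```

With a key such as a refund ID, an execution that completes after the halt and a later retry after restore produce one refund, not two.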
When NOT to use
- As the first response to a suspected anomaly. Quarantine the specific agent first (anomaly isolation) and escalate to a class-wide or global halt only when the anomaly is confirmed and spreading.
- As a substitute for rate limits and budget quotas. A kill switch is a last resort after softer controls have failed, not a replacement for rate-and-quota enforcement.
- As evidence that an incident is contained. A halt cannot undo already-committed actions (refunds issued, emails sent, transactions submitted); forensics and rollback are separate steps.
- When the only invocation path requires credentials from the system being halted. If the agent's authentication infrastructure is the failure, the switch must be reachable through an independent path.
Limitations
- A kill switch cannot undo committed effects. An agent that has already written to a database, sent an email, or submitted a transaction has done so regardless of when the halt fires. Pair with output provenance so post-halt forensics can identify what to roll back.
- The switch is only as fast as its invocation path. If the on-call operator cannot authenticate to the runbook system during an authentication outage, the switch is theoretical. Ensure the halt path is reachable through out-of-band break-glass credentials independent of the agent infrastructure.
- Setting Lambda reserved concurrency to zero throttles every invocation of the named function, including legitimate ones if the function serves multiple tenants. Reserved concurrency is set per function, so deploy one function per tenant where single-tenant halts are required.
- Kubernetes scale-to-zero terminates the process but does not revoke the IAM or cloud credentials the container held. Combine with the IAM deny-all option when credential revocation is required alongside process termination.
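Combining the process layer and the credential layer can be sketched as a small runner that attempts every layer and reports the outcome of each, so a failure in one does not prevent the other from firing. The function name and layer labels are hypothetical, and the order in which layers run is left to the caller.

```python
from typing import Callable, Dict


def layered_halt(layers: Dict[str, Callable[[], None]]) -> Dict[str, str]:
    """Attempt every halt layer and report per-layer success or failure."""
    outcome: Dict[str, str] = {}
    for name, halt in layers.items():
        try:
            halt()
            outcome[name] = "ok"
        except Exception as exc:
            # Keep going: one failed layer must not block the remaining layers.
            outcome[name] = f"failed: {exc}"
    return outcome
```

The returned mapping can go straight into the audit record, so the post-halt review sees which layers actually took effect.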
Maturity tier reasoning
- Tier 2 fits because every implementation option is production-available: Lambda reserved concurrency, Kubernetes scale-to-zero, and IAM inline deny-all are each documented by their maintainers with stable APIs, and feature-flag platforms have broad production adoption.
- Not Tier 1, because the operational composition layer (named authority, documented runbook, drill cadence, restore path) is per-deployment work with no industry-standard agentic kill-switch profile to conform to.
- The credential-revocation gap (process terminated, credentials still live) is the residual architectural risk the IAM deny-all option addresses. Combining both layers is not yet a standard documented pattern in any platform's agentic-AI guidance.
Last verified against upstream docs: 2026-05-30.