EVIDENCE TRAIL
gVisor application kernel sandbox
Verbatim excerpts from the upstream sources cited on the mitigation page, with what each source does and does not prove. The phrase "application kernel" comes directly from gVisor's own documentation; the agentic-AI framing is Helmwart's — the OWASP Threats & Mitigations v1.1 document names "sandbox execution" as the T11 countermeasure but does not prescribe gVisor specifically.
Last cross-checked against upstream sources: · 8 sources
References
Each entry shows what the source supports and what it does not prove.
gVisor — Architecture Guide: Security Model
§Goals: Limiting Exposure
"gVisor's primary design goal is to minimize the System API attack vector through multiple layers of defense, while still providing a process model."
Supports: Canonical upstream statement of gVisor's threat model: reducing the host kernel attack surface exposed to untrusted code, which is the risk profile this mitigation attributes to AI-generated code execution.
Does not prove: Does not reference AI agents, LLMs, or agentic systems. The security goal is expressed in generic OS-isolation terms.
gVisor — Architecture Guide: Security Model
§Principles: Defense-in-Depth
"No system call is passed through directly to the host. Every supported call has an independent implementation in the Sentry, that is unlikely to suffer from identical vulnerabilities."
Supports: Verbatim confirmation that gVisor's Sentry independently reimplements every supported syscall rather than acting as a passthrough filter. This is the structural basis for the isolation boundary this mitigation relies on.
Does not prove: Addresses host kernel protection only; does not address network-layer or filesystem side-channels between co-located sandboxes.
google/gvisor — README (GitHub)
README — Opening definition
"gVisor provides a strong layer of isolation between running applications and the host operating system. It is an application kernel that implements a Linux-like interface. Unlike Linux, it is written in a memory-safe language (Go) and runs in userspace."
Supports: Source-of-truth definition used in the MDX vendorClaim field. Confirms the "application kernel" framing and the memory-safety advantage (Go vs. C), both referenced in the mitigation page.
Does not prove: README does not discuss agentic AI, prompt injection, or multi-tenant execution scenarios.
gVisor Blog — "Safe Ride into the Dangerzone: Reducing attack surface with gVisor"
Main body — "Attack Surface Reduction" and "Core Components"
"The document conversion process no longer has access to the Linux kernel. Instead, it only has access to the gVisor kernel (in the Sentry). … gVisor sits between a container and the Linux kernel and plays both roles: from the container's perspective, gVisor acts as a kernel, but from Linux's perspective, gVisor is just a regular application."
Supports: Concrete applied example of sandboxing untrusted document-processing code with gVisor. Directly analogous to the AI code-exec use case: in both cases, code of unknown provenance runs in the Sentry, not against the host kernel.
Does not prove: Use case is document conversion, not AI-generated code. The threat actor model differs (user-supplied documents vs. LLM-generated scripts).
gVisor Blog — "Multi-Agent gVisor Isolation (MAGI)"
Main body — "Limitations Acknowledged"
"gVisor is necessary, but not sufficient without proper policies governing when sandboxing is applied."
Supports: Direct published precedent for multi-agent gVisor isolation: separate gVisor sandboxes per agent, defense-in-depth across agent components, millisecond start time enabling per-invocation isolation. Confirms gVisor is production-viable for agentic workloads.
Does not prove: MAGI is a reference deployment, not a specification or standard. The "necessary but not sufficient" caveat is acknowledged — gVisor alone does not enforce policy about when sandboxing is applied.
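The per-invocation pattern described in this entry can be sketched as follows. This is a minimal illustration, not MAGI's implementation. It assumes Docker is installed with gVisor's `runsc` registered as a runtime; the function name `build_sandbox_argv`, the image name, the mount path, and the resource limits are hypothetical choices made for the example.

```python
import shlex
from typing import List


def build_sandbox_argv(image: str, script_path: str, memory: str = "256m") -> List[str]:
    """Build a `docker run` command that executes one script in a fresh gVisor sandbox.

    Assumption: `runsc` is registered as a Docker runtime on the host.
    `script_path` must be an absolute host path so Docker treats it as a bind mount.
    """
    return [
        "docker", "run",
        "--rm",                # discard the container once the invocation ends
        "--runtime=runsc",     # run under gVisor instead of the default runtime
        "--network=none",      # no network access for the untrusted script
        "--read-only",         # read-only root filesystem
        f"--memory={memory}",  # hypothetical memory cap
        "--pids-limit=64",     # hypothetical cap on process count
        "-v", f"{script_path}:/work/script.py:ro",
        image,
        "python3", "/work/script.py",
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works without Docker.
    argv = build_sandbox_argv("python:3.12-slim", "/tmp/agent_task.py")
    print(shlex.join(argv))
```

Building a new argument list per call means every agent invocation gets its own sandbox. Deciding which invocations must go through this path is the policy question the quoted caveat leaves open.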
gVisor Blog — "Scaling Agentic-RL Sandboxes to the Millions with gVisor at Tencent"
Abstract / opening section
"Today, we run millions of gVisor sandboxes daily for Agentic-RL training in production."
Supports: Production-scale evidence that gVisor is viable for agentic AI workloads at high volume. Directly supports the MDX claim that gVisor is "Tier 1 (production-canonical)" and the maturity tier reasoning.
Does not prove: Use case is reinforcement-learning training loops, not interactive tool-bus execution. Syscall profile may differ from production agent code-exec workloads.
Google Cloud — Cloud Run Container Contract
§Container Sandbox — "gVisor Implementation (First Generation)"
"If you use the first generation execution environment, the Cloud Run containers are sandboxed using the gVisor container runtime sandbox."
Supports: Confirms that Cloud Run's first generation execution environment sandboxes containers with gVisor, substantiating the MDX independentEvidence claim about Google Cloud production deployment. Supports only the Cloud Run portion of the MDX text "Default sandbox for Google App Engine, Cloud Run, and Cloud Functions."
Does not prove: The quoted passage does not cover App Engine or Cloud Functions, and it applies to the first generation environment only. Cloud Run Gen2 provides full Linux compatibility and does not use gVisor; the gen1/gen2 split is material context the MDX does not surface.
OWASP Agentic AI — Threats & Mitigations v1.1
§T11 Unexpected RCE and Code Attacks — Description and Mitigation
"Unexpected RCE and Code Attacks occur when attackers exploit AI-generated code execution in agentic applications, leading to unsafe code generation, privilege escalation, or direct system compromise. … Restrict AI code generation permissions, sandbox execution, and monitor AI-generated scripts."
Supports: Canonical OWASP definition of T11. Verbatim "sandbox execution" is the upstream mitigation text that this control implements. Establishes that sandboxing AI-generated code execution is a recognised OWASP countermeasure.
Does not prove: Does not name gVisor or any specific sandbox technology. "Sandbox execution" in the OWASP text is a category of control, not a prescriptive implementation choice.
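The three controls in the quoted T11 text (restrict, sandbox, monitor) can be combined in a small policy gate. The sketch below is illustrative only and is not taken from OWASP or gVisor: `ExecRequest`, `required_runtime`, `audit_record`, and the `ai_generated` provenance flag are hypothetical names, and the rule "AI-generated code always runs under `runsc`" is an assumed policy.

```python
import hashlib
import logging
from dataclasses import dataclass

logger = logging.getLogger("agent.codeexec")


@dataclass(frozen=True)
class ExecRequest:
    agent_id: str
    source: str         # code text the agent wants to execute
    ai_generated: bool  # provenance flag set by the caller (hypothetical)


def required_runtime(req: ExecRequest) -> str:
    """Return the container runtime a request must use.

    Assumed policy: AI-generated code goes to gVisor (`runsc`);
    everything else uses Docker's default runtime (`runc`).
    """
    return "runsc" if req.ai_generated else "runc"


def audit_record(req: ExecRequest) -> dict:
    """Build and log an audit entry so executed scripts can be monitored."""
    record = {
        "agent_id": req.agent_id,
        # Hash rather than store the script, so the log stays small.
        "sha256": hashlib.sha256(req.source.encode("utf-8")).hexdigest(),
        "runtime": required_runtime(req),
    }
    logger.info("code-exec request: %s", record)
    return record
```

Keeping the runtime decision in one function makes the "when is sandboxing applied" rule explicit and auditable, separate from the sandbox technology itself.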