← Atlas · Mitigations Tier 2 · Real-composable

MITIGATION · m-mem-validation

Memory content validation — a write-boundary gate on what enters the agent's memory store

An agent's memory store is a persistent surface: anything written to it can be retrieved by any agent, in any session, for the lifetime of the corpus. Memory poisoning exploits that persistence by writing adversarial content that steers the agent's reasoning long after the attacker has gone. Write-boundary validation prevents this by running every candidate memory write through schema, policy, and provenance checks before it is committed. Content that fails any gate is rejected and never reaches the store.

Last reviewed 2026-05-12 · Status: published · Evidence →

At a glance

MATURITY

Tier 2

Available off-the-shelf or as a documented pattern, but newer or less broadly proven. Expect integration work and some operational nuance.

PLACES ON

edge · node

Restricted to node kinds: shared-memory

COVERAGE

4 threats

T1 · T27 · T28 · T49

TRADE-OFFS

LAT

low

COST

low

DEV

medium

Latency · cost · UX friction · dev effort.

TL;DR

Validate at the write boundary, not the read boundary. Once poisoned content lands in memory it can be retrieved by any agent, in any session, for the lifetime of the corpus.
Three synchronous gates run before every commit: schema validation (Zod / Pydantic), policy enforcement (OPA: tenant boundary, content classifier, source-provenance permission), and provenance verification (signed source tag that chains to a trusted root).
For RAG agents, a fourth inline check compares the candidate embedding's cosine distance from the per-topic cluster centroid against a calibrated baseline. Anomalously distant embeddings are quarantined for review, not committed.
Rejection at any gate produces a structured ValidationError and an audit log entry; the rejected payload is never serialised to the store.

How it behaves

Agent or pipeline requests a write to short- or long-term memory (vector store, KV store, conversation history, inter-agent message bus)

Parse against declared schema, then evaluate OPA policy (tenant boundary + content classifier + provenance permission), then verify signed source-tag attribution chain, then (RAG only) embedding-distance outlier check against per-topic centroid

Payload committed to the store with signed provenance metadata

Payload rejected: audit.log('memory-write-rejected', { gate, reason, source, userId }); calling agent receives structured ValidationError

Fail-closed at the schema gate: a malformed payload is a harder rejection than a policy violation. Never serialise a rejected payload to the store regardless of which gate fired.

What it is

Agent memory is a persistent data store: conversation history, long-term vector embeddings, structured fact caches, and inter-agent message buses that survive beyond a single session. Any agent that reads from that store trusts its contents. Memory poisoning exploits that trust by writing adversarial content, such as a false pricing rule, a fragmented privilege-escalation instruction, or a subtly corrupted policy document, so that future retrievals steer the agent toward attacker-chosen outcomes. The attack is effective precisely because the poisoned content looks like legitimate memory; the agent has no mechanism to distinguish it from honest writes.

Write-boundary validation is the structural fix. It places a validation pipeline at every memory write seam, running each candidate payload through a sequence of synchronous checks before the write is committed. The checks cover three independent dimensions: schema (is the payload structurally valid?), policy (does this write comply with tenant boundaries, content classification, and source-provenance permissions?), and provenance (does the declared source trace to a trusted root?). For vector-store agents, a fourth inline check compares the candidate embedding's cosine distance from the per-topic cluster centroid against a calibrated baseline, quarantining writes that are anomalously distant rather than committing them. Rejection at any gate produces a structured error and an audit log entry; the payload is never serialised to the store.

The principle that follows from this structure is that the write boundary is the highest-leverage poisoning surface. Once adversarial content lands in memory it can be retrieved by any agent, in any session, for the lifetime of the corpus. The read boundary, where output filters and retrieval ACLs operate, is a second line of defence but not a substitute: by the time a retrieval check fires, the poisoned record already exists and may have influenced other writes or been cited in outputs the agent produced before the detection fired.

Detection signals

Validation rejection rate per source. A sustained rise from a single source is the signature of an active injection campaign.
Embedding-distance outlier rate. A rising proportion of writes flagged as anomalous indicates systematic corpus manipulation attempts.

Threats it covers

T1 Memory Poisoning −2 severity steps

WHY IT HELPS Memory Poisoning is the injection of adversarial content into an agent's short- or long-term memory so that future retrievals steer the agent toward attacker-chosen outcomes. Write-boundary validation removes the attacker's ability to commit that content: a candidate write that fails schema, policy, or provenance validation is rejected before serialisation, so it cannot influence any subsequent retrieval.
T27 Vector Database Poisoning with Malicious Smart Contract Data −2 severity steps

WHY IT HELPS Vector store poisoning with malicious financial-instrument data requires the attacker to commit adversarial embeddings to the shared corpus. Schema checks reject structurally malformed payloads; embedding-distance outlier detection flags vectors that are anomalously distant from the trusted cluster centroid; provenance verification rejects writes that cannot be traced to a trusted source. All three gates must fail for a poisoned vector to land.
T28 RAG Data Exfiltration −1 severity step

WHY IT HELPS RAG data exfiltration is partially addressed at the write boundary by requiring every document written to memory to carry a verifiable provenance tag. Those tags are auditable at retrieval time and support forensic investigation of what was accessed and by whom. Write-boundary validation does not prevent retrieval-side exfiltration; pair with m-vector-acl for the retrieval boundary.
T49 Semantic Drift in Embeddings −1 severity step

WHY IT HELPS Semantic drift is the gradual corruption of a corpus through many small writes, each individually plausible. Embedding-distance outlier checks slow this process by flagging candidate writes whose embeddings are anomalously distant from the trusted cluster centroid, routing them for human review rather than committing them. The control does not eliminate slow-drift poisoning that stays within statistical thresholds; it raises the cost and slows the rate.

Principle coverage

Defence-in-Depth stage: Prevent — and it advances:

Memory & RAG Integrity Write-boundary validation is the primary enforcement mechanism for memory integrity: every candidate write is checked against schema, policy, and provenance before it is committed, so adversarial content that fails any gate never enters the corpus and cannot be retrieved by any agent.

Design & governance principles (open design, economy of mechanism, accountability, …) are architectural, not advanced by a single placed control.

Implementation options

Five verified implementation options covering the three gate layers (schema, policy, provenance) plus a managed content classifier and a semantic plausibility gate. Every deployment should implement at minimum the schema and policy gates together; they cover structural and permission checks with no vendor dependency. Add the content classifier for ingestion pipelines that accept external documents. Add the semantic gate only for writes claiming high-trust provenance.

Zod / Pydantic Parse every candidate write against a declared schema before any other check. Both libraries expose a non-throwing parse path (safeParse / model_validate) that returns structured errors. Sub-millisecond; in-process; no vendor dependency.

Why choose it: Use as the structural floor for every deployment. It is the cheapest gate and a prerequisite for the policy gate. Zod v3 ships with static type inference so the schema doubles as a TypeScript type. Pydantic v2 is the standard for Python LLM frameworks. A parse failure short-circuits the chain before the payload reaches the store layer.

More details:

Open Policy Agent Decouple policy decisions from enforcement with Rego policies for tenant-boundary, content-classifier, and source-provenance-permission rules. Accepts arbitrary JSON; returns structured decisions in under 1 ms with the Wasm build. Policies are versioned code.

Why choose it: Best when policy complexity warrants a dedicated engine. Tenant boundaries, content classifiers, and provenance permissions are the three rules that matter for memory-write validation; OPA allows each to be authored, reviewed, and versioned independently of the agent code. The Wasm build eliminates the network round-trip and supports hot-reload without redeploying the agent.

More details:

Open Policy Agent: documentation and Rego reference ↗

Azure AI Content Safety Prompt Shields scans text for jailbreak and indirect-injection patterns. Callable as an async gate at the write boundary; reject writes that score above the configured threshold. Maximum 10,000 characters per call; up to five documents per request.

Why choose it: Best as an additional classifier on top of schema and policy validation for ingestion pipelines where documents arrive from external or untrusted sources. The API is purpose-built for indirect-injection detection (the document-attack class), which schema and policy checks do not catch. Adds an HTTP round-trip (50-150 ms); apply to lower-throughput ingestion paths, not to high-frequency inter-agent message buses.

More details:

MCP write-side handler Implement a memory-write MCP tool whose handler runs schema, policy, and provenance gates before calling the underlying store. The inputSchema (JSON Schema) handles structural validation; the handler body enforces policy and provenance. Rejected writes return a structured isError response.

Why choose it: Best for deployments that already expose memory operations via MCP. The MCP specification (Security Considerations) requires servers to validate all tool inputs, so this pattern fulfils a specification obligation while implementing the three-gate stack. No additional infrastructure required if memory is already accessed via MCP tools.

More details:

MCP specification: server tools and inputSchema validation ↗

Dedicated validator agent A lightweight critic agent receives every candidate memory write, applies a Zod schema check and an LLM-based semantic plausibility check (Constitutional AI critic prompt), and either approves or rejects with a reason. Slower (100-500 ms per write) but catches semantically adversarial content that passes structural validation.

Why choose it: Best as a second-pass gate for writes claiming high-trust provenance, not for all writes. Schema and policy gates catch structural and permission violations; the semantic layer catches correctly-formatted but adversarial content (a policy document that asserts the wrong rule). Pair with the schema and policy options for structural gating; add this only where the cost of a missed semantic attack justifies the latency.

More details:

Anthropic: Constitutional AI, Harmlessness from AI Feedback ↗

Trade-offs

Schema validation (Zod / Pydantic) is sub-millisecond in-process; OPA policy decisions are under 1 ms with the Wasm build; embedding-distance checks add 5-20 ms. The managed classifier (Prompt Shields) adds 50-150 ms per call. Apply it only to lower-throughput ingestion, not to high-frequency agent buses.
The principal adoption cost is schema and policy definition for the full set of memory surfaces. A production agent typically has several distinct memory surfaces; missing one creates a usable poisoning vector.
A sustained high rejection rate from a single source is the signature of an active injection campaign. Monitor rejection rate per source as an operational metric and escalate when it rises without a corresponding legitimate ingestion event.

When NOT to use

Not appropriate as the sole trust mechanism when the write source is itself an agent with broad authority. Validating that content is well-formed does not validate that it is correct. In high-assurance contexts, combine validation with provenance attestation and human review for writes that claim to update authoritative policy or pricing.
Do not apply embedding-distance outlier detection to deliberately broad corpora (a general-knowledge RAG store). The cluster-centroid model assumes topic coherence; a general corpus produces constant false positives that desensitise the team. Scope this gate to narrowly-defined knowledge bases.

Limitations

Write-boundary validation rejects malformed and policy-violating writes but cannot detect semantically valid adversarial content. A correctly-formatted policy document that asserts the wrong rule passes all structural checks. OWASP MAS Guide T18 (RAG Input Manipulation Leading to Policy Bypass) is the canonical example.
A compromised validator provides false assurance that is worse than no validator. Teams stop looking for anomalies because they expect the gate to catch them. Treat validation policies as code: peer review, version control, signed releases.
Pair with cryptographic logging (Sigstore) for forensic rollback when a validator bypass is discovered, and with m-mem-anomaly for the reactive detection layer.

Maturity tier reasoning

Tier 2 because every component is individually mature: Pydantic v2 and Zod v3 are production schema-validation libraries; OPA is a CNCF Graduated policy engine; Sigstore is production-grade for signing.
Not Tier 1 because no industry-standard schema for memory-write metadata exists. Every vector store and framework defines its own attribution fields, and there is no published benchmark against which validation pipelines can be scored.
Not Tier 3 because the control composes production-ready components and requires no novel engineering. The gap is standardisation, not technology maturity.

Last verified against upstream docs: 2026-05-30.

PLACEMENT

On the canvas, this control can be placed on:

edge
node

Valid node kinds: shared-memory

Valid edge kinds: memory-write

Place it on the canvas →

MAESTRO LAYERS

ATLAS TECHNIQUES

AML.T0019 Publish Poisoned Datasets
Adversary publishes a manipulated dataset to a public hub (HuggingFace, Kaggle, GitHub) so that downstream training pipelines incorporate the poisoned data.
AML.T0020 Poison Training Data
Adversary modifies training data or its labels to embed exploitable behaviour into the resulting model, often only triggered by specific inputs at inference time.
AML.T0070 RAG Poisoning
Adversary injects malicious content into documents indexed by a retrieval-augmented generation system so future queries surface attacker-controlled context.
AML.T0080 AI Agent Context Poisoning
Adversary contaminates an agent's context store (short-term scratchpad, vector memory, conversation history) so future reasoning is biased toward attacker goals.

ATLAS MITIGATIONS

AML.M0007 Sanitize Training Data
Detect and remove or remediate poisoned training data before model training; quarantine outliers and adversarial-pattern matches.
AML.M0031 Memory Hardening
Trust boundaries and secure write paths around agent memory so attacker-controlled content cannot persist or be replayed as instruction.
AML.M0033 Input and Output Validation for AI Agent Components
Validate every input and output exchanged between agent components against schema and policy before it is acted on.

TRADE-OFFS

latency low
cost low
ux friction low
dev effort medium

PLAYBOOKS

OWASP v1.1 playbook that recommends this control:

P2 Preventing Memory Poisoning & AI Knowledge Corruption