MITIGATION · m-mem-anomaly

Memory anomaly detection — runtime detection of poisoning that slipped past validation

An agent's memory store can receive adversarial content that passes schema and policy validation because the content is structurally valid but statistically unusual. Memory anomaly detection addresses this by monitoring write rates, embedding distances, provenance tags, and retrieval patterns at runtime, and quarantining writes whose statistical signatures diverge from the established baseline.

Last reviewed 2026-05-12 · Status: published

At a glance

MATURITY

Tier 2

Available off-the-shelf or as a documented pattern, but newer or less broadly proven. Expect integration work and some operational nuance.

PLACES ON

node

Restricted to node kinds: shared-memory

COVERAGE

1 threat

TRADE-OFFS

LAT

low

COST

low

DEV

medium

Latency · cost · UX friction · dev effort.

TL;DR

Reactive complement to write-time validation: catches poisoning that passed schema and policy checks by detecting statistical anomalies in write rates, embedding distances, provenance tags, and retrieval patterns at runtime.
Fires after a write commits. The first response is automated quarantine rather than deletion: suspect writes are moved to an isolated partition so the full blast radius can be determined before any rollback begins.
Four signal classes: write-rate spikes per source against a rolling baseline, embedding-distance outliers from per-topic cluster centroids, provenance mismatches between claimed source and actual ingestion path, and retrieval-distribution shifts on previously stable queries.
Cold-start limitation: no established baseline means elevated false-positive rates for approximately the first 30 days. Enable only after sufficient normal write-pattern data has accumulated.

How it behaves

Trigger: A write commits to agent memory (vector store, KV store, conversation history)

Checks: Evaluate write rate per source against rolling baseline -> check embedding distance from per-topic centroid -> verify provenance tag matches ingestion path -> monitor recall-pattern stability

If no anomaly: Write remains active; no alert raised

If anomalous: Quarantine suspect writes to isolated partition; alert security team, open incident, do not delete

Detective phase: quarantine first, scope blast radius, then roll back. Do not attempt rollback until the full set of affected sessions is known.
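The flow above can be sketched as a small decision function. This is a minimal illustration, not a reference implementation: the `MemoryWrite` type, the `signals` dictionary, and the check names are hypothetical, and each boolean is assumed to be produced by a separate upstream detector.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class MemoryWrite:
    write_id: str
    source: str
    # Hypothetical per-check flags (True = anomalous), set by upstream detectors.
    signals: Dict[str, bool] = field(default_factory=dict)

# Order mirrors the four checks in the flow.
CHECKS = ["write_rate", "embedding_distance", "provenance", "recall_pattern"]

def evaluate(write: MemoryWrite) -> dict:
    """Decide whether a committed write stays active or is quarantined."""
    flagged = [name for name in CHECKS if write.signals.get(name, False)]
    if not flagged:
        # No anomaly: the write remains active and no alert is raised.
        return {"state": "active", "alert": False, "flagged": []}
    # Any anomaly: quarantine and alert. Nothing is deleted at this stage.
    return {"state": "quarantined", "alert": True, "flagged": flagged}
```

Returning the list of flagged checks alongside the decision gives the incident record the reason for the quarantine, which helps when scoping the blast radius.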

What it is

An agent's memory store accumulates content over time, and that content shapes every future retrieval. A write-time validation gate checks whether incoming content conforms to a declared schema and passes policy rules. What it cannot reliably catch is content that is structurally valid but adversarially crafted: a correctly formatted document that asserts a false fact, or a sequence of legitimate-looking writes that collectively shift retrieval results toward attacker-controlled content. Memory anomaly detection is the reactive layer that addresses this gap.

Rather than inspecting the content of individual writes, anomaly detection monitors the statistical behaviour of the write stream and the retrieval patterns it produces. Four classes of signal matter:

Write-rate anomalies. A poisoning campaign must typically write repeatedly to move retrieval results; that volume produces a detectable spike in write rate from a single source against a rolling baseline window.
Embedding-distance anomalies. Content that does not belong to the established corpus tends to produce embeddings that fall far from the per-topic cluster centroid. Distance from the centroid, measured as a standard-deviation multiple of the baseline distribution, is a proxy for content that is anomalous relative to the existing knowledge domain.
Provenance anomalies. A write that claims a particular source identity but arrives through a different ingestion path produces a mismatch between the claimed origin and the actual delivery channel. That mismatch is itself a signal independent of content.
Recall-pattern anomalies. A sudden change in which vectors are retrieved for previously stable queries indicates that newly committed content is displacing established results, which is the downstream effect of a successful poisoning write.
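Of the four signal classes, the provenance check is the simplest to illustrate. The sketch below assumes a registry that maps each source identity to the ingestion paths it may legitimately use; the registry contents and the names `EXPECTED_PATHS` and `provenance_mismatch` are hypothetical.

```python
# Hypothetical registry: which ingestion paths each source may legitimately use.
EXPECTED_PATHS = {
    "crm-sync": {"batch-etl"},
    "support-bot": {"chat-api", "webhook"},
}

def provenance_mismatch(claimed_source: str, ingestion_path: str) -> bool:
    """Return True when the claimed source did not arrive via an expected path."""
    allowed = EXPECTED_PATHS.get(claimed_source)
    if allowed is None:
        # Assumption: an unregistered source is treated as a mismatch.
        return True
    return ingestion_path not in allowed
```

The check never inspects the content of the write, which is why this signal is independent of how well the payload is crafted.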

Detection is reactive: by the time an alert fires, the suspect content has already been committed. The correct first response is quarantine, not deletion. Moving the affected writes to an isolated partition preserves them for forensic analysis and allows the full set of affected sessions to be identified before any rollback attempt begins. Pair this control with a write-time validation gate that intercepts what is detectable before commit, and with provenance tracking that makes the incident response chain legible.
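The quarantine-first response can be sketched with an in-memory stand-in for the store. The `MemoryStore` class, its partitions, and the `retrieval_log` of (session, write) pairs are assumptions made for illustration; a real deployment would use whatever partitioning and retrieval logging its vector or KV store provides.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryStore:
    active: dict = field(default_factory=dict)         # write_id -> content
    quarantine: dict = field(default_factory=dict)     # isolated partition
    retrieval_log: list = field(default_factory=list)  # (session_id, write_id) pairs

    def quarantine_writes(self, suspect_ids):
        """Move suspect writes to the quarantine partition and return the
        sessions that retrieved any of them (the blast radius)."""
        for wid in suspect_ids:
            if wid in self.active:
                # Moved, not deleted: content is preserved for forensics.
                self.quarantine[wid] = self.active.pop(wid)
        suspects = set(suspect_ids)
        return sorted({sid for sid, wid in self.retrieval_log if wid in suspects})
```

Rollback would start only after the returned session list has been reviewed, in line with the ordering described above.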

Detection signals

Write rate per source versus rolling baseline. A sustained spike from a single source is the characteristic signature of a poisoning campaign, which typically requires repeated writes to shift retrieval behaviour.
Embedding distance from per-topic cluster centroid. A new write whose embedding falls far outside the established cluster for a topic indicates content that does not belong to the normal corpus, whether from drift or adversarial injection.
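A rolling-baseline write-rate check might look like the following. This is a sketch under stated assumptions: writes are counted per fixed interval, the window length, the multiplier `k`, and the minimum sample count are illustrative defaults rather than recommended values, and the class name `WriteRateMonitor` is hypothetical.

```python
from collections import defaultdict, deque
from statistics import mean, pstdev

class WriteRateMonitor:
    """Flags a source whose per-interval write count exceeds its rolling baseline."""

    def __init__(self, window=24, k=3.0, min_samples=8):
        self.k = k
        self.min_samples = min_samples
        # One bounded history of interval counts per source.
        self.history = defaultdict(lambda: deque(maxlen=window))

    def observe(self, source: str, count: int) -> bool:
        hist = self.history[source]
        anomalous = False
        if len(hist) >= self.min_samples:  # no verdict until a baseline exists
            mu, sigma = mean(hist), pstdev(hist)
            # Floor sigma at 1.0 so a perfectly flat baseline does not flag tiny changes.
            anomalous = count > mu + self.k * max(sigma, 1.0)
        if not anomalous:
            hist.append(count)  # spikes are kept out of the baseline
        return anomalous
```

Excluding flagged intervals from the history is a design choice: it stops a sustained campaign from raising its own baseline, at the cost of adapting more slowly to legitimate growth.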

Threats it covers

T1 Memory Poisoning −1 severity step

WHY IT HELPS Memory Poisoning is the injection of adversarial content into an agent's memory store so that it influences future retrievals and, through them, the agent's reasoning and output. This control detects poisoning attempts by monitoring for the statistical signatures they produce: abnormal write rates from a single source, embeddings that fall far outside the established cluster for a topic, provenance tags that do not match the actual ingestion path, and retrieval distributions that shift away from previously stable results.

Principle coverage

Defence-in-Depth stage: Detect — and it advances:

Memory & RAG Integrity Memory integrity requires that content retrieved from the store be what legitimate principals wrote. Memory anomaly detection advances that principle at the reactive layer: it monitors the statistical behaviour of the write stream for the signatures that adversarial campaigns produce, and quarantines suspect content to limit its further influence on agent reasoning, serving as the detective counterpart to the write-time validation gate.

Design & governance principles (open design, economy of mechanism, accountability, …) are architectural, not advanced by a single placed control.

Implementation options

Five verified implementation options covering embedding-space anomaly detection (two self-build approaches with no vendor dependency), write-rate monitoring via vector-store-native metrics, and application-layer recall-quality canaries. Options 1 and 2 are low-cost, dependency-free, and address the embedding-distance signal class directly. Options 3 and 4 reuse the observability stack already in place. Option 5 adds the recall-quality signal class. All five are composable.

scikit-learn IsolationForest Train an IsolationForest on historical embeddings from known-good writes. Score new writes at inference time; the model returns +1 (inlier) or -1 (outlier). Suitable for high-dimensional embedding spaces without distribution assumptions.

Why choose it: Best for embedding-distance detection when the topic space is broad or cluster structure is not well-defined. IsolationForest makes no normality assumption and handles high-dimensional spaces well. Zero additional dependencies beyond scikit-learn. API: clf.fit(X_train) -> clf.predict(X_new) -> clf.score_samples(X_new).

More details:

scikit-learn: Outlier detection with IsolationForest ↗
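A minimal sketch of the fit, predict, and score_samples sequence described for this option. The training data here is synthetic Gaussian noise standing in for known-good embeddings, and the 8-dimensional size, sample count, and `random_state` are assumptions chosen only to keep the example small and repeatable.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Synthetic stand-in for historical embeddings from known-good writes.
X_train = rng.normal(loc=0.0, scale=1.0, size=(500, 8))

clf = IsolationForest(n_estimators=100, random_state=0)
clf.fit(X_train)

# Two new writes: one at the centre of the training data, one far outside it.
X_new = np.vstack([np.zeros(8), np.full(8, 10.0)])

labels = clf.predict(X_new)        # +1 = inlier, -1 = outlier
scores = clf.score_samples(X_new)  # lower score = more anomalous

for label, score in zip(labels, scores):
    print(f"label={label:+d} score={score:.3f}")
```

In practice the continuous score from `score_samples` is often more useful than the hard label, because it allows the quarantine threshold to be tuned separately from the model.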

numpy centroid-distance threshold Compute per-topic cluster centroids nightly from known-good embeddings. On each write, compute distance = np.linalg.norm(embedding - centroid) and alert when distance exceeds mean + 3 standard deviations of the baseline distribution.

Why choose it: Best when the memory corpus is topic-structured with coherent clusters. Zero additional dependencies; runs inline on the write path at approximately 5 ms per vector. The 3-sigma threshold is the canonical anomaly-detection boundary from NIST AI 600-1 MEASURE-2.7; adjust per corpus based on observed baseline variance. Pairs naturally with option 1 for broader detection.

More details:

numpy.linalg.norm: distance computation ↗
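The centroid-distance rule for this option can be sketched as below. The function names `fit_baseline` and `is_outlier` are hypothetical, a single topic is assumed, and the default `k=3.0` simply mirrors the 3-standard-deviation threshold described above.

```python
import numpy as np

def fit_baseline(embeddings: np.ndarray):
    """Compute the centroid and the distance distribution of known-good embeddings."""
    centroid = embeddings.mean(axis=0)
    dists = np.linalg.norm(embeddings - centroid, axis=1)
    return centroid, dists.mean(), dists.std()

def is_outlier(embedding: np.ndarray, centroid: np.ndarray,
               mu: float, sigma: float, k: float = 3.0) -> bool:
    """Flag a write whose distance from the centroid exceeds mu + k * sigma."""
    distance = np.linalg.norm(embedding - centroid)
    return bool(distance > mu + k * sigma)
```

A deployment with several topics would keep one `(centroid, mu, sigma)` triple per topic and refresh them in the nightly job.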

Weaviate Prometheus metrics Set PROMETHEUS_MONITORING_ENABLED=true; Weaviate serves metrics at :2112/metrics. Exposes vector index insert/delete counts, batch write durations, and LSM bucket write operations with per-shard granularity. Alert on rate(weaviate_vector_index_operations_total[5m]) spikes per source tag.

Why choose it: Best for write-rate monitoring in Weaviate deployments, reusing Prometheus/Grafana alerting infrastructure already in place. Anomaly detection logic lives in alerting rules, not in Weaviate itself. Does not provide embedding-distance detection; compose with options 1 or 2 for that signal class.

More details:

Weaviate monitoring: Prometheus metrics configuration ↗

Pinecone + Datadog Pinecone exposes audit logs tracking user and API actions, and a Datadog integration for shipping vector search and RAG metrics to an external monitoring platform. Write-rate anomaly alerting is implemented in Datadog using standard metric monitors.

Why choose it: Best for Pinecone deployments already shipping metrics to Datadog. The audit log captures who wrote what and when; Datadog's anomaly monitor type applies statistical baseline detection to the write-rate time series without requiring a separate alerting system. Pinecone does not perform anomaly scoring itself; the detection logic lives in Datadog.

More details:

Pinecone docs index: audit logs and integrations ↗

LangSmith online evaluation LangSmith online evaluators run LLM-as-a-judge scoring on production traces at configurable sampling rates. The Feedback Score alert type fires when the average score for a project drops below a threshold, usable as a recall-quality proxy for poisoning-driven displacement of retrieval results.

Why choose it: Best for the recall-pattern signal class in LangChain-instrumented deployments. A sustained drop in retrieval relevance for a previously stable query set indicates that newly written content is displacing established results. Does not address write-rate or embedding-distance detection; treat as a canary layer on top of options 1 through 4.
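The displacement signal behind this option can also be illustrated without any vendor tooling. The sketch below is not the LangSmith API: it is a vendor-neutral assumption that compares the top-k document IDs returned for a fixed set of canary queries against a stored baseline, using Jaccard overlap. The function names and the 0.5 threshold are hypothetical.

```python
def topk_overlap(baseline_ids, current_ids) -> float:
    """Jaccard overlap between two top-k result sets (1.0 = identical)."""
    a, b = set(baseline_ids), set(current_ids)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def shifted_queries(baseline: dict, current: dict, min_overlap: float = 0.5):
    """Return canary queries whose results drifted below the overlap threshold."""
    return sorted(
        query
        for query, ids in baseline.items()
        if topk_overlap(ids, current.get(query, [])) < min_overlap
    )
```

A canary query that suddenly shares few results with its baseline is a candidate for the recall-pattern alert; the newly retrieved IDs then point to the writes worth quarantining.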

Trade-offs

Statistics computation is asynchronous to the write path; embedding-distance checks add 5 to 20 ms inline if used as a write-time gate, or run out-of-band as alert-only with no write-path latency impact.
The principal operational cost is baseline tuning, not infrastructure spend. A threshold set too sensitively produces alert fatigue that desensitises the team; a threshold set too loosely misses gradual poisoning. NIST AI 600-1 MEASURE-2.7 explicitly notes that anomaly thresholds require continuous calibration.
Cold-start systems with no established baseline produce elevated false-positive rates for approximately the first 30 days. Enable write-rate and embedding-distance alerting only after a sufficient baseline window is available.

When NOT to use

Counter-productive as a write-time gate on cold-start systems: false-positive rates will be high enough to block legitimate writes. Delay enabling the control until at least 30 days of normal write-pattern data are available.
Wrong tool for catching a single well-crafted adversarial document. The control is tuned for statistical volume signals, not per-document semantic analysis. Do not use as the only defence against T1; deploy m-mem-validation as the pre-commit gate first.

Limitations

Anomaly detection assumes a stable baseline. Cold-start systems and rapidly growing corpora produce false positives for legitimate growth spikes that resemble write-rate attacks.
A slow-drift poisoning campaign that spreads writes over weeks stays within rolling-window thresholds. A patient attacker who keeps write rates below the alert threshold evades the write-rate signal entirely, and can defeat the control if the other signals also stay within baseline. Supplement with periodic full-corpus integrity audits.
Detective phase only: by the time an alert fires, the poisoned content has been committed. Combine with rollback procedures and the pre-commit write-boundary validation provided by m-mem-validation.

Maturity tier reasoning

Reaches Tier 2 because the underlying observability infrastructure (Prometheus, CloudWatch, Datadog) is itself Tier 1, and embedding-distance anomaly detection is documented in peer-reviewed literature and deployed in research-grade production systems.
What holds the composed pattern at Tier 2 is the baseline-tuning gap: no industry-standard default threshold library exists for agent-memory anomaly detection, so every deployment calibrates from scratch. This is expected to improve as vector store vendors ship default anomaly profiles.

Last verified against upstream docs: 2026-05-30.

PLACEMENT

On the canvas, this control can be placed on:

node

Valid node kinds: shared-memory

MAESTRO LAYERS

L2 L5

ATLAS TECHNIQUES

AML.T0070 RAG Poisoning
Adversary injects malicious content into documents indexed by a retrieval-augmented generation system so future queries surface attacker-controlled context.
AML.T0080 AI Agent Context Poisoning
Adversary contaminates an agent's context store (short-term scratchpad, vector memory, conversation history) so future reasoning is biased toward attacker goals.

ATLAS MITIGATIONS

AML.M0024 AI Telemetry Logging
Log inputs, outputs, and reasoning steps of deployed AI models so anomalous behaviour can be detected and incidents reconstructed.
AML.M0031 Memory Hardening
Trust boundaries and secure write paths around agent memory so attacker-controlled content cannot persist or be replayed as instruction.

TRADE-OFFS

latency low
cost low
ux friction low
dev effort medium

PLAYBOOKS

OWASP v1.1 playbook that recommends this control:

P2 Preventing Memory Poisoning & AI Knowledge Corruption