T49: Semantic Drift in Expense Policy Embeddings

Definition

Policy updates are not reflected in the vector database embeddings used by the Retrieval-Augmented Generation (RAG) pipeline; the RPA agent retrieves and applies stale policy at inference time. The threat is distinct from T1 Memory Poisoning: it arises from operational neglect rather than adversarial injection, and it targets the external knowledge base rather than the agent’s internal memory. The threat was originally numbered T17 in the OWASP MAS Threat Modelling Guide v1.0 (April 2025); it is renumbered T49 in the Helmwart catalogue to avoid collision with OWASP Agentic AI v1.1 T17.

What it looks like in practice

The company updates its expense policy to disallow alcohol purchases under every claim category. The policy document is updated in the source repository, but the re-embedding pipeline that refreshes the vector store has no automated trigger on document change; it runs only on a monthly schedule. For the following three weeks, the RPA agent queries the vector store for expense policy and retrieves embeddings of the old document, which permits alcohol expenses under the “business entertainment” category. Claims that include alcohol are approved as compliant.

The discrepancy surfaces only when a compliance auditor manually reviews a sample of approved claims and finds alcohol expenses that post-date the policy update.

Why it’s dangerous in multi-agent context

RAG-augmented agents retrieve policy at inference time and act on what they find. Unlike a rule-based system where policy updates require an explicit deployment, a RAG agent’s knowledge of current policy is entirely a function of the vector store’s contents. Policy drift is operationally silent: the agent continues to function normally, approving and rejecting claims, but against stale policy and without any error signal. In a multi-agent pipeline where a policy interpretation agent feeds conclusions to a downstream approval agent, the stale policy propagates through both agents before any audit detects the discrepancy. T18 (RAG Input Manipulation Leading to Policy Bypass) is the adversarial complement: where T49 is passive drift, T18 is an attacker actively exploiting the retrieval layer.

Detection signals

Stale embeddings are invisible to the agent itself; detection requires comparing the vector store’s current document hashes against the canonical policy repository at the time each approval decision is made.

The embedding’s stored source-document hash at retrieval time not matching the SHA-256 hash of the current policy file in the source repository. A freshness check on every retrieval that compares these two values catches drift on the first retrieval after the source document changes.
An approved claim containing a line item from a category that was removed in a policy update with a commit timestamp earlier than the approval timestamp. Cross-referencing approval logs against the policy change log surfaces approvals that cannot be valid under current policy.
The vector store’s last_embedded_at timestamp for the policy document being older than the source repository’s last_modified_at for the same document. Expose this gap as a metric and alert when it exceeds the defined staleness SLA.
A rising approval rate for line-item categories that were restricted or capped in a recent policy update. A category-level approval rate anomaly alert, calibrated against pre-update baselines, is a behavioural indicator of stale policy being applied.
A re-embedding pipeline run not completing within the expected interval after a policy file commit event. Monitor the embedding pipeline’s completion event log and alert if the expected re-embedded event does not appear within N minutes of a policy repository commit.
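
Two of the signals above, the timestamp gap and the approval-log cross-reference, can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the `PolicyChange` and `Approval` record shapes and both function names are hypothetical, and the cross-reference assumes a removed category is never reinstated by a later commit.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class PolicyChange:
    """One policy change log entry (hypothetical shape)."""
    committed_at: datetime
    removed_categories: frozenset[str]


@dataclass(frozen=True)
class Approval:
    """One approval log entry (hypothetical shape)."""
    claim_id: str
    approved_at: datetime
    categories: frozenset[str]


def staleness_gap(last_embedded_at: datetime, last_modified_at: datetime) -> timedelta:
    """Time by which the source document is newer than its embedding (zero if fresh)."""
    return max(last_modified_at - last_embedded_at, timedelta(0))


def invalid_approvals(approvals: list[Approval], changes: list[PolicyChange]) -> list[str]:
    """Claim IDs approved after a commit that removed one of their categories."""
    flagged = []
    for approval in approvals:
        for change in changes:
            committed_first = change.committed_at < approval.approved_at
            if committed_first and approval.categories & change.removed_categories:
                flagged.append(approval.claim_id)
                break  # one violating change is enough to flag the claim
    return flagged
```

The value returned by `staleness_gap` is the quantity to export as a metric and compare against the staleness SLA; `invalid_approvals` is the kind of query an auditor would otherwise run by hand on a sample.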

Mitigations

Attach embedding version metadata and a hash of the source document to each embedding; verify at retrieval time that the retrieved embedding’s source hash matches the current policy document hash.
Define and enforce a maximum staleness SLA for policy embeddings: trigger an automatic re-embedding run whenever a policy source document is modified, not on a fixed schedule.
Implement a freshness check in the retrieval pipeline: if the retrieved embedding’s age exceeds the staleness threshold, fall back to a hard-coded policy rule rather than acting on the stale embedding.
Log the embedding version and source document hash alongside each approval decision so that auditors can determine which policy version governed each claim.
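
The retrieval-time mitigations above can be combined into one guard, sketched here under stated assumptions. `RetrievedChunk`, `select_policy`, and the audit record fields are hypothetical names, and falling back on a hash mismatch as well as on an expired SLA is a choice made for this sketch rather than something the catalogue prescribes.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RetrievedChunk:
    """A retrieved policy chunk plus the metadata stored at embedding time."""
    text: str
    source_sha256: str      # hash of the document the chunk was embedded from
    embedding_version: str
    embedded_at: datetime


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def select_policy(chunk: RetrievedChunk, current_policy: bytes, max_age: timedelta,
                  fallback_rule: str, now: datetime) -> tuple[str, dict]:
    """Return (policy_text, audit_record) for one approval decision."""
    current_hash = sha256_hex(current_policy)
    hash_matches = chunk.source_sha256 == current_hash
    within_sla = (now - chunk.embedded_at) <= max_age
    use_retrieved = hash_matches and within_sla
    # Record which policy version governed the decision, for later audit.
    audit_record = {
        "embedding_version": chunk.embedding_version,
        "source_sha256": chunk.source_sha256,
        "current_sha256": current_hash,
        "used_fallback": not use_retrieved,
        "decided_at": now.isoformat(),
    }
    return (chunk.text if use_retrieved else fallback_rule), audit_record
```

Passing `now` in explicitly keeps the check deterministic and testable; the audit record would be written alongside the approval decision itself.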

Relation to base threat (T1–T17)

T49 extends T1 Memory Poisoning. Where T1 addresses adversarial injection into the agent’s memory or retrieval store, T49 addresses the passive drift variant: the vector store diverges from ground truth through operational inaction rather than attack. T18 (RAG Input Manipulation Leading to Policy Bypass) is the active counterpart: an attacker who exploits the same retrieval surface that T49 has left vulnerable through stale embeddings.

OWASP Top 10 for Agentic Applications 2026

The Agentic Top 10 (ASI01 through ASI10) is a separate practitioner-facing publication that maps onto the master Threats & Mitigations threat numbering. T49 is covered by the following Top 10 entries:

ASI06 Memory & Context Poisoning (primary)

An adversary writes malicious or misleading data into an agent's persistent memory or shared vector store, so that every future session, and every peer agent reading from the same store, operates on corrupted context. The defining difference from single-turn injection (ASI01) is that the poisoned data survives session reset; the agent's reasoning drifts without any new attacker input.

OWASP LLM Top 10: LLM01:2025, LLM04:2025, LLM08:2025

Source: OWASP Top 10 for Agentic Applications 2026 (Dec 2025) · the Top 10 is a compass into the master Threats & Mitigations taxonomy, not a replacement for it.

Design principles at stake

When T49 is present, these security design principles are the ones being violated or tested. Each links to the full principle; the mitigations below are how you restore them.

Defence-in-Depth: Semantic drift is operationally silent: the agent continues approving and rejecting claims normally, producing no error signal, while acting against a policy version that may be weeks out of date. Depth means the retrieval pipeline cannot rely on the vector store alone: a content hash of the source policy document attached to each embedding lets the retrieval step verify at inference time that what it retrieved still matches the current document, and a maximum-staleness SLA with an automatic re-embedding trigger on document change ensures the store never drifts far from ground truth in the first place. A freshness fallback (substituting a hard-coded rule if the retrieved embedding's age exceeds the staleness threshold) provides a third layer so that even a delayed re-embedding cannot produce a non-compliant approval.
Memory & RAG Integrity: The vector store is the agent's only knowledge of current policy; when its embeddings drift from the source document through operational neglect, every retrieval returns a stale fact that the agent treats as authoritative: the same trust model that adversarial poisoning exploits in T18, here triggered by inaction rather than attack. Memory integrity applied to the RAG pipeline means each embedding carries both a content hash and the source document hash, verified on every retrieval so that a stale or tampered embedding is detected before it governs a decision. Logging the embedding version and source document hash alongside each approval decision gives auditors the evidence to determine which policy version was applied to each claim, closing the retrospective accountability gap that lets drift accumulate undetected.

Recommended mitigations

Auto-generated from the mitigation catalog: every mitigation whose coverage map includes T49, sorted by maturity tier (Tier 1 production-canonical first, then Tier 2, then Tier 3 research-stage).

Tier 2 Mem validate (Memory content validation — a write-boundary gate on what enters the agent's memory store)

An agent's memory store is a persistent surface: anything written to it can be retrieved by any agent, in any session, for the lifetime of the corpus. Memory poisoning exploits that persistence by writing adversarial content that steers the agent's reasoning long after the attacker has gone. Write-boundary validation prevents this by running every candidate memory write through schema, policy, and provenance checks before it is committed. Content that fails any gate is rejected and never reaches the store.

Why it helps: Semantic drift is the gradual corruption of a corpus through many small writes, each individually plausible. Embedding-distance outlier checks slow this process by flagging candidate writes whose embeddings are anomalously distant from the trusted cluster centroid, routing them for human review rather than committing them. The control does not eliminate slow-drift poisoning that stays within statistical thresholds; it raises the cost and slows the rate.
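
A minimal sketch of the embedding-distance outlier check described above, using plain Python lists as vectors. The function names, the use of cosine distance, and the fixed threshold are assumptions made for illustration; a production gate would calibrate the threshold against the trusted corpus.

```python
import math


def centroid(vectors: list[list[float]]) -> list[float]:
    """Component-wise mean of the trusted embeddings."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity; 0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm


def route_write(candidate: list[float], trusted: list[list[float]], threshold: float) -> str:
    """Return 'review' for anomalously distant candidates, else 'commit'."""
    distance = cosine_distance(candidate, centroid(trusted))
    return "review" if distance > threshold else "commit"
```

As the text notes, a write that stays inside the threshold is committed, so this gate slows drift rather than eliminating it.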
Tier 2 Shared-memory ACL (Shared-memory ACL — per-agent, per-namespace read/write access control on shared vector stores)

When multiple agents share a single vector store, the access boundaries between them are not enforced by the store itself unless you configure them explicitly. Without per-namespace write and retrieval controls, an agent that can write to the shared corpus can insert crafted vectors into any namespace it can reach, and any agent that can query the store can retrieve another agent's confidential documents through embedding-space proximity. Shared-memory ACL addresses this by tagging every vector with a principal identifier at write time and filtering every retrieval query to the requesting agent's namespace, enforced at the gateway layer where the agent cannot bypass it.

Why it helps: Namespace isolation contains semantic drift within the affected agent's partition. An agent whose memory corpus has drifted cannot propagate that drift into the namespaces of agents operating in separate partitions.
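
The gateway-level enforcement described above might look like the following in-memory sketch. `SharedMemoryGateway` and its ACL dictionaries are hypothetical; a real deployment would sit in front of an actual vector store and authenticate the principal rather than trust a string argument.

```python
from dataclasses import dataclass, field


@dataclass
class SharedMemoryGateway:
    """Mediates every read and write; agents never touch the store directly."""
    write_acl: dict[str, set[str]]   # principal -> namespaces it may write
    read_acl: dict[str, set[str]]    # principal -> namespaces it may read
    _store: list[dict] = field(default_factory=list)

    def write(self, principal: str, namespace: str, text: str) -> None:
        if namespace not in self.write_acl.get(principal, set()):
            raise PermissionError(f"{principal} may not write to {namespace}")
        # Tag the record with its writer and namespace at write time.
        self._store.append({"principal": principal, "namespace": namespace, "text": text})

    def read(self, principal: str, namespace: str) -> list[str]:
        if namespace not in self.read_acl.get(principal, set()):
            raise PermissionError(f"{principal} may not read from {namespace}")
        # Only records from the requested, permitted namespace are returned.
        return [r["text"] for r in self._store if r["namespace"] == namespace]
```

Because an unknown principal maps to an empty permission set, the gateway denies by default.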
Tier 2 Vector ACL (Permission-aware vector retrieval — ACLs at the retrieval boundary)

A vector store returns results by embedding-space proximity, not by who is asking. Without a per-principal filter applied before similarity ranking, a query from tenant A can surface tenant B's vectors if the embeddings are close enough. Vector ACL closes that gap: every retrieval call is scoped to the requesting principal's namespace or payload partition before the store ranks any results, so cross-principal hits are structurally impossible rather than merely unlikely.

Why it helps: T49 involves a semantically drifted or poisoned corpus spreading its influence across retrieval results for multiple principals. Namespace isolation contains that drift within the affected namespace: a drifted corpus in one namespace cannot propagate to other principals' retrieval results through embedding-space proximity.
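
The ordering that matters here, filter by principal first and rank second, can be shown with a small sketch. The record layout (`id`, `principal`, `vector`) and the function `scoped_search` are hypothetical; real vector stores typically expose this as a metadata filter applied to the query.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def scoped_search(query: list[float], records: list[dict], principal: str,
                  top_k: int = 3) -> list[str]:
    """Rank only the requesting principal's vectors; others are never scored."""
    visible = [r for r in records if r["principal"] == principal]
    ranked = sorted(visible,
                    key=lambda r: cosine_similarity(query, r["vector"]),
                    reverse=True)
    return [r["id"] for r in ranked[:top_k]]
```

Filtering after ranking would be weaker: a closer vector from another principal could displace legitimate results from the top-k before being discarded, and its similarity score would already have been computed.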

Multi-agent variants: OWASP MAS Guide

The OWASP MAS Threat Modelling Guide v1.0 catalogues one named multi-agent variant of T49, anchored to specific MAESTRO layers. It is a concrete attack pattern that emerges when this threat compounds across agents.

CL RAG Manipulation / Semantic Drift / Repudiation Cascade extends T18, T49, T8

Adversary poisons a shared RAG store (T18); the injected context gradually shifts agent reasoning over time (T49); because logs are sparse or selectively pruned, the drift cannot be attributed after the fact (T8). Cross-layer: L2 data store, L3 agent reasoning, L5 observability.

Source: OWASP MAS Threat Modelling Guide v1.0, §2 Overview of MAESTRO Framework — Extended Threat Scenarios + Cross-Layer table.

Red-team pivot: MITRE ATLAS techniques

MITRE ATLAS catalogues adversary techniques against AI systems. Where this OWASP threat has an attacker-perspective counterpart, the ATLAS technique is shown below. That is what a red team would actually be doing on the wire. Use this for detection-signal anchoring, threat-hunting hypotheses, and IR runbooks. Source: mitre-atlas/atlas-data v5.6.0.

AML.T0070 RAG Poisoning

Adversary injects malicious content into documents indexed by a retrieval-augmented generation system so future queries surface attacker-controlled context.

AML.T0020 Poison Training Data

Adversary modifies training data or its labels to embed exploitable behaviour into the resulting model, often only triggered by specific inputs at inference time.

AML.T0080 AI Agent Context Poisoning

Adversary contaminates an agent's context store (short-term scratchpad, vector memory, conversation history) so future reasoning is biased toward attacker goals.

Agentic angle: Persistent across sessions: a single successful poisoning influences every later decision until the memory is purged.

References

OWASP MAS Threat Modelling Guide v1.0 (April 2025) §3 RPA Expense Reimbursement Agent — Layer 2 Data Operations. Originally published as T17 in that guide; renumbered T49 in the Helmwart catalogue to preserve alignment with OWASP Agentic AI v1.1 IDs.

Sources

OWASP-MAS-Guide · 1.0 (Apr 2025) · §3 RPA Expense Reimbursement Agent — Layer 2 Data Operations