MITIGATION · m-divergence-monitor

Behavioural divergence monitoring — longitudinal drift from declared role

An agent's behaviour can shift gradually over time: tool-selection patterns change, refusal rates drop, output style drifts. No single interaction reveals it, and a single-shot evaluation cannot catch a trend that spans weeks. Behavioural divergence monitoring detects that drift by comparing per-window statistical distributions of observable agent signals against a declared baseline, and alerting when the gap exceeds a threshold.

Last reviewed 2026-05-12 · Status: published

At a glance

MATURITY

Tier 2

Available off-the-shelf or as a documented pattern, but newer or less broadly proven. Expect integration work and some operational nuance.

PLACES ON

node

Restricted to node kinds: agent

COVERAGE

1 threat

TRADE-OFFS

latency low
cost low
ux friction low
dev effort medium

TL;DR

An agent's behaviour is a distribution, not a single output: how often it picks each tool, how frequently it refuses or escalates, how long its outputs are. Divergence monitoring tracks those distributions over rolling windows and alerts when the current window has shifted statistically from a declared baseline.
The monitor is useless before a stable baseline exists. A newly deployed agent has none. The first three to six weeks are a data-collection phase only; baseline-independent controls (goal-consistency monitoring, output moderation, human-in-the-loop review) must carry the detection load in that window.
Divergence monitoring operates per-window via statistical distribution; goal-consistency monitoring operates per-step via semantic similarity. Both are needed and cover different failure modes: one catches slow trends, the other catches single-step deviations.
Baselines must be re-declared as deliberate change-management events after each model update or major feature launch, not recalibrated reactively during an incident.

How it behaves

Telemetry window closes (e.g. end-of-day batch or rolling 24-hour window)

Compute statistical distance (Wasserstein / KS / KL-divergence) between current-window distribution and stable baseline for each signal class and task class

Distance within declared threshold: log window metrics, no alert

Distance exceeds declared threshold: fire divergence alert; correlate with deployment events; escalate to on-call if unexplained

All computation runs out-of-band against persisted telemetry; no latency is added to the agent's action path.
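The flow above can be sketched as a small triage function. This is a minimal illustration, not a reference implementation: the names `WindowResult` and `triage`, the three outcome labels, and the 24-hour lookback are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class WindowResult:
    signal: str       # e.g. "tool_selection" (hypothetical signal name)
    task_class: str   # e.g. "billing" (hypothetical task class)
    distance: float   # statistical distance of this window vs. the baseline
    threshold: float  # declared alert threshold for this signal


def triage(result, window_end, deploy_events, lookback=timedelta(hours=24)):
    """Return 'log', 'alert_explained', or 'escalate' for one closed window."""
    if result.distance <= result.threshold:
        return "log"  # within threshold: record metrics, no alert
    # Threshold exceeded: look for a deployment event inside the window.
    window_start = window_end - lookback
    explained = any(window_start <= event <= window_end for event in deploy_events)
    return "alert_explained" if explained else "escalate"
```

An alert that coincides with a recorded deployment still fires; it is only routed differently from one with no matching change record.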

What it is

An agent is a system that makes decisions repeatedly, across many tasks and sessions. Its behaviour is not a single output but a distribution: how often it selects each tool, how frequently it refuses or escalates, how long and stylistically consistent its outputs are. Under normal operation those distributions are stable. Goal drift, model updates, prompt injection campaigns, and subtle misalignment tend to move them.

Behavioural divergence monitoring treats that distribution as the observable signal. A baseline is established over a stable operating window, typically several weeks of production traffic, covering tool-selection mix, refusal and escalation rate, output-length distribution, and embedding-space centroid position for outputs. During operation, the monitor computes these same distributions over a rolling window and applies a statistical distance measure, such as Wasserstein distance, KS-test, or KL-divergence, to compare the current window against the baseline. When the distance exceeds a declared threshold, it fires an alert.
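As an illustration of the comparison step, the sketch below computes KL-divergence between a baseline and a current-window tool-selection mix using only the standard library. The tool names, the sample counts, the additive smoothing (`alpha`), and the `THRESHOLD` value are assumptions for the example; real thresholds need per-deployment calibration.

```python
import math
from collections import Counter


def to_distribution(tool_calls, vocabulary, alpha=1.0):
    """Turn a list of tool names into smoothed probabilities over a fixed vocabulary."""
    counts = Counter(tool_calls)
    total = len(tool_calls) + alpha * len(vocabulary)
    # Additive smoothing keeps every probability non-zero so the log is defined.
    return {tool: (counts[tool] + alpha) / total for tool in vocabulary}


def kl_divergence(current, baseline):
    """KL(current || baseline) in nats; both dicts must share the same keys."""
    return sum(p * math.log(p / baseline[tool]) for tool, p in current.items())


vocabulary = ["search", "sql_query", "send_email"]  # hypothetical tools
baseline_calls = ["search"] * 70 + ["sql_query"] * 25 + ["send_email"] * 5
current_calls = ["search"] * 40 + ["sql_query"] * 20 + ["send_email"] * 40

baseline = to_distribution(baseline_calls, vocabulary)
current = to_distribution(current_calls, vocabulary)

THRESHOLD = 0.1  # hypothetical value, not a recommended default
distance = kl_divergence(current, baseline)
drifted = distance > THRESHOLD
```

KL-divergence suits categorical signals such as tool mix; for continuous signals such as output length, Wasserstein distance or a KS-test is usually the more natural choice.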

The mechanism is entirely out-of-band: all computation runs against persisted telemetry, so no latency is added to the agent's action path. Detection lag is the corresponding cost: a windowed monitor catches drift hours or days after onset, not in real time.

This control complements goal-consistency monitoring. Goal-consistency operates per-step via semantic similarity to the declared goal; divergence monitoring operates per-window via statistical distribution. Both are needed and they cover different failure modes: goal-consistency catches a single-step deviation; divergence monitoring catches a slow trend that any individual step would not reveal.

Baseline dependency. The monitor requires a stable baseline to compare against. A newly deployed agent has none. The first three to six weeks of deployment are a data-collection phase only, during which baseline-independent controls (goal-consistency monitoring, output moderation, human-in-the-loop review) must carry the detection load. Baselines should be re-declared as deliberate change-management events after each model update or major feature launch, not recalibrated reactively during an incident.
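One way to encode the baseline dependency is a gate that keeps divergence alerting switched off until a valid baseline exists. The sketch below is an assumption-laden illustration: the `Baseline` record, the `alerting_enabled` function, and the use of a model-version string to detect a stale baseline are hypothetical, and the three-week floor is simply the lower bound of the range stated above.

```python
from dataclasses import dataclass
from datetime import date, timedelta

MIN_COLLECTION = timedelta(weeks=3)  # lower bound of the three-to-six-week phase


@dataclass(frozen=True)
class Baseline:
    collection_start: date  # first day of telemetry included in the baseline
    model_version: str      # model version the baseline was declared against


def alerting_enabled(baseline, today, deployed_model_version):
    """Allow divergence alerts only against a mature, still-valid baseline."""
    if baseline is None:
        return False  # newly deployed agent: data-collection phase only
    if baseline.model_version != deployed_model_version:
        return False  # model update: baseline must be re-declared first
    return today - baseline.collection_start >= MIN_COLLECTION
```

While the gate returns `False`, the baseline-independent controls named above remain the only detection path.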

Detection signals

Tool-selection distribution shift per task class. A statistically significant move away from the baseline mix indicates the agent is reaching for different tools than it normally would on the same kind of work.
Refusal-rate change per task class. A sustained drop points to alignment regression; a sudden spike points to miscalibration or an upstream change in input distribution.
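For the refusal-rate signal, one possible test is a two-proportion z-score between the baseline window and the current window. This technique is not prescribed by the control; it is swapped in here as a simple illustration, and the function names and the critical value of 3.0 are assumptions. A single window only shows a shift; treating a drop as sustained would require the same result across consecutive windows.

```python
import math


def refusal_rate_shift(base_refusals, base_total, cur_refusals, cur_total):
    """Two-proportion z-score; negative means refusals dropped vs. baseline."""
    p_base = base_refusals / base_total
    p_cur = cur_refusals / cur_total
    pooled = (base_refusals + cur_refusals) / (base_total + cur_total)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / base_total + 1 / cur_total))
    if std_err == 0:
        return 0.0  # no refusals (or only refusals) in both windows
    return (p_cur - p_base) / std_err


def classify(z, critical=3.0):
    """Map a z-score to a coarse label; the critical value is a hypothetical choice."""
    if z <= -critical:
        return "refusal_drop"   # possible alignment regression
    if z >= critical:
        return "refusal_spike"  # possible miscalibration or input shift
    return "stable"
```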

Threats it covers

T7 Misaligned and Deceptive Behaviors · −1 severity step

WHY IT HELPS

T7 Misaligned and Deceptive Behaviors describes an agent whose actions diverge from its declared purpose, often gradually across many sessions rather than in a single observable event. Longitudinal distribution monitoring is designed for exactly that failure shape: it accumulates behavioural signals across a window and alerts when the distribution has shifted statistically from the baseline, which single-step checks cannot do.

Principle coverage

Defence-in-Depth stage: Detect. It advances:

Continuous Verification. This principle requires that trust in an agent's behaviour be re-established at regular intervals rather than assumed from initial deployment. Divergence monitoring operationalises that requirement at the distribution layer: each telemetry window is a fresh check of whether the agent's observable behaviour still matches its declared role.
Defence-in-Depth. This principle requires controls that fail independently. Divergence monitoring is the longitudinal statistical layer: it operates out-of-band against persisted telemetry, so it remains active even when per-step controls such as goal-consistency monitoring or output moderation have been bypassed or degraded.
Assume Breach. This principle treats a compromised or misaligned agent as an operational scenario to plan for. Divergence monitoring provides the detection path for that scenario when the compromise is slow and sub-threshold on any individual step: accumulating distribution data over weeks is the only way to make a gradual drift visible.
Robustness / Reliability. This principle requires that an agent behave consistently with its declared purpose across varying conditions and over time. Divergence monitoring enforces that requirement by making the agent's long-run behavioural distribution observable and alerting when it has shifted beyond what operational variation explains.

Design & governance principles (open design, economy of mechanism, accountability, …) are architectural, not advanced by a single placed control.

Implementation options

Five implementation paths cover different layers of the stack: managed LLM observability, OpenTelemetry-based instrumentation, ML observability platforms, self-built Prometheus metrics, and a custom statistical engine. Most teams compose two: one for trace collection and one for distribution-comparison alerting.

LangSmith. Instrument agent runs with LangSmith tracing; run online evaluators continuously against production traces to score tool-selection patterns, refusal rates, and output quality; surface trends in the LangSmith monitoring dashboard.

Why choose it: Best when your agent stack already uses LangChain or LangGraph. LangSmith captures every tool call, input/output pair, and latency per run. Note: LangSmith does not ship a built-in statistical drift detector; you author the evaluator that computes the distributional comparison and emits a score.

More details:

LangSmith observability ↗

OpenTelemetry GenAI conventions. Instrument agent workloads with the OTel GenAI semantic conventions: emit an invoke_agent or invoke_workflow span per run, record tool definitions, and export token-usage histograms to any OTLP-compatible backend.

Why choose it: Best when your observability stack is already OTel-native and you want a vendor-neutral signal schema. Note: all GenAI agent-span attributes are in Development status as of mid-2026; expect breaking changes between minor releases. The statistical comparison layer must be built on top of your existing metrics pipeline.

More details:

OTel GenAI agent span conventions ↗

Arize AX. Send agent traces to Arize AX; configure continuous evaluations that run automatically against production traces; set alert thresholds on any computed metric.

Why choose it: Best when you want a managed LLM observability platform that handles trace ingestion, evaluation scheduling, and dashboard alerting without building the pipeline yourself. Arize AX supports hierarchical trace visualisation across multi-agent systems. The drift-detection layer is built on continuous evaluation scores, not a native statistical drift engine.

Prometheus + Grafana. Instrument the agent process to emit Prometheus native histograms of tool-selection counts, refusal rate, and output-length distribution per task class; query drift in PromQL using histogram_quantile(), stddev_over_time(), and delta() against a baseline recording rule.

Why choose it: Best when your infrastructure already runs on Prometheus and Grafana and you want to reuse existing observability infrastructure at zero additional vendor cost. The comparison runs in PromQL at query time, so quantile, rate, and variance checks need no separate drift-detection service. PromQL has no built-in Wasserstein or KS test, so those measures still require an external job.

Self-build with scipy. Persist per-agent, per-task-class signal distributions to a time-series store; run nightly or windowed comparison jobs using scipy.stats (wasserstein_distance, ks_2samp, entropy for KL-divergence) against the declared baseline window.

Why choose it: The only option that gives full control over signal definition, baseline window, statistical test selection, and alert semantics with no third-party observability dependency. Appropriate when the agent's action surface is non-standard or when no off-the-shelf platform covers the required signal schema.
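A comparison job along these lines might look like the sketch below. It uses the three scipy.stats functions named above; the synthetic data, the seed, and the tool-mix values are assumptions standing in for persisted telemetry, and no alert thresholds are implied.

```python
import numpy as np
from scipy.stats import entropy, ks_2samp, wasserstein_distance

rng = np.random.default_rng(seed=0)
# Synthetic output lengths (tokens) standing in for persisted telemetry.
baseline_lengths = rng.normal(loc=300, scale=40, size=2000)
current_lengths = rng.normal(loc=360, scale=40, size=500)

# Continuous signal: distance in the signal's own units, plus a two-sample test.
w_dist = wasserstein_distance(baseline_lengths, current_lengths)
ks = ks_2samp(baseline_lengths, current_lengths)

# Categorical signal: tool-selection shares listed in a fixed tool order.
baseline_mix = np.array([0.70, 0.25, 0.05])
current_mix = np.array([0.40, 0.20, 0.40])
kl = entropy(current_mix, baseline_mix)  # KL(current || baseline), in nats

print(f"wasserstein={w_dist:.1f} ks_stat={ks.statistic:.2f} "
      f"p={ks.pvalue:.3g} kl={kl:.3f}")
```

Wasserstein distance is reported in the units of the signal (here, tokens), which can make thresholds easier to reason about than a p-value that shrinks as sample size grows.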

Trade-offs

All computation runs out-of-band against persisted telemetry, so no latency is added to the agent action path. The adoption cost is detection lag: a windowed monitor catches drift hours or days after onset, not in real time.
Managed platforms (LangSmith, Arize AX) reduce instrumentation and pipeline work but introduce vendor dependency and require custom evaluator authoring for the distributional comparison layer. The Prometheus path reuses existing infrastructure at zero vendor cost but requires careful PromQL authoring.
Baseline curation is the dominant ongoing operational burden. A window too narrow causes alert storms on legitimate behaviour shifts; too wide and it masks genuine early drift. Budget quarterly re-baseline events as deliberate change-management actions.

When NOT to use

Do not enable alerting before the baseline window is stable. A newly deployed agent has no baseline and will produce unreliable alerts for the first several weeks.
Do not apply to agents with highly variable legitimate behaviour where task-mix and output-style variation across sessions is an expected feature of the workload, not a signal of drift.
For very low-traffic agents (fewer than roughly 50 decisions per day per task class), distributions are too sparse for statistical tests to be reliable.
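The traffic floor can be checked before any statistical test runs. The sketch below is illustrative only: the function name, the input shape, and the use of a mean over observed days are assumptions, and the floor of 50 is the rough figure given above rather than a calibrated limit.

```python
from collections import defaultdict

MIN_DECISIONS_PER_DAY = 50  # rough floor taken from the guidance above


def eligible_task_classes(daily_counts, min_per_day=MIN_DECISIONS_PER_DAY):
    """daily_counts: iterable of (task_class, decisions_that_day) pairs.

    Returns the task classes whose mean daily volume meets the floor.
    """
    totals, days = defaultdict(int), defaultdict(int)
    for task_class, count in daily_counts:
        totals[task_class] += count
        days[task_class] += 1
    return {tc for tc in totals if totals[tc] / days[tc] >= min_per_day}
```

Task classes that fall below the floor can be pooled or given a longer window rather than dropped from monitoring altogether.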

Limitations

A sophisticated adversary who can observe the baseline distribution can craft outputs that remain within statistical bounds while still causing harm. Divergence monitoring is necessary but not sufficient.
Legitimate operational changes produce baseline drift that is statistically indistinguishable from adversarial drift without change-management records to correlate against.
Low-traffic agents accumulate baseline data slowly; a calendar-time window may cover too few samples for the chosen statistical test to produce reliable results.
OTel GenAI agent-span attributes are in Development status as of mid-2026; schema changes between minor releases are possible.

Maturity tier reasoning

Tier 2 fits because the component primitives are all production-available with documented APIs and active communities. No novel engineering is required to assemble the control.
What keeps the agentic application at Tier 2 rather than Tier 1 is the absence of an industry-standard schema for agentic behavioural baselines. The signal definitions, baseline window conventions, and threshold calibration practices are deployment-specific with no canonical guidance.
Managed LLM observability platforms reduce the instrumentation burden but still require custom evaluator authoring for the distributional comparison layer, which is where most of the implementation work lives.

Last verified against upstream docs: 2026-05-30.

PLACEMENT

On the canvas, this control can be placed on:

node

Valid node kinds: agent

MAESTRO LAYERS

L3 L5

ATLAS TECHNIQUES

AML.T0067 LLM Trusted Output Components Manipulation
Adversary manipulates the structured parts of an LLM response (citations, tool-call arguments, approved-action markup) that downstream systems treat as trusted.
AML.T0080 AI Agent Context Poisoning
Adversary contaminates an agent's context store (short-term scratchpad, vector memory, conversation history) so future reasoning is biased toward attacker goals.

ATLAS MITIGATIONS

AML.M0024 AI Telemetry Logging
Log inputs, outputs, and reasoning steps of deployed AI models so anomalous behaviour can be detected and incidents reconstructed.
AML.M0022 Generative AI Model Alignment
Train or fine-tune the model so its outputs align with intended behaviour; reduces the residual surface of jailbreak / misalignment attacks.

TRADE-OFFS

latency low
cost low
ux friction low
dev effort medium

PLAYBOOKS

5 OWASP v1.1 playbooks recommend this control: