EVIDENCE TRAIL
Behavioural divergence monitoring
Verbatim excerpts from the upstream sources cited on the mitigation page, each annotated with what the source does and does not prove. No upstream document uses the phrase "longitudinal drift from declared role"; that framing is Helmwart's normalisation of the OWASP T7 behavioural-consistency mandate and the NIST MEASURE 2.6 regular-evaluation requirement.
Last cross-checked against upstream sources: · 7 sources
References
Each entry shows what the source supports and what it does not prove.
OWASP Agentic AI — Threats & Mitigations v1.1
§T7 Misaligned & Deceptive Behaviors — Mitigation column
"Utilize deception detection strategies such as behavioral consistency analysis, truthfulness verification models, and adversarial red teaming to assess inconsistencies between AI outputs and expected reasoning pathways."
Supports: Names behavioural consistency analysis first among the three deception detection strategies listed for T7. The phrase "assess inconsistencies between AI outputs and expected reasoning pathways" is the upstream basis for comparing agent behaviour against a stable reference distribution over time.
Does not prove: Does not specify the statistical methods for comparison (KL-divergence, Wasserstein, etc.), the window length, or how to handle the cold-start bootstrap problem. "Consistency analysis" as written could mean per-session analysis rather than longitudinal analysis.
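The upstream text leaves the comparison method open. As a minimal sketch of one option named above, the following computes a smoothed KL divergence between a baseline window and a current window of tool-call events. The tool names, window contents, and smoothing constant are hypothetical, and nothing here is prescribed by OWASP.

```python
import math
from collections import Counter


def smoothed_distribution(events, categories, alpha=1.0):
    """Turn raw events into probabilities with additive (Laplace) smoothing.

    Smoothing keeps every category strictly positive so the log ratio in
    the KL divergence is always defined. alpha=1.0 is an assumed default.
    """
    counts = Counter(events)
    total = len(events) + alpha * len(categories)
    return {c: (counts[c] + alpha) / total for c in categories}


def kl_divergence(p, q):
    """KL(p || q) in nats; p and q must share the same category keys."""
    return sum(p[c] * math.log(p[c] / q[c]) for c in p)


# Hypothetical tool-call windows: the current window sends far more email.
baseline_calls = ["search"] * 70 + ["read_file"] * 25 + ["send_email"] * 5
current_calls = ["search"] * 40 + ["read_file"] * 20 + ["send_email"] * 40

categories = sorted(set(baseline_calls) | set(current_calls))
baseline = smoothed_distribution(baseline_calls, categories)
current = smoothed_distribution(current_calls, categories)

# Divergence of the current window measured against the baseline.
drift = kl_divergence(current, baseline)
```

KL divergence is asymmetric, so the direction of comparison (current against baseline here) is a design choice that would need to be fixed before thresholds are set.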
OWASP Top 10 for Agentic Applications 2026
§ASI01 Agent Goal Hijack — Prevention and Mitigation Guidelines, item 7
"Maintain comprehensive logging and continuous monitoring of agent activity, establishing a behavioral baseline that includes goal state, tool-use patterns, and invariant properties (e.g., schema, access patterns). Track a stable identifier for the active goal where feasible, and alert on any deviations — such as unexpected goal changes, anomalous tool sequences, or shifts from the established baseline — so that unauthorized goal drift is immediately visible in operations."
Supports: Verbatim upstream mandate for a behavioral baseline covering tool-use patterns and alerting on shifts from it. This is the closest direct OWASP statement of the baseline-and-drift-alert architecture this control implements.
Does not prove: Framed in the context of goal hijacking (ASI01), not autonomous misalignment (T7/ASI10). The baseline described is goal-state-centric; the divergence monitor extends it to distribution-level signals (refusal rate, embedding centroid, output length) not named here.
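To make the distribution-level signals concrete, here is a rough sketch that summarises a window of interactions into a refusal rate, a mean output length, and an embedding centroid, then reports how far a current window sits from the baseline. The `Interaction` record, its field names, and the sample values are assumptions for illustration; the OWASP text names none of them.

```python
import math
from dataclasses import dataclass
from statistics import fmean


@dataclass
class Interaction:
    # Hypothetical per-response record; field names are illustrative.
    refused: bool
    output_length: int
    embedding: list


@dataclass
class WindowSummary:
    refusal_rate: float
    mean_output_length: float
    centroid: list


def summarise(window):
    """Collapse a non-empty window of interactions into three signals."""
    dims = len(window[0].embedding)
    return WindowSummary(
        refusal_rate=fmean(1.0 if i.refused else 0.0 for i in window),
        mean_output_length=fmean(i.output_length for i in window),
        centroid=[fmean(i.embedding[d] for i in window) for d in range(dims)],
    )


def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def deviations(baseline, current):
    """Per-signal deviation of the current window from the baseline."""
    return {
        "refusal_rate_delta": current.refusal_rate - baseline.refusal_rate,
        "output_length_ratio": current.mean_output_length / baseline.mean_output_length,
        "centroid_cosine_distance": cosine_distance(baseline.centroid, current.centroid),
    }


# Toy two-dimensional embeddings, chosen only to keep the example readable.
baseline_window = [Interaction(False, 100, [1.0, 0.0]), Interaction(False, 120, [1.0, 0.0])]
current_window = [Interaction(True, 200, [0.0, 1.0]), Interaction(False, 240, [0.0, 1.0])]
report = deviations(summarise(baseline_window), summarise(current_window))
```

Alert thresholds for each entry in `report` are deliberately left out, since no upstream source supplies them.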
OWASP Top 10 for Agentic Applications 2026
§ASI08 Cascading Failures — Prevention and Mitigation Guidelines, item 8
"Behavioral and governance drift detection: Track decisions vs baselines and alignment; flag gradual degradation."
Supports: The phrase "flag gradual degradation" is the clearest upstream acknowledgement that drift is a longitudinal signal rather than a single-event trigger, which directly supports the longitudinal posture this control takes.
Does not prove: Item 8 is one clause in a nine-item list. The section focus is cascading failures across multi-agent systems, not single-agent drift. Does not name the baseline construction window or statistical thresholds.
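One way to illustrate why gradual degradation needs a longitudinal detector is a one-sided CUSUM, which accumulates small excesses over a target until they cross a threshold. This is a sketch of a standard technique, not something ASI08 specifies; the daily refusal rates, target, slack, and threshold below are all hypothetical.

```python
def cusum_alert_index(values, target, slack, threshold):
    """Return the index at which the upper CUSUM first exceeds threshold.

    Each step adds the excess of the observation over (target + slack)
    and clamps the running sum at zero. Returns None if no alert fires.
    """
    running = 0.0
    for index, value in enumerate(values):
        running = max(0.0, running + (value - target - slack))
        if running > threshold:
            return index
    return None


# Hypothetical daily refusal rates creeping upward by about a point per day.
daily_refusal_rate = [0.05, 0.06, 0.05, 0.07, 0.08, 0.09, 0.10, 0.11, 0.12]

alert_day = cusum_alert_index(
    daily_refusal_rate, target=0.05, slack=0.01, threshold=0.08
)
```

In this toy series no single day looks extreme on its own, yet the accumulated excess still triggers an alert, which is the behaviour a single-event threshold would miss.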
NIST AI 600-1 — Generative AI Profile (NIST AI RMF)
MEASURE 2.6 — heading and first paragraph
"The AI system is evaluated regularly for safety risks — as identified in the MAP function. The AI system to be deployed is demonstrated to be safe, its residual negative risk does not exceed the risk tolerance, and it can fail safely, particularly if made to operate beyond its knowledge limits. Safety metrics reflect system reliability and robustness, real-time monitoring, and response times for AI system failures."
Supports: Establishes "evaluated regularly" and "real-time monitoring" as NIST RMF requirements for deployed GAI systems. The "fail safely … beyond its knowledge limits" clause is the framework rationale for the detection-before-action capability that divergence monitoring provides.
Does not prove: The MEASURE 2.6 actions (MS-2.6-001, MS-2.6-002) focus on content-harm and bias assessment, not agent behavioural-distribution monitoring. NIST AI 600-1 was written for GAI systems broadly, not agentic deployments specifically, and it does not name tool-selection distributions or refusal-rate drift.
NIST AI 600-1 — Generative AI Profile (NIST AI RMF)
MANAGE 4.1 — subcategory heading and description
"Post-deployment AI system monitoring plans are implemented, including mechanisms for capturing and evaluating input from users and other relevant AI Actors, appeal and override, decommissioning, incident response, recovery, and change management."
Supports: Requires that post-deployment monitoring plans exist and be implemented — the organisational prerequisite for operating a divergence monitor. "Change management" in this context includes the re-baseline events this control mandates.
Does not prove: MANAGE 4.1 is an organisational governance requirement, not a technical specification for what to monitor or how to detect drift. Does not prescribe statistical methods, signal classes, or baseline window lengths.
MITRE ATLAS AML.M0024 — AI Telemetry Logging
AML.M0024 — full description field (ATLAS.yaml)
"Implement logging of inputs and outputs of deployed AI models. When deploying AI agents, implement logging of the intermediate steps of agentic actions and decisions, data access and tool use, installation commands, and identity of the agent. Monitoring logs can help to detect security threats and mitigate impacts. Additionally, having logging enabled can discourage adversaries who want to remain undetected from utilizing AI resources."
Supports: Verbatim mandate for logging agentic intermediate steps, tool use, and decisions — the raw telemetry layer that the divergence monitor consumes. "Monitoring logs can help to detect security threats" is the ATLAS rationale for why telemetry feeds a detection control.
Does not prove: AML.M0024 specifies what to log, not how to analyse the logs over time for distributional drift. It does not name statistical divergence methods, baseline windows, or the cold-start problem.
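As an illustration of the telemetry layer described above, the sketch below shows one possible per-step log record covering the items the ATLAS description lists (inputs and outputs, tool use, agent identity). The record shape and every field name are assumptions; AML.M0024 does not define a schema.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class AgentStepRecord:
    # Hypothetical schema: one record per intermediate agent step.
    agent_id: str
    session_id: str
    step_index: int
    tool_name: str
    tool_arguments: dict
    model_input: str
    model_output: str
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def to_log_line(record):
    """Serialise a record as one JSON line with stable key order."""
    return json.dumps(asdict(record), sort_keys=True)


example = AgentStepRecord(
    agent_id="agent-001",
    session_id="session-abc",
    step_index=0,
    tool_name="search",
    tool_arguments={"query": "quarterly report"},
    model_input="Find the latest quarterly report.",
    model_output="Calling search.",
)
log_line = to_log_line(example)
```

One JSON object per line keeps the log easy to replay into windowed analysis later, which is the part the ATLAS entry leaves unspecified.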
Gama et al. — "A Survey on Concept Drift Adaptation" (ACM CSUR 2014)
Abstract and §2 introduction — open-access preprint at eprints.bournemouth.ac.uk/22491/1/ACM%20computing%20surveys.pdf
"Concept drift primarily refers to an online supervised learning scenario when the relation between the input data and the target variable changes over time. … Adaptive learning refers to updating predictive models online during their operation to react to concept drifts."
Supports: Establishes the academic definition of concept drift, the foundational construct behind distribution-shift detection in ML pipelines. The Wasserstein, KL-divergence, and KS-test detectors cited in this control's implementation guidance fall within the detection taxonomy surveyed here (§3 Taxonomy of Methods).
Does not prove: The survey covers supervised learning drift, not agentic behavioural drift. It does not address tool-selection distributions, refusal rates, or embedding-centroid drift specifically. The "relation between input and target" framing does not directly map to agentic signal classes without adaptation.
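Adapting the survey's framing to an agentic signal might look like the following sketch, which computes the two-sample Kolmogorov-Smirnov statistic between baseline and current output lengths. The sample values are hypothetical, and only the statistic is computed; a tested implementation with p-values is available as `scipy.stats.ks_2samp`.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the largest gap between empirical CDFs.

    Both empirical CDFs are step functions that only jump at observed
    values, so checking every observed value is enough to find the gap.
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    largest_gap = 0.0
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Hypothetical output lengths (in tokens) for two monitoring windows.
baseline_lengths = [110, 120, 125, 130, 140, 150]
current_lengths = [180, 200, 210, 220, 240, 260]

gap = ks_statistic(baseline_lengths, current_lengths)
```

The statistic ranges from 0 for identical samples to 1 for samples that do not overlap at all, as in the toy data here.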