AGENTIC FACTOR

Non-Determinism

Non-Determinism is the property that the same input does not necessarily produce the same output. In conventional software, identical inputs yield identical state transitions; in agentic systems, sampling, planning order, retrieved context, and multi-agent timing all introduce variation.

Last reviewed 2026-05-08 · Status: published · 10 threats driven by this factor

At a glance

FACTOR

Non-Determinism

One of the four agentic factors that drive threat severity.

THREATS DRIVEN

T1 · T2 · T5 · T6 · T7 · T8 · T26 · T33 · T41 · T48

SOURCE

MAESTRO

1.0 (Apr 2025) · Executive Summary — Agentic Factors Emphasis

Why it matters for security: many security controls assume deterministic behaviour. In conventional software, a test that exercises a code path once is enough to reason about that path; with an agent, the same path may be taken in many shapes. Guardrails that hold during one evaluation may drift before the next. Repudiation is harder to counter because you cannot replay an action and get the same result.

Non-determinism interacts with the other agentic factors. It compounds Autonomy because more decisions are taken without human involvement, and each decision may take a different shape. It compounds Agent-to-Agent Communication because the pattern of inter-agent messages itself becomes non-deterministic.

A concrete scenario

A financial services firm runs an automated trade-reporting agent that reads positions from a database, drafts regulatory reports, and submits them via API to a trading venue. During internal testing, the agent produces correct outputs on every trial run. In production, three weeks later, a shift in retrieved context (a slightly different ordering of positions returned from the database, combined with temperature-induced sampling variation) causes the agent to omit a large equity position from one report. The omission is not caught by the deterministic unit tests, which replayed a fixed context. The venue’s automated system flags the inconsistency two days later. The firm cannot reproduce the failure in a test environment because the exact token sequence, database snapshot, and random seed that caused the omission no longer exist.

What this means for your system

Test coverage is necessary but not sufficient. A conventional code path exercised once in CI is reliable across deploys; an agent path exercised once tells you it can behave correctly, not that it will. Your evaluation suite needs enough repeated runs of each scenario to give you a distributional picture, not a binary pass/fail.
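One way to get the distributional picture described above is to run each scenario many times and report a pass rate with an uncertainty band instead of a single pass/fail. The sketch below is illustrative only: `pass_rate`, `flaky_scenario`, and the choice of a 95% Wilson interval are assumptions made for this example, and `flaky_scenario` is a stand-in for a real agent run.

```python
import math
import random
from typing import Callable


def pass_rate(run_once: Callable[[], bool], trials: int) -> tuple[float, float, float]:
    """Run a scenario `trials` times.

    Returns (observed pass rate, lower bound, upper bound), where the bounds
    are a 95% Wilson score interval for the true pass rate.
    """
    passes = sum(1 for _ in range(trials) if run_once())
    p = passes / trials
    z = 1.96  # normal quantile for a 95% interval
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return p, max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical stand-in for one agent scenario that fails on roughly 2% of runs.
rng = random.Random(0)


def flaky_scenario() -> bool:
    return rng.random() > 0.02


rate, low, high = pass_rate(flaky_scenario, trials=500)
print(f"pass rate {rate:.3f}, 95% interval [{low:.3f}, {high:.3f}]")
```

A release gate can then be stated against the lower bound (for example, "lower bound above 0.99") instead of against a single green run.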

Repudiation and forensics become materially harder. When an incident occurs, you cannot replay the agent’s execution from inputs alone. You also need the exact model checkpoint, the sampled tokens, the retrieved documents in retrieval order, and the timing of any concurrent agents. Without deterministic replay, root-cause analysis depends on the logs you happened to capture at the time.

Guardrails calibrated on evaluation data can drift silently. A content filter or output validator that blocks 99% of harmful outputs in testing may perform worse or differently on the distribution of inputs seen in production, where context is longer, retrieval is live, and users push boundaries in ways your red team did not.

What to do about it

Set temperature to zero (or the lowest setting your model provider supports) for any agent task whose output feeds a compliance, financial, or safety-critical downstream system. Non-zero temperature is the easiest source of non-determinism to reduce. Zero temperature narrows sampling variation but does not by itself guarantee identical outputs across runs, so pair it with the controls below.

Log the full context window at the point of each consequential decision, not just the final output. Include the retrieved documents, tool outputs, and intermediate reasoning steps. This is the minimum needed for post-hoc reconstruction. Structured logging to an append-only store (e.g. Cloudflare R2, AWS S3 with Object Lock) is a practical baseline.
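A minimal sketch of such a decision record follows, assuming an in-memory list as a stand-in for the append-only store. The `DecisionLog` class, its field names, and the SHA-256 hash chain are illustrative additions for this example; the hash chain is one possible way to make later tampering evident and is not something the guidance above prescribes.

```python
import hashlib
import json
import time
from dataclasses import dataclass, field


def _digest(body: dict) -> str:
    # Canonical JSON (sorted keys) so the same record always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


@dataclass
class DecisionLog:
    """In-memory stand-in for an append-only store (hypothetical)."""

    records: list[dict] = field(default_factory=list)

    def append(self, *, model: str, prompt: str, retrieved: list[str],
               tool_outputs: list[str], output: str) -> dict:
        prev_hash = self.records[-1]["hash"] if self.records else "0" * 64
        body = {
            "ts": time.time(),
            "model": model,
            "prompt": prompt,
            "retrieved": retrieved,        # kept in retrieval order
            "tool_outputs": tool_outputs,
            "output": output,
            "prev_hash": prev_hash,        # links this record to the previous one
        }
        record = {**body, "hash": _digest(body)}
        self.records.append(record)
        return record

    def verify(self) -> bool:
        """Recompute every hash and check that the chain is unbroken."""
        prev = "0" * 64
        for rec in self.records:
            body = {k: v for k, v in rec.items() if k != "hash"}
            if rec["prev_hash"] != prev or rec["hash"] != _digest(body):
                return False
            prev = rec["hash"]
        return True
```

In a real deployment each record would be written to the external store at decision time; the in-memory list here only demonstrates the record shape and the integrity check.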

Build property-based evaluation, not just example-based evaluation. For each important agent behaviour, define an invariant (“the report always includes every position above £1,000”) and run that check across hundreds of sampled inputs, not a fixed regression suite.
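The invariant quoted above can be sketched as a hand-rolled randomised check. Everything named here (`includes_all_large_positions`, `random_positions`, `check_invariant`, `stub_agent`, and the position record shape) is hypothetical, and `stub_agent` stands in for a real agent call. A property-testing library such as Hypothesis would add richer input generation and shrinking of failing cases.

```python
import random
from typing import Callable

THRESHOLD_GBP = 1_000  # from the example invariant: positions above £1,000


def includes_all_large_positions(positions: list[dict], report_ids: list[str]) -> bool:
    """Invariant: every position above the threshold appears in the report."""
    required = {p["id"] for p in positions if p["value_gbp"] > THRESHOLD_GBP}
    return required <= set(report_ids)


def random_positions(rng: random.Random, max_n: int = 20) -> list[dict]:
    """Generate a randomly sized, randomly ordered list of positions."""
    n = rng.randint(1, max_n)
    positions = [
        {"id": f"P{i}", "value_gbp": round(rng.uniform(0, 50_000), 2)}
        for i in range(n)
    ]
    rng.shuffle(positions)  # vary ordering, as a live database query might
    return positions


def check_invariant(draft_report: Callable[[list[dict]], list[str]],
                    runs: int = 300, seed: int = 0) -> list[list[dict]]:
    """Return every generated input on which the invariant was violated."""
    rng = random.Random(seed)
    failures = []
    for _ in range(runs):
        positions = random_positions(rng)
        if not includes_all_large_positions(positions, draft_report(positions)):
            failures.append(positions)
    return failures


# Hypothetical stand-in for the agent: reports every position it is given.
def stub_agent(positions: list[dict]) -> list[str]:
    return [p["id"] for p in positions]


print(len(check_invariant(stub_agent)), "violations")
```

Keeping the failing inputs, not just a count, gives the reviewer concrete cases to replay against the agent.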

Use model pinning (specific model version, not a floating alias) in production so that a provider-side model update does not silently change output distributions between your last evaluation and today’s deployment.
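A deploy-time guard can enforce pinning, and the temperature advice above, before an agent starts. This is a sketch under a loud assumption: it treats a model identifier as pinned only if it ends in a date stamp, which is a naming convention invented for this example. Providers name versions differently, so the pattern would need adapting. `AgentConfig`, `config_problems`, and the model names are hypothetical.

```python
import re
from dataclasses import dataclass

# Assumed convention (not universal): a pinned identifier ends in a date stamp
# such as -2025-04-14 or -20250414. Adjust to your provider's naming scheme.
PINNED_SUFFIX = re.compile(r"-\d{4}-?\d{2}-?\d{2}$")


@dataclass(frozen=True)
class AgentConfig:
    model: str
    temperature: float
    safety_critical: bool


def config_problems(cfg: AgentConfig) -> list[str]:
    """Return human-readable problems; an empty list means the config passes."""
    problems = []
    if not PINNED_SUFFIX.search(cfg.model):
        problems.append(f"model {cfg.model!r} looks like a floating alias; pin a dated version")
    if cfg.safety_critical and cfg.temperature > 0.0:
        problems.append("temperature should be 0 for safety-critical tasks")
    return problems


# Hypothetical model names, used only to exercise the check.
print(config_problems(AgentConfig("example-model-latest", 0.7, safety_critical=True)))
print(config_problems(AgentConfig("example-model-2025-04-14", 0.0, safety_critical=True)))
```

Running the guard in CI as well as at startup means a config change that reintroduces a floating alias is caught before deployment.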

Treat output validation as a runtime control, not a testing artefact. A schema check, a numeric range assertion, or a classifier applied to every agent output before it reaches a downstream system catches distribution drift that pre-deployment evaluation cannot.
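As an illustration of a runtime gate, the sketch below checks field types, a numeric range, and completeness against the source data before a report is allowed to leave the agent. The report shape, `REQUIRED_FIELDS`, `ReportRejected`, and `validate_report` are all assumptions made for this example.

```python
class ReportRejected(Exception):
    """Raised when an agent output fails a runtime check."""


# Hypothetical report schema: field name -> accepted type(s).
REQUIRED_FIELDS = {"report_id": str, "positions": list, "total_gbp": (int, float)}


def validate_report(report: dict, expected_ids: set[str]) -> dict:
    """Return the report unchanged if it passes every check, else raise."""
    # Schema check: each required field is present with an accepted type.
    for name, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(report.get(name), expected_type):
            raise ReportRejected(f"field {name!r} missing or wrong type")
    # Numeric range assertion.
    if report["total_gbp"] < 0:
        raise ReportRejected("total_gbp is negative")
    # Completeness check against the positions read from the source system.
    reported_ids = {p.get("id") for p in report["positions"] if isinstance(p, dict)}
    missing = expected_ids - reported_ids
    if missing:
        raise ReportRejected(f"positions missing from report: {sorted(missing)}")
    return report


# Usage: the draft omits position P2, so it is blocked before submission.
draft = {"report_id": "R-1", "positions": [{"id": "P1"}], "total_gbp": 1200.0}
try:
    validate_report(draft, expected_ids={"P1", "P2"})
except ReportRejected as err:
    print("blocked:", err)
```

The completeness check is the one that would have caught the omitted position in the scenario above, because it compares the output against the source data on every run, not against a fixed test fixture.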

ASI entries this factor most amplifies:

ASI06 — Memory & Context Poisoning: poisoned context interacts with non-deterministic reasoning to produce variable and unpredictable harmful outputs, making detection harder than with deterministic systems.
ASI08 — Cascading Failures: when inter-agent message patterns are themselves non-deterministic, failure modes propagate in ways that are hard to reproduce or anticipate in staging environments.
ASI01 — Agent Goal Hijack: non-deterministic goal selection means an injected goal may succeed on some runs and fail on others, complicating detection via anomaly monitoring.

Example threats driven by this factor:

T1 — Memory Poisoning: non-deterministic retrieval from a vector store means a poisoned entry surfaces unpredictably; unless the retrieved context was logged at decision time, you cannot tell whether a given decision was made against clean or tainted context.
T7 — Misaligned and Deceptive Behaviours: deceptive outputs appear on some sampling paths and not others, making them hard to catch in evaluation and hard to prove in incident review.
T8 — Repudiation and Untraceability: non-determinism is the root of the repudiation problem. Because the same inputs do not produce the same outputs, any given output can plausibly be attributed to model variation rather than intentional action.

Threats driven by this factor

Every threat in the catalog whose agenticFactors list includes Non-Determinism.
