T21: Inconsistent Workflow State

Definition

When agents hold inconsistent views of shared workflow state (divergent copies of shared memory or Directed Acyclic Graph (DAG) node objects), they act on contradictory information simultaneously. The result is conflicting actions, stale approvals, or silent denial of service for legitimate claim processing.

What it looks like in practice

In an RPA expense reimbursement pipeline, a validation agent writes an “approved” flag to a shared state object after completing its check. Due to a state synchronisation delay, the routing agent reads the object before the flag is committed and finds it in the default “pending” state. The routing agent silently drops the claim rather than forwarding it for payment. The claim disappears from the queue without an error event; it is not rejected, but it is also never paid.
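The race above can be reproduced deterministically in a few lines. This is a minimal sketch, not the pipeline's actual code: `SharedState`, its `write`/`commit`/`read` methods, and the `route` function are hypothetical names introduced only for illustration, and the commit delay is modelled by calling `commit()` after the router has already read.

```python
from dataclasses import dataclass, field


@dataclass
class SharedState:
    """Hypothetical shared state object with delayed commit (illustration only)."""
    committed: dict = field(default_factory=dict)  # what readers can see
    in_flight: dict = field(default_factory=dict)  # written but not yet committed

    def write(self, claim_id, status):
        self.in_flight[claim_id] = status

    def commit(self):
        self.committed.update(self.in_flight)
        self.in_flight.clear()

    def read(self, claim_id):
        # Uncommitted claims appear in the default "pending" state.
        return self.committed.get(claim_id, "pending")


def route(state, claim_id, payment_queue):
    """Routing agent as described above: forwards approved claims, drops the rest."""
    if state.read(claim_id) == "approved":
        payment_queue.append(claim_id)
        return "forwarded"
    return "dropped"  # no error event and no rejection record


state = SharedState()
payment_queue = []

state.write("claim-1", "approved")                # validation agent writes the flag
outcome = route(state, "claim-1", payment_queue)  # router reads before the commit
state.commit()                                    # flag becomes visible too late

print(outcome, payment_queue, state.read("claim-1"))
# prints: dropped [] approved
```

The final state says "approved", the payment queue is empty, and nothing in the run raised an error, which is the silent drop the scenario describes.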

In the ElizaOS context, insufficient isolation between agent actions (T31) allows a state write from one agent to corrupt the shared object that a peer agent is actively reading, producing conflicting tool call outcomes from a single logical operation.

Why it’s dangerous in multi-agent context

Each agent in a multi-agent workflow maintains a local view of shared state. Without state-consistency guarantees such as versioned objects, optimistic locking, or distributed transaction semantics, an agent may commit an irreversible action (approve, route, pay) that a peer agent would have blocked had it been consulted. Unlike in a single-threaded system, there is no natural serialisation point. Autonomous action on a stale view produces outcomes that are neither logged as errors nor visible as anomalies; the system appears to be operating normally while claims are silently dropped or incorrectly processed.

Detection signals

State-inconsistency failures are silent by default. The system emits no error, so detection requires purpose-built completeness checks on the workflow event stream.

A claim ID that is present in the validated partition of the state store but has no corresponding routed event in the workflow event log within the defined SLA window (e.g. 30 seconds after the validated timestamp). Wire a missing-transition alert to the event log consumer to catch this.
A routed event for a claim ID where the state object’s validation_status field is pending rather than approved or rejected. This read-time assertion fires when the routing agent acts on an uncommitted validation result.
A non-monotonic version sequence on a shared state object: a write whose version field is lower than the current committed version in the store. This indicates a stale-read followed by a clobbering write, an optimistic-locking violation.
Two agents emitting conflicting tool call outcomes for the same logical operation within the same 100 ms window. Correlate tool-call logs by claim_id and flag any pair where one agent records approved and the other records dropped for the same ID.
A sustained rise in the ratio of claims reaching created status versus claims reaching paid status, without a corresponding rise in rejection events. This throughput-divergence metric indicates silent drops accumulating in the queue.
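Two of the signals above, the missing-transition check and the conflicting-outcome correlation, can be sketched as plain functions over event records. This is an illustrative sketch under stated assumptions: the function names, the tuple layout of `tool_calls`, and the in-memory inputs are hypothetical; a real deployment would read from the event log consumer and the tool-call logs instead.

```python
from datetime import datetime, timedelta


def missing_transitions(validated_at, routed_ids, now, sla=timedelta(seconds=30)):
    """Return claim IDs validated more than `sla` ago that have no routed event."""
    return sorted(
        claim_id
        for claim_id, ts in validated_at.items()
        if claim_id not in routed_ids and now - ts > sla
    )


def conflicting_outcomes(tool_calls, window=timedelta(milliseconds=100)):
    """Return claim IDs where two agents recorded different outcomes within `window`.

    `tool_calls` is a list of (timestamp, agent, claim_id, outcome) tuples
    (an assumed record shape, not a real log schema).
    """
    by_claim = {}
    for ts, agent, claim_id, outcome in tool_calls:
        by_claim.setdefault(claim_id, []).append((ts, agent, outcome))

    conflicts = set()
    for claim_id, calls in by_claim.items():
        calls.sort()  # order by timestamp so the inner loop can stop early
        for i, (ts_a, agent_a, out_a) in enumerate(calls):
            for ts_b, agent_b, out_b in calls[i + 1:]:
                if ts_b - ts_a > window:
                    break
                if agent_a != agent_b and out_a != out_b:
                    conflicts.add(claim_id)
    return sorted(conflicts)


t0 = datetime(2025, 1, 1, 12, 0, 0)
validated_at = {"c1": t0, "c2": t0, "c3": t0 + timedelta(seconds=50)}
routed_ids = {"c2"}
stuck = missing_transitions(validated_at, routed_ids, now=t0 + timedelta(seconds=60))

tool_calls = [
    (t0, "validator", "c1", "approved"),
    (t0 + timedelta(milliseconds=40), "router", "c1", "dropped"),
    (t0, "validator", "c2", "approved"),
    (t0 + timedelta(seconds=5), "router", "c2", "dropped"),
]
conflicts = conflicting_outcomes(tool_calls)

print(stuck, conflicts)
# prints: ['c1'] ['c1']
```

Claim c3 is not flagged because it is still inside the 30-second window, and c2 is not a conflict because its two records are five seconds apart, outside the 100 ms correlation window.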

Mitigations

Version all shared state objects and require agents to read the current version before writing; reject writes that conflict with the latest committed version (optimistic locking).
Enforce a “completed” state transition as an atomic operation; agents downstream of a step must not read state until the upstream agent has committed a terminal status.
Emit a step-level state-change event on every transition; alert on missing transitions (a claim in “validated” status for longer than a defined threshold without a “routed” event following).
Use a distributed lock or a saga pattern where a single operation spans multiple agents and its completed steps must all be undone on failure.
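The optimistic-locking mitigation in this list can be sketched as a compare-and-set store. This is a minimal single-process illustration, not a distributed implementation: `StateStore`, `Versioned`, and `VersionConflict` are hypothetical names, and a `threading.Lock` stands in for whatever atomic compare-and-set primitive the real state store provides.

```python
import threading
from dataclasses import dataclass


class VersionConflict(Exception):
    """Raised when a write is based on a version that is no longer current."""


@dataclass(frozen=True)
class Versioned:
    version: int
    value: dict


class StateStore:
    """Hypothetical in-memory store with compare-and-set writes (illustration only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._objects = {}

    def read(self, key):
        with self._lock:
            return self._objects.get(key, Versioned(0, {}))

    def write(self, key, expected_version, value):
        # The version check and the commit happen under one lock, so no
        # other writer can slip in between them.
        with self._lock:
            current = self._objects.get(key, Versioned(0, {}))
            if current.version != expected_version:
                raise VersionConflict(
                    f"{key}: expected v{expected_version}, store has v{current.version}"
                )
            committed = Versioned(current.version + 1, dict(value))
            self._objects[key] = committed
            return committed


store = StateStore()
seen_by_validator = store.read("claim-1")  # both agents read version 0
seen_by_router = store.read("claim-1")

store.write("claim-1", seen_by_validator.version, {"validation_status": "approved"})

try:
    # The router's view is now stale, so its write must be rejected.
    store.write("claim-1", seen_by_router.version, {"validation_status": "pending"})
    stale_write_rejected = False
except VersionConflict:
    stale_write_rejected = True

current = store.read("claim-1")
print(stale_write_rejected, current.version, current.value)
# prints: True 1 {'validation_status': 'approved'}
```

On a conflict the caller is expected to re-read and re-decide rather than retry the same write blindly; the approved flag survives because the stale write never lands.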

Relation to base threat (T1–T17)

T21 extends T2 Tool Misuse. Where T2 addresses a single agent misusing a tool, T21 addresses the multi-agent coordination layer: state inconsistency between agents produces the same class of incorrect tool invocation without any individual agent behaving maliciously. T19 (Unintended Workflow Execution) is the companion threat where the wrong step executes because the workflow graph is miswired rather than because state synchronisation failed.

OWASP Top 10 for Agentic Applications 2026

The Agentic Top 10 (ASI01 through ASI10) is a separate practitioner-facing publication that maps onto the master Threats & Mitigations threat numbering. T21 is covered by the following Top 10 entries:

ASI08 Cascading Failures (primary)

A single low-severity fault (a hallucinated value, a corrupted tool output, a poisoned memory entry) propagates across a network of agents that each build on the last agent's output, compounding into system-wide harm that is disproportionate to the original defect. ASI08 is about propagation and amplification, not the fault's origin; the initial trigger may itself be innocuous.

OWASP LLM Top 10: LLM01:2025, LLM04:2025, LLM06:2025

Source: OWASP Top 10 for Agentic Applications 2026 (Dec 2025) · the Top 10 is a compass into the master Threats & Mitigations taxonomy, not a replacement for it.

Design principles at stake

When T21 is present, these security design principles are the ones being violated or tested. Each links to the full principle; the mitigations below are how you restore them.

Defence-in-Depth: State inconsistency is invisible without purpose-built tooling. An agent acting on a stale "pending" view drops a legitimate claim with no error event, so the system appears normal while workflow correctness fails. Depth means no single agent's local state view is trusted for irreversible actions: optimistic locking requires agents to read and confirm the current committed version before writing; the "completed" state transition is enforced as an atomic operation so downstream agents cannot read partial state; and step-level state-change events with timeout alerts surface missing transitions before claims are lost. A saga pattern with compensating transactions provides the recovery path when state inconsistency has already produced a conflicting commit.
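The saga recovery path mentioned above can be sketched as a runner that executes steps in order and, on failure, runs the compensations of the completed steps in reverse. This is an illustrative sketch: `run_saga`, the step names, and the `ledger` dictionary are hypothetical, and a real saga would also need durable logging and idempotent compensations, which are omitted here.

```python
def run_saga(steps):
    """Run (name, action, compensate) steps in order.

    On the first failure, call the compensations of all completed steps in
    reverse order. Returns (succeeded, log).
    """
    completed = []
    log = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception as exc:
            log.append(f"failed:{name}:{exc}")
            for done_name, undo in reversed(completed):
                undo()
                log.append(f"compensated:{done_name}")
            return False, log
        log.append(f"done:{name}")
        completed.append((name, compensate))
    return True, log


ledger = {"reserved": 0, "routed": False}


def reserve():
    ledger["reserved"] += 100


def unreserve():
    ledger["reserved"] -= 100


def mark_routed():
    ledger["routed"] = True


def unmark_routed():
    ledger["routed"] = False


def pay():
    # Simulates the payment agent detecting a conflicting commit.
    raise RuntimeError("state version conflict")


def refund():
    pass  # never called here: pay() did not complete


ok, log = run_saga([
    ("reserve", reserve, unreserve),
    ("route", mark_routed, unmark_routed),
    ("pay", pay, refund),
])

print(ok, ledger)
# prints: False {'reserved': 0, 'routed': False}
```

Compensation is not the same as an atomic rollback: intermediate states were briefly visible to other agents, which is why the sketch pairs naturally with the versioned writes described in the principle above.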

Recommended mitigations

Auto-generated from the mitigation catalog: every mitigation whose coverage map includes T21, sorted by maturity tier (Tier 1 production-canonical first, then Tier 2, then Tier 3 research-stage).

Tier 3: Workflow state consistency (distributed-state integrity checks for multi-agent workflows)

When multiple agents read and write shared workflow state concurrently, a network partition, a delayed message, or an adversarially timed race condition can produce divergent views. An agent acting on stale or conflicting state may authorise an action it would reject given correct current state. Hash-chained state snapshots, merge-point conflict detection, and optimistic concurrency control close that window.

Why it helps: Inconsistent Workflow State is the named threat: concurrent agents produce contradictory state, and the inconsistency window becomes an authorisation bypass surface. Hash-chained snapshots make divergence detectable; merge-point conflict detection halts the workflow the moment a mismatch is found rather than allowing either branch to proceed.
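One way to picture hash-chained snapshots with a merge-point check is sketched below. It is an assumption-laden illustration, not the catalogued mitigation's reference design: `SnapshotChain`, `first_divergence`, `merge_point_check`, and `StateDivergence` are hypothetical names, and each agent is assumed to keep its own chain of JSON-serialisable state snapshots hashed with SHA-256.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed fixed starting hash shared by all agents


class StateDivergence(Exception):
    """Raised at a merge point when two agents' chains disagree."""


def snapshot_hash(prev_hash, state):
    # Canonical JSON so that equal states always hash identically.
    payload = json.dumps(state, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


class SnapshotChain:
    def __init__(self):
        self.entries = []  # list of (hash, state) pairs

    @property
    def head(self):
        return self.entries[-1][0] if self.entries else GENESIS

    def append(self, state):
        digest = snapshot_hash(self.head, state)
        self.entries.append((digest, dict(state)))
        return digest

    def verify(self):
        """Recompute every link; False means a snapshot was altered after the fact."""
        prev = GENESIS
        for digest, state in self.entries:
            if snapshot_hash(prev, state) != digest:
                return False
            prev = digest
        return True


def first_divergence(chain_a, chain_b):
    """Index of the first differing snapshot in the common prefix, else None."""
    for i, ((ha, _), (hb, _)) in enumerate(zip(chain_a.entries, chain_b.entries)):
        if ha != hb:
            return i
    return None  # one chain may simply lag behind; that is not divergence


def merge_point_check(chain_a, chain_b):
    index = first_divergence(chain_a, chain_b)
    if index is not None:
        raise StateDivergence(f"views diverge at snapshot {index}")


validator, router = SnapshotChain(), SnapshotChain()
for chain in (validator, router):
    chain.append({"claim-1": "pending"})
validator.append({"claim-1": "approved"})
router.append({"claim-1": "pending", "routed": False})  # stale view

diverged_at = first_divergence(validator, router)
try:
    merge_point_check(validator, router)
    halted = False
except StateDivergence:
    halted = True

print(diverged_at, halted, validator.verify())
# prints: 1 True True
```

Because each hash covers the previous one, a mismatch at snapshot 1 also changes every later hash, so comparing only the chain heads is enough to detect that two views have diverged somewhere.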

Red-team pivot: MITRE ATLAS techniques

MITRE ATLAS catalogues adversary techniques against AI systems. Where this OWASP threat has an attacker-perspective counterpart, the ATLAS technique is shown below. That is what a red team would actually be doing on the wire. Use this for detection-signal anchoring, threat-hunting hypotheses, and IR runbooks. Source: mitre-atlas/atlas-data v5.6.0.

AML.T0053 AI Agent Tool Invocation

Adversary causes an agent to invoke a legitimate tool with attacker-controlled parameters, turning a sanctioned capability into an attack vector.

Agentic angle: Maps directly to OWASP T2 Tool Misuse: the agent's tools are operating within their declared scope, but the chosen invocation is unsafe.

AML.T0081 Modify AI Agent Configuration

Adversary alters an agent's configuration (system prompt, tool list, allowed actions, persona) to change its behaviour without retraining.

References

OWASP MAS Threat Modelling Guide v1.0 (April 2025) §3 RPA Expense Reimbursement Agent — Layer 3 Agent Frameworks.
