T19 · Helmwart ID · OWASP MAS Guide source

Unintended Workflow Execution

Extends T2: Tool Misuse · base threat in OWASP v1.1 catalog

Last reviewed 2026-05-14 · Severity heuristic: high

Definition

A Robotic Process Automation (RPA) agent executes workflow steps in the wrong order, or skips critical validation steps entirely, because of a flaw in its workflow definition within the agent framework. T19 is distinct from T2 Tool Misuse: the problem is not misuse of a specific tool but incorrect execution of the overall workflow. The attack surface is the workflow definition file or configuration.

What it looks like in practice

An RPA expense reimbursement agent is designed to execute three sequential steps: (1) extract structured data from the claim, (2) validate the data against company policy, (3) submit the approved claim for payment. A bug in the workflow graph (an incorrectly wired conditional branch or a missing guard condition) causes step 2 to be skipped when the claim amount falls below a certain value. Claims under that threshold are submitted directly for payment, bypassing the policy validation gate entirely.

The bug may stay latent through normal test cases and surface only on specific input shapes encountered in production, by which time irreversible payment actions have already been committed.
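
The miswiring described above can be reproduced in a few lines. The following is a minimal sketch, not taken from any real RPA framework: the step names, the `SMALL_CLAIM_LIMIT` threshold, and the router function are all hypothetical and exist only to show how a single "fast path" branch removes the validation step for small claims without raising any error.

```python
from dataclasses import dataclass, field

# Hypothetical threshold, chosen only for illustration.
SMALL_CLAIM_LIMIT = 50.0


@dataclass
class Claim:
    amount: float
    steps_run: list = field(default_factory=list)


# Each step only records that it ran; real steps would do actual work.
def extract(claim):
    claim.steps_run.append("extract")


def validate(claim):
    claim.steps_run.append("validate")


def submit(claim):
    claim.steps_run.append("submit")


STEPS = {"extract": extract, "validate": validate, "submit": submit}


def next_step_buggy(current, claim):
    """Miswired router: small claims are routed straight to 'submit'."""
    if current == "extract":
        return "submit" if claim.amount < SMALL_CLAIM_LIMIT else "validate"
    if current == "validate":
        return "submit"
    return None  # 'submit' is the terminal step


def run(claim, router):
    """Execute steps in the order the router dictates and return the trace."""
    step = "extract"
    while step is not None:
        STEPS[step](claim)
        step = router(step, claim)
    return claim.steps_run


print(run(Claim(amount=120.0), next_step_buggy))  # ['extract', 'validate', 'submit']
print(run(Claim(amount=20.0), next_step_buggy))   # ['extract', 'submit']
```

A test suite that only uses claims above the threshold would pass every time, which is why the bug can survive until production traffic includes a low-value claim.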

Why it’s dangerous in multi-agent context

Agentic workflows are defined programmatically and executed autonomously, without a human reviewing the step sequence before each run. The gap between intended and actual execution is invisible without step-level audit logging. In a multi-agent pipeline where the RPA agent’s output feeds a payment agent or approval agent, a skipped validation step produces a downstream cascade: the payment agent receives a claim that was never validated and processes it as authorised. State inconsistency between agents (T21) can also compound the effect when the validation agent’s result was never written before the routing agent read it.

Detection signals

The clearest signal for T19 is an absent step-completion event: a workflow audit log where the expected sequence of step identifiers has a gap.

  • A step-level audit log that shows a transition from step_1_complete directly to step_3_complete with no step_2_complete event in between. Alert on any workflow run where a declared mandatory step ID is missing from the event sequence.
  • A claim reaching the payment submission state with no policy_validation_result field in its state record: query the state store on each payment submission and reject if the validation field is absent or null.
  • A spike in throughput for claims below a specific value threshold relative to the prior week’s baseline. A sudden increase in low-value claim volume may indicate the bypass condition is being triggered at scale.
  • A workflow_run record whose total elapsed time is shorter than the minimum expected duration for a three-step pipeline (establish a floor from historical runs; alert on any run completing below the 5th-percentile duration).
  • An exception or warning emitted by the framework’s DAG resolver that is swallowed without propagating to the monitoring system: instrument the graph executor to forward all resolver warnings as structured log events.
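
Several of the signals above reduce to small checks over the audit trail. The sketch below is one possible shape for them, assuming the audit log for a run is available as an ordered list of event names and that the state record is a plain dictionary; the function names are hypothetical, while the event and field names are the ones used in the bullets above.

```python
import statistics

# Mandatory step-completion events, in the order they are expected.
MANDATORY_STEPS = ["step_1_complete", "step_2_complete", "step_3_complete"]


def missing_mandatory_steps(events, mandatory=MANDATORY_STEPS):
    """Return the mandatory step events that never appear in a run's log."""
    seen = set(events)
    return [step for step in mandatory if step not in seen]


def out_of_order(events, mandatory=MANDATORY_STEPS):
    """True if the mandatory events that are present appear in the wrong order."""
    positions = [events.index(step) for step in mandatory if step in events]
    return positions != sorted(positions)


def payment_allowed(state_record):
    """Reject a payment submission if the validation field is absent or null."""
    return state_record.get("policy_validation_result") is not None


def duration_floor(historical_seconds):
    """Approximate 5th-percentile duration from historical runs.

    statistics.quantiles with n=20 returns 19 cut points; the first one
    is the 5th percentile. Requires at least two historical samples.
    """
    return statistics.quantiles(historical_seconds, n=20)[0]


run_log = ["step_1_complete", "step_3_complete"]
print(missing_mandatory_steps(run_log))  # ['step_2_complete']
print(payment_allowed({"claim_id": "c-1"}))  # False
```

Note that `payment_allowed` only checks for presence, as the bullet describes. A stricter variant would also require the recorded result to be an explicit pass.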

Mitigations

  • Enforce explicit workflow schema validation on deployment: define required step ordering and mandatory guard conditions in the framework’s Directed Acyclic Graph (DAG) specification.
  • Emit a step-level audit event for each workflow transition; alert on any transition that skips a declared validation gate.
  • Test edge-case inputs (zero-value claims, maximum-value claims, malformed date fields) during pipeline validation to surface latent branch-skip bugs.
  • Require a human-in-the-loop gate on the first N claims after a workflow definition change before returning to fully automated processing.
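
The first mitigation, schema validation at deployment, can be approximated with a graph check: if the payment step is still reachable from the entry step once the validation step is removed from the graph, a bypass path exists. The sketch below assumes the workflow is expressed as a dictionary mapping each step to its successors; the `check_workflow` helper and the step names are hypothetical, and a real framework's DAG specification will look different.

```python
from collections import deque
from graphlib import CycleError, TopologicalSorter


def reachable(edges, start, target, blocked=frozenset()):
    """Breadth-first search from start to target, never entering blocked nodes."""
    if start in blocked:
        return False
    queue, seen = deque([start]), {start}
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in edges.get(node, ()):
            if nxt not in seen and nxt not in blocked:
                seen.add(nxt)
                queue.append(nxt)
    return False


def check_workflow(edges, entry, sink, gates):
    """Return a list of deployment-blocking problems (empty means OK)."""
    errors = []
    try:
        # Cycle detection works the same whichever direction the edges point.
        TopologicalSorter(edges).prepare()
    except CycleError:
        errors.append("workflow graph contains a cycle")
    if not reachable(edges, entry, sink):
        errors.append(f"{sink} is not reachable from {entry}")
    for gate in gates:
        # If the sink is reachable with the gate removed, the gate can be skipped.
        if reachable(edges, entry, sink, blocked={gate}):
            errors.append(f"{sink} is reachable without passing {gate}")
    return errors


miswired = {"extract": ["validate", "submit"], "validate": ["submit"]}
print(check_workflow(miswired, "extract", "submit", gates=["validate"]))
# ['submit is reachable without passing validate']
```

This check only covers static edges. A branch condition evaluated at runtime (such as an amount threshold) still needs the edge-case tests listed above, because the graph shape alone does not reveal which inputs take which path.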

Relation to base threat (T1–T17)

T19 extends T2 Tool Misuse. Where T2 focuses on individual tool invocations being misused, T19 operates at the orchestration level: the workflow engine itself executes the wrong sequence, bypassing the validation tools that should have been called. T21 (Inconsistent Workflow State) is the companion threat where the wrong step is taken because state synchronisation fails between agents rather than because the workflow graph is miswired.

OWASP Top 10 for Agentic Applications 2026

The Agentic Top 10 (ASI01 through ASI10) is a separate practitioner-facing publication that maps onto the master Threats & Mitigations threat numbering. T19 is covered by the following Top 10 entries:

  • ASI01 Agent Goal Hijack (contributing)

    An attacker manipulates an agent's objective, task selection, or decision pathway (via injected prompts, deceptive tool outputs, forged peer messages, or poisoned retrieval data) so that the agent pursues the attacker's goal rather than the operator's. Unlike a single-turn injection, the harm compounds across many authorised steps before any drift is visible.

    OWASP LLM Top 10: LLM01:2025, LLM06:2025

  • ASI02 Tool Misuse and Exploitation (contributing)

    An agent applies authorised tools in ways their operator did not intend, driven by prompt injection, misaligned reasoning, or manipulated tool outputs. Every individual call looks clean; the harm is in the sequence: data exfiltrated via successive reads, workflows hijacked by parameter tampering, or a legitimate API weaponised across turns.

    OWASP LLM Top 10: LLM06:2025

Source: OWASP Top 10 for Agentic Applications 2026 (Dec 2025) · the Top 10 is a compass into the master Threats & Mitigations taxonomy, not a replacement for it.

Design principles at stake

When T19 is present, these security design principles are the ones being violated or tested. Each links to the full principle; the mitigations below are how you restore them.

  • Defence-in-Depth: The workflow graph itself is the failure surface: a miswired conditional branch or a missing guard condition silently skips the policy validation step, and the agent continues to the payment submission with no error signal. Depth means the workflow definition is not the only enforcement point: the framework's DAG specification enforces required step ordering and mandatory guard conditions at deployment, a step-level audit event fires on every transition so a skipped validation gate produces an alert rather than silence, and edge-case inputs are exercised during pipeline validation before the workflow reaches production. A human-in-the-loop gate on the first batch of claims after any workflow change ensures that latent branch-skip bugs surface under supervised conditions before full automation resumes.
  • Least Privilege: An RPA agent that can proceed directly from data extraction to payment submission, skipping the validation step, holds more effective authority than its intended role permits, because the workflow flaw grants it the authority to approve and commit without the validation credential it was supposed to require. Least privilege means the payment submission step should only be reachable after a validation token is committed; the agent's authority to call the payment API is conditional on receiving that token from the policy gate, not on the workflow simply routing it there. Without that token-gated design, a miswired graph is sufficient to grant payment authority with no policy check.
  • Fail Securely (fail-closed): The threat's canonical failure mode is fail-open: when the workflow graph skips the validation step, the agent defaults to processing the claim rather than halting. A fail-closed design inverts this: any claim that has not received a committed, explicit validation-passed token must be refused by the payment step, not passed through. The policy engine should be configured so that an unresolvable or missing validation result returns a deny decision, and alert-on-missing-transition rules treat the absence of a "validated" status event as a blocking anomaly rather than a silent no-op.
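
The token-gated, fail-closed payment step described in the last two principles can be sketched as follows. This is an illustration under stated assumptions, not a production design: the shared `VALIDATION_KEY`, the token format, and both function names are hypothetical, and a real deployment would keep the key in a secrets manager and would likely add expiry and replay protection.

```python
import hashlib
import hmac

# Hypothetical key shared by the policy gate and the payment step.
VALIDATION_KEY = b"demo-key-not-for-production"


def issue_validation_token(claim_id: str, amount_cents: int) -> str:
    """Called by the policy gate only after validation has passed."""
    message = f"{claim_id}:{amount_cents}:validated".encode()
    return hmac.new(VALIDATION_KEY, message, hashlib.sha256).hexdigest()


def submit_payment(claim_id: str, amount_cents: int, token) -> str:
    """Payment step: deny unless a matching validation token is presented."""
    if not token:
        # Missing result is treated as a deny decision, never a pass-through.
        return "deny: no validation token"
    expected = issue_validation_token(claim_id, amount_cents)
    if not hmac.compare_digest(expected.encode(), str(token).encode()):
        return "deny: validation token does not match claim"
    return "submitted"


token = issue_validation_token("claim-42", 2000)
print(submit_payment("claim-42", 2000, token))  # submitted
print(submit_payment("claim-42", 2000, None))   # deny: no validation token
print(submit_payment("claim-42", 9000, token))  # deny: validation token does not match claim
```

Because the token is bound to the claim ID and amount, a miswired graph that routes a claim directly to the payment step arrives without a usable token and is refused, regardless of how it got there.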

Recommended mitigations

Auto-generated from the mitigation catalog: every mitigation whose coverage map includes T19, sorted by maturity tier (Tier 1 production-canonical first, then Tier 2, then Tier 3 research-stage).

  • Tier 2 Plan check (Plan-vs-goal validation — independently check each proposed step against the original goal)

    A plan-then-execute agent produces a sequence of steps before acting. If the planner is manipulated, it will emit steps that serve the attacker's goal rather than the user's. Plan-vs-goal validation addresses this by placing an independent validator between the planner and the execution loop: it evaluates each proposed step against the originally-declared goal before the agent is permitted to act on it.

    Why it helps: Unintended Workflow Execution is the triggering of a workflow the user did not sanction, often by constructing a plan step whose effect is a workflow invocation. Plan validation catches this at the step layer: a step that initiates an unsanctioned workflow is evaluated against the declared goal and flagged as goal-divergent before the workflow starts.

  • Tier 2 Policy bound (Policy-bound autonomy — declarative runtime enforcement of the agent's action space)

    An agent's authority is normally bounded only by its own reasoning. If that reasoning is manipulated, or the agent's identity is compromised, it will attempt actions the operator never intended to permit. Policy-bound autonomy addresses this by placing a declarative enforcement point between the agent and every consequential action: a policy engine evaluates the agent identity, the target tool, and the parameter envelope before execution, and the agent cannot reason or argue past the result.

    Why it helps: Unintended Workflow Execution is an agent initiating a workflow it was not explicitly directed to start. A policy rule that maps each workflow-initiation action to a permitted set of agent roles blocks any agent from triggering a workflow outside its declared scope, so the unauthorised initiation is refused at the enforcement point rather than discovered after the fact.

  • Tier 3 Workflow state consistency (Workflow state consistency — distributed-state integrity checks for multi-agent workflows)

    When multiple agents read and write shared workflow state concurrently, a network partition, a delayed message, or an adversarially timed race condition can produce divergent views. An agent acting on stale or conflicting state may authorise an action it would reject given correct current state. Hash-chained state snapshots, merge-point conflict detection, and optimistic concurrency control close that window.

    Why it helps: Unintended workflow execution allows a step to be advanced before its required predecessor state is complete. An explicit state-transition gate at each step means a workflow step cannot be initiated until the predecessor state has been committed and verified, blocking the jump-ahead scenario.
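
The state-transition gate from the last entry, combined with optimistic concurrency control, might look like the sketch below. It is a single-process illustration only: the `WorkflowState` class, its step names, and the exception types are hypothetical, and a multi-agent deployment would back the version check with an atomic compare-and-set in the shared state store.

```python
class StaleStateError(Exception):
    """The caller acted on an outdated view of the workflow state."""


class MissingPredecessorError(Exception):
    """A step was attempted before its required predecessors were committed."""


class WorkflowState:
    # Required commit order for one claim (hypothetical step names).
    ORDER = ["extracted", "validated", "submitted"]

    def __init__(self):
        self.version = 0
        self.committed = []

    def commit(self, step, expected_version):
        """Commit a step only if the caller's view is current and all
        predecessor steps have already been committed, in order."""
        if expected_version != self.version:
            raise StaleStateError(
                f"expected version {expected_version}, state is at {self.version}"
            )
        required = self.ORDER[: self.ORDER.index(step)]
        if self.committed != required:
            raise MissingPredecessorError(
                f"{step} requires {required}, but committed is {self.committed}"
            )
        self.committed.append(step)
        self.version += 1
        return self.version


state = WorkflowState()
v = state.commit("extracted", expected_version=0)
try:
    state.commit("submitted", expected_version=v)  # jump-ahead attempt
except MissingPredecessorError as err:
    print("blocked:", err)
```

The version check rejects an agent that read the state before another agent wrote to it, and the predecessor check rejects the jump-ahead case even when the version is current.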

Red-team pivot: MITRE ATLAS techniques

MITRE ATLAS catalogues adversary techniques against AI systems. Where this OWASP threat has an attacker-perspective counterpart, the ATLAS technique is shown below. That is what a red team would actually be doing on the wire. Use this for detection-signal anchoring, threat-hunting hypotheses, and IR runbooks. Source: mitre-atlas/atlas-data v5.6.0.

AML.T0053 AI Agent Tool Invocation

Adversary causes an agent to invoke a legitimate tool with attacker-controlled parameters, turning a sanctioned capability into an attack vector.

Agentic angle: Maps directly to OWASP T2 Tool Misuse: the agent's tools are operating within their declared scope, but the chosen invocation is unsafe.

AML.T0081 Modify AI Agent Configuration

Adversary alters an agent's configuration (system prompt, tool list, allowed actions, persona) to change its behaviour without retraining.

AML.T0067 LLM Trusted Output Components Manipulation

Adversary manipulates the structured parts of an LLM response (citations, tool-call arguments, approved-action markup) that downstream systems treat as trusted.

Agentic angle: Structured outputs are exactly what agent frameworks parse to decide what to execute. Undermining the structure undermines every safety check downstream.
