T38: Emergent Collusion on Blockchain

Definition

Multiple agents running on ElizaOS (an open-source multi-agent operating system built on Solana) interact on the Solana blockchain and produce unintended emergent behaviour that collectively disrupts blockchain operation or market integrity. No individual agent acts maliciously; “collusion” is used here in the economic sense of aligned behaviour without explicit coordination. The harm arises from the collective outcome of independent autonomous decisions reacting to a shared environment.

What it looks like in practice

A fleet of ElizaOS trading agents each monitors the same token’s price and is individually configured to sell when the price drops below a threshold. A market event causes a moderate dip. Each agent independently detects the trigger, evaluates it as a sell signal, and submits a sell order. The simultaneous flood of sell orders drives the price down further, which triggers each agent’s threshold again, producing a further round of sells. The result is a “flash crash”: a rapid price collapse and partial recovery within minutes. No individual agent intended this; the outcome is a property of the multi-agent system responding collectively to a shared signal.
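The feedback loop described above can be sketched as a toy simulation. This is a minimal illustration under stated assumptions, not a market model: the `Agent` class, the linear `impact_per_unit` price-impact coefficient, and the rule that each agent sells half of its remaining holdings per trigger are all hypothetical, and the partial recovery phase is not modelled.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    """Hypothetical trading agent with a fixed sell threshold."""
    threshold: float              # sell when the price falls below this value
    holdings: float = 100.0       # units of the token still held
    slice_fraction: float = 0.5   # assumed share of holdings sold per trigger


def simulate_cascade(agents, start_price, shock, impact_per_unit, max_rounds=20):
    """Return the price path after an initial shock.

    Each round, every agent whose threshold is above the current price sells
    a slice of its holdings. The combined volume pushes the price down
    linearly (a toy price-impact assumption), which can re-trigger the fleet.
    """
    price = start_price - shock
    path = [start_price, price]
    for _ in range(max_rounds):
        sold = 0.0
        for agent in agents:
            if price < agent.threshold and agent.holdings > 1e-9:
                qty = agent.holdings * agent.slice_fraction
                agent.holdings -= qty
                sold += qty
        if sold == 0.0:
            break  # nobody triggered, so the cascade has stopped
        price = max(price - sold * impact_per_unit, 0.0)
        path.append(price)
    return path


# Same shock, same per-agent configuration; only the fleet size differs.
fleet_path = simulate_cascade(
    [Agent(threshold=95.0) for _ in range(50)], 100.0, shock=6.0, impact_per_unit=0.002
)
single_path = simulate_cascade(
    [Agent(threshold=95.0)], 100.0, shock=6.0, impact_per_unit=0.002
)
print("fleet of 50 ends at", round(fleet_path[-1], 2))
print("single agent ends at", round(single_path[-1], 2))
```

Each agent in the sketch follows the same rule in both runs; the much deeper drop in the fleet run comes only from the number of agents reacting to the same trigger.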

A second variant involves T27: poisoned vector store data causes multiple agents to converge simultaneously on a buy signal for a malicious contract, amplifying the financial damage.

Why it’s dangerous in multi-agent context

Autonomy and shared environmental signals are the direct enablers. Because multiple agents observe the same market signals and apply similar strategies, they converge on similar actions simultaneously. The emergent behaviour is a property of the multi-agent system as a whole, not of any individual agent, making it invisible to monitoring systems that watch individual agents in isolation. As agent fleet sizes grow, the market impact of synchronised action grows disproportionately. A fleet that is individually within normal operating parameters can produce a collectively market-moving event.

Detection signals

Emergent collusion is invisible at the individual-agent level; it only becomes visible when the fleet’s aggregate behaviour is measured against market impact metrics.

The fraction of agents in a fleet submitting a trade in the same direction within a 60-second window exceeding a defined correlation threshold (e.g., more than 70 % of agents on a sell signal simultaneously): compute per-minute directional agreement across the fleet and alert when it breaches the threshold.
Net sell or buy volume from all agents combined in a single minute exceeding a defined percentage of the token’s average market volume (for example, a trailing 5-minute average). A fleet-level flow-to-market ratio alert catches market-moving events before they cascade.
Price impact of the fleet’s aggregate transactions exceeding the slippage that a single agent in the fleet would expect for its own order. A discrepancy between individual-agent expected slippage and observed market price change indicates synchronised execution.
The standard deviation of individual agent trade timestamps collapsing below 1 second for a burst of more than N trades. A timing correlation spike on the execution ledger indicates simultaneous triggering from a shared signal.
A shared vector store or news feed event timestamp coinciding within a few seconds of the correlated trade burst: correlating input-event timestamps with fleet execution timestamps identifies the shared environmental trigger.
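Three of the signals above (directional agreement, flow-to-market ratio, and timestamp dispersion) can be computed from a fleet-wide trade log. The sketch below is illustrative only: the `Trade` record, the function names, and every default limit except the 70 % agreement and 1-second spread figures are assumptions, and the caller is assumed to pass only trades that fall inside the window being evaluated.

```python
from statistics import pstdev
from typing import NamedTuple


class Trade(NamedTuple):
    """Hypothetical record from a fleet-wide execution ledger."""
    agent_id: str
    side: str         # "buy" or "sell"
    quantity: float
    timestamp: float  # seconds


def directional_agreement(trades, fleet_size):
    """Fraction of the fleet trading in the dominant direction."""
    sellers = {t.agent_id for t in trades if t.side == "sell"}
    buyers = {t.agent_id for t in trades if t.side == "buy"}
    return max(len(sellers), len(buyers)) / fleet_size


def flow_to_market_ratio(trades, market_volume):
    """Absolute net fleet volume divided by a reference market volume."""
    net = sum(t.quantity if t.side == "buy" else -t.quantity for t in trades)
    return abs(net) / market_volume


def timestamp_spread(trades):
    """Population standard deviation of trade timestamps, in seconds."""
    return pstdev([t.timestamp for t in trades])


def fleet_alerts(trades, fleet_size, market_volume,
                 agreement_limit=0.70, flow_limit=0.10,
                 spread_limit=1.0, min_burst=5):
    """Return the names of the fleet-level signals that fired for this window."""
    alerts = []
    if not trades:
        return alerts
    if directional_agreement(trades, fleet_size) > agreement_limit:
        alerts.append("directional_agreement")
    if flow_to_market_ratio(trades, market_volume) > flow_limit:
        alerts.append("flow_to_market")
    if len(trades) > min_burst and timestamp_spread(trades) < spread_limit:
        alerts.append("timing_burst")
    return alerts


# Eight of ten agents sell within half a second of each other.
window = [Trade(f"agent-{i}", "sell", 10.0, 1000.0 + 0.05 * i) for i in range(8)]
print(fleet_alerts(window, fleet_size=10, market_volume=500.0))
```

None of these checks inspects an individual agent's behaviour; each one is a function of the whole window, which is why they can fire while every agent looks normal in isolation.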

Mitigations

Monitor aggregate behaviour across the fleet: transaction volume per time window, net directional position, and correlation of trading signals between agents.
Introduce deliberate strategy diversity: different threshold values, different signal sources, staggered execution windows across agents to decorrelate their decisions.
Enforce per-agent and per-fleet position limits; prevent any single time window from seeing the fleet’s total exposure exceed a defined fraction of the market.
Implement a fleet-level coordination layer that detects synchronised action patterns and introduces randomised delays or automatic position offsets before large coordinated trades execute.
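Strategy diversity and staggered execution can be sketched as simple per-agent randomisation. The helper names, the ±5 % threshold spread, and the 30-second window below are assumptions chosen for illustration; a real deployment would tune them against the strategy's risk tolerance.

```python
import random


def diversify_thresholds(base_threshold, fleet_size, spread=0.05, seed=None):
    """Give each agent its own sell threshold within +/- spread of the base value."""
    rng = random.Random(seed)
    return [base_threshold * (1 + rng.uniform(-spread, spread)) for _ in range(fleet_size)]


def staggered_delays(fleet_size, window_seconds=30.0, seed=None):
    """Assign each agent a random execution delay inside the window."""
    rng = random.Random(seed)
    return [rng.uniform(0.0, window_seconds) for _ in range(fleet_size)]


def triggered_fraction(thresholds, price):
    """Share of agents whose sell threshold is above the current price."""
    return sum(1 for t in thresholds if price < t) / len(thresholds)


identical = [95.0] * 100
diverse = diversify_thresholds(95.0, 100, spread=0.05, seed=7)
delays = staggered_delays(100, window_seconds=30.0, seed=7)

# A dip to 94.5 triggers every identically configured agent at once,
# but only part of the diversified fleet.
print("identical fleet triggered:", triggered_fraction(identical, 94.5))
print("diverse fleet triggered:", triggered_fraction(diverse, 94.5))
print("execution spread over", round(max(delays) - min(delays), 1), "seconds")
```

The randomisation does not change what any single agent is allowed to do; it only reduces the chance that the whole fleet acts on the same tick.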

Relation to base threat (T1–T17)

T38 extends T13 Rogue Agents in Multi-Agent Systems. Where T13 addresses individually compromised or malicious agents, T38 addresses a system where no individual agent is compromised. The harm emerges from the collective behaviour of correctly operating agents that share an environment and strategies. T27 (Vector Database Poisoning with Malicious Smart Contract Data) can act as a catalyst for T38: poisoned vector data that steers multiple agents toward the same target simultaneously converts an emergent risk into an adversarially engineered one.

OWASP Top 10 for Agentic Applications 2026

The Agentic Top 10 (ASI01 through ASI10) is a separate practitioner-facing publication that maps onto the master Threats & Mitigations threat numbering. T38 is covered by the following Top 10 entries:

ASI10 Rogue Agents (contributing)

A rogue agent is one whose behavioural objective has drifted from its authorised purpose, yet its identity still checks out, its actions remain inside its permissions, and its logs look clean. Divergence may originate from prompt injection, supply-chain tampering, or goal hijack; ASI10 names what happens after divergence begins: sustained, covert operation toward an attacker's goal with no single action that trips an alarm.

OWASP LLM Top 10: LLM02:2025, LLM09:2025

Source: OWASP Top 10 for Agentic Applications 2026 (Dec 2025) · the Top 10 is a compass into the master Threats & Mitigations taxonomy, not a replacement for it.

Design principles at stake

When T38 is present, these security design principles are the ones being violated or tested. Each links to the full principle; the mitigations below are how you restore them.

Defence-in-Depth: The emergent flash crash arises even though no individual agent is misbehaving. Every agent operates within its own normal parameters, so per-agent monitoring is structurally blind to the threat; individual-agent controls cannot detect a property that exists only at the fleet level. Depth here means four layers: fleet-level aggregate monitoring of transaction volume, net directional position, and signal correlation as the detective layer; deliberate strategy diversity (different thresholds, staggered execution windows, distinct signal sources) to decorrelate decisions at the architectural level; per-agent and per-fleet position limits as an independent quantitative bound; and a fleet-level coordination layer that detects synchronised patterns and introduces randomised delays. Each layer acts on an aspect of the emergent behaviour that the others cannot address alone.
Kill-switch / Circuit-breaker: Emergent collusion produces no individual-agent anomaly that would trip a per-agent circuit breaker; the harm is a property of synchronised collective action, visible only at the fleet level. A fleet-level coordination layer that detects synchronised action patterns and can introduce randomised delays or automatic position offsets is the kill-switch analogue for this threat. It intercepts the collective behaviour before large coordinated trades execute, without halting agents whose individual activity is within normal bounds. The per-fleet position limit is a complementary circuit breaker: it trips when the fleet's total exposure in a single time window exceeds a defined fraction of the market, providing a hard quantitative stop independent of the pattern-detection layer.
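The per-fleet position limit can be sketched as a sliding-window breaker that every agent's order must pass through. The class name, the latch-until-reset behaviour, and the use of a caller-supplied `market_volume` as the reference are assumptions made for this illustration.

```python
from collections import deque


class FleetCircuitBreaker:
    """Trips when fleet volume in a sliding window exceeds a fraction of market volume."""

    def __init__(self, window_seconds, max_market_fraction):
        self.window_seconds = window_seconds
        self.max_market_fraction = max_market_fraction
        self._orders = deque()  # (timestamp, quantity) of admitted orders
        self._total = 0.0       # admitted volume currently inside the window
        self.tripped = False

    def allow(self, timestamp, quantity, market_volume):
        """Admit and record the order if the fleet stays under its cap."""
        if self.tripped:
            return False
        # Drop admitted orders that have aged out of the window.
        while self._orders and self._orders[0][0] <= timestamp - self.window_seconds:
            _, old_quantity = self._orders.popleft()
            self._total -= old_quantity
        if self._total + quantity > self.max_market_fraction * market_volume:
            self.tripped = True  # assumed design: stay open until an operator resets
            return False
        self._orders.append((timestamp, quantity))
        self._total += quantity
        return True

    def reset(self):
        """Re-arm the breaker after review."""
        self.tripped = False


breaker = FleetCircuitBreaker(window_seconds=60.0, max_market_fraction=0.05)
decisions = [breaker.allow(t, 20.0, market_volume=1000.0) for t in (0.0, 1.0, 2.0)]
print(decisions, "tripped:", breaker.tripped)
```

Because the breaker counts volume rather than looking for a pattern, it still stops the fleet when the pattern-detection layer misses the synchronisation.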

Recommended mitigations

Auto-generated from the mitigation catalog: every mitigation whose coverage map includes T38, sorted by maturity tier (Tier 1 production-canonical first, then Tier 2, then Tier 3 research-stage).

Tier 2 Anomaly isolation (Behavioural anomaly isolation — automatic quarantine on observable drift)

An agent that has been compromised, poisoned, or gone rogue will, in most cases, behave differently from its established baseline. Anomaly isolation acts on that difference: when an agent's behaviour score crosses a configured threshold, it is quarantined automatically, credentials revoked, message-queue access cut, in-flight actions aborted. Manual revocation cannot match the speed that cascading multi-agent failures demand.

Why it helps: Emergent agent collusion is detectable as a cluster of agents whose outputs are mutually consistent with each other but diverge from external ground truth. Anomaly isolation triggers quarantine of the deviating cluster before the collusive output propagates to committed actions.
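A threshold-triggered quarantine of the kind described above might look like the following sketch. The z-score used as the behaviour score, the `AgentState` record, and the action names are assumptions for illustration; the source does not specify how the score is computed.

```python
from dataclasses import dataclass, field
from statistics import mean, pstdev


@dataclass
class AgentState:
    """Hypothetical per-agent record kept by the isolation controller."""
    agent_id: str
    baseline: list                      # historical values of one behavioural metric
    quarantined: bool = False
    actions_taken: list = field(default_factory=list)


def anomaly_score(baseline, observed):
    """Absolute z-score of the observation against the agent's baseline (assumed metric)."""
    centre = mean(baseline)
    sigma = pstdev(baseline)
    if sigma == 0:
        return 0.0 if observed == centre else float("inf")
    return abs(observed - centre) / sigma


def isolate_if_anomalous(agent, observed, threshold=3.0):
    """Quarantine the agent when its score crosses the configured threshold."""
    score = anomaly_score(agent.baseline, observed)
    if score > threshold and not agent.quarantined:
        agent.quarantined = True
        # Placeholder names for the three containment steps in the text.
        agent.actions_taken = ["revoke_credentials", "cut_message_queue", "abort_in_flight"]
    return score


agent = AgentState("agent-1", baseline=[10, 11, 9, 10, 10])
isolate_if_anomalous(agent, observed=10.5)   # close to baseline
print(agent.quarantined)
isolate_if_anomalous(agent, observed=30)     # far outside baseline
print(agent.quarantined, agent.actions_taken)
```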
Tier 2 Rate limits and quotas (Per-agent rate limits and quotas — bound compute, tokens, and external-API spend)

An agent operates without direct human oversight, autonomously scheduling tool calls, external API requests, and reflection loops. Without a budget, a single triggering event can fan out into hundreds of downstream calls. Per-agent rate limits and quotas assign each agent identity its own ceiling on call rate, token consumption, and cost spend, so a misbehaving or compromised agent cannot exhaust shared resources and its overconsumption becomes a visible, actionable signal.

Why it helps: Emergent collusion amplifies impact by having multiple agents submit coordinated bursts toward a shared target. Per-agent quota enforcement limits each participating identity's rate, which slows the burst and opens a detection window before the cumulative effect reaches its target.
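One common way to give each agent identity its own ceiling is a token bucket per agent. The sketch below uses that technique as an assumption; the class names and parameters are hypothetical, and time is passed in explicitly so the behaviour is easy to follow.

```python
class TokenBucket:
    """Classic token bucket: refills at a fixed rate up to a fixed capacity."""

    def __init__(self, rate_per_second, capacity):
        self.rate = rate_per_second
        self.capacity = capacity
        self.tokens = capacity
        self.last = None  # timestamp of the previous call

    def try_consume(self, now, cost=1.0):
        """Refill for the elapsed time, then spend `cost` tokens if available."""
        if self.last is not None:
            elapsed = now - self.last
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class AgentQuotas:
    """Keeps one bucket per agent identity so one agent cannot drain another's budget."""

    def __init__(self, rate_per_second, capacity):
        self._rate = rate_per_second
        self._capacity = capacity
        self._buckets = {}

    def allow(self, agent_id, now, cost=1.0):
        if agent_id not in self._buckets:
            self._buckets[agent_id] = TokenBucket(self._rate, self._capacity)
        return self._buckets[agent_id].try_consume(now, cost)


quotas = AgentQuotas(rate_per_second=1.0, capacity=2)
burst = [quotas.allow("agent-a", now=0.0) for _ in range(3)]
print(burst)                               # the third call in the burst is refused
print(quotas.allow("agent-b", now=0.0))    # a different agent has its own budget
print(quotas.allow("agent-a", now=1.0))    # one second later a token has refilled
```

A refused call is itself a useful event to log, since a fleet-wide spike in refusals is another aggregate signal.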
Tier 2 Trust score (Per-agent trust scoring — behavioural reputation for inter-agent message acceptance)

In a multi-agent system, each agent routes decisions based on what its peers report. If a peer's behaviour becomes unreliable or adversarial, agents that keep treating it with full authority will propagate whatever errors or manipulations that peer introduces. Per-agent trust scoring addresses this by maintaining a continuously updated reputation score for every peer, derived from observed behaviour, and using that score to determine how much authority each incoming message carries.

Why it helps: Emergent collusion produces a cluster of peers whose scores are mutually consistent but whose claims diverge from ground-truth outcomes. That divergence is the detection signal; trust scoring is the monitoring layer that makes it observable as a correlated score pattern rather than isolated per-peer noise.
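A continuously updated reputation score can be sketched with an exponential moving average over observed outcomes. The update rule, the `alpha` smoothing factor, the neutral starting score, and the acceptance floor are all assumptions for illustration rather than a prescribed scoring method.

```python
class TrustScores:
    """Per-peer reputation as an exponential moving average of claim outcomes."""

    def __init__(self, alpha=0.2, initial=0.5):
        self.alpha = alpha        # weight given to the newest observation
        self.initial = initial    # neutral score for peers not yet observed
        self._scores = {}

    def score(self, peer_id):
        return self._scores.get(peer_id, self.initial)

    def record(self, peer_id, claim_was_correct):
        """Move the peer's score toward 1.0 or 0.0 based on a ground-truth check."""
        outcome = 1.0 if claim_was_correct else 0.0
        updated = (1 - self.alpha) * self.score(peer_id) + self.alpha * outcome
        self._scores[peer_id] = updated
        return updated

    def weight(self, peer_id, floor=0.3):
        """Authority given to a peer's message: zero below the floor, else its score."""
        current = self.score(peer_id)
        return 0.0 if current < floor else current


trust = TrustScores(alpha=0.5)
trust.record("peer-1", claim_was_correct=False)
trust.record("peer-2", claim_was_correct=True)
print(trust.weight("peer-1"), trust.weight("peer-2"))
```

Logging the scores of all peers side by side is what makes a correlated drop across a cluster visible, as opposed to treating each peer's score in isolation.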

Red-team pivot: MITRE ATLAS techniques

MITRE ATLAS catalogues adversary techniques against AI systems. Where this OWASP threat has an attacker-perspective counterpart, the ATLAS technique is shown below. That is what a red team would actually be doing on the wire. Use this for detection-signal anchoring, threat-hunting hypotheses, and IR runbooks. Source: mitre-atlas/atlas-data v5.6.0.

AML.T0061 LLM Prompt Self-Replication

Adversary crafts a prompt that, when executed by an agent, instructs other agents (or the same agent in a later turn) to replicate or propagate the same prompt.

Agentic angle: Worm-like behaviour in multi-agent systems: one compromised agent can spread instructions across the network.

AML.T0081 Modify AI Agent Configuration

Adversary alters an agent's configuration (system prompt, tool list, allowed actions, persona) to change its behaviour without retraining.

AML.T0031 Erode AI Model Integrity

Adversary degrades model output quality over time so users lose confidence or downstream consumers act on incorrect predictions.

References

OWASP MAS Threat Modelling Guide v1.0 (April 2025) §4 ElizaOS — Layer 7 Agent Ecosystem.
