T35: Manipulation of Proof of Sampling (PoSP)

Definition

ElizaOS (an open-source multi-agent operating system built on Solana) uses Proof of Sampling (PoSP), a mechanism that cryptographically attests that an agent’s inference was performed legitimately, to make inference outputs verifiable. An attacker manipulates the PoSP mechanism to fabricate evidence of legitimate actions, or to conceal malicious ones. The attack targets ElizaOS’s observability and verifiability layer rather than its operational logic directly.

What it looks like in practice

An ElizaOS agent performs an unauthorised fund transfer. The attacker retrieves a valid PoSP proof that was generated during a prior legitimate inference and replays it against the fraudulent transfer record, associating the genuine proof with the malicious action. Audit systems that verify PoSP proofs before accepting action records accept the replayed proof and classify the fraudulent transfer as a verified legitimate inference. To every downstream consumer that relies on PoSP as the primary trust signal, the malicious transfer is now indistinguishable from a legitimate one.
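The replay can be illustrated with a minimal Python sketch. It does not reproduce the real PoSP proof format, which this entry does not specify: an HMAC tag stands in for the proof, and VERIFIER_KEY, make_proof, naive_verify, and the record fields are all hypothetical names chosen for illustration. The point is only that a verifier which checks that a proof is genuine, without checking which action it belongs to, accepts the replay.

```python
import hashlib
import hmac

# Hypothetical key and proof scheme; a stand-in for PoSP, not its real format.
VERIFIER_KEY = b"demo-key"


def make_proof(inference_output: bytes) -> str:
    """Attest only to the inference output, with no link to any action."""
    return hmac.new(VERIFIER_KEY, inference_output, hashlib.sha256).hexdigest()


def naive_verify(record: dict) -> bool:
    """Check that the proof is genuine, but not which action it attests."""
    expected = make_proof(record["inference_output"])
    return hmac.compare_digest(expected, record["proof"])


legit = {
    "action": "transfer 5 SOL to treasury",
    "inference_output": b"approve routine payout",
}
legit["proof"] = make_proof(legit["inference_output"])

# The attacker copies the attested output and its proof onto a different action.
fraud = {
    "action": "transfer 500 SOL to attacker",
    "inference_output": legit["inference_output"],
    "proof": legit["proof"],
}

print(naive_verify(legit))  # True
print(naive_verify(fraud))  # True: the replayed proof still verifies
```

Nothing in the proof refers to the action field, so the verifier has no basis for telling the two records apart.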

Why it’s dangerous in multi-agent context

PoSP is the primary verification mechanism that downstream consumers (other agents, human reviewers, and audit systems) rely upon to establish trust in an ElizaOS agent’s outputs. If PoSP data can be falsified, the entire trust chain built upon it collapses silently. Malicious actions that pass PoSP verification become undetectable through the mechanism designed to catch them. In a multi-agent system where one agent’s PoSP-verified outputs serve as inputs to a second agent’s reasoning, the corruption propagates through the verification chain rather than being filtered by it. T23 (Selective Log Manipulation) addresses a related cover-up at the log layer; T35 is the deeper attack on the cryptographic verification layer itself.

Detection signals

Because the attack replays a valid proof against a different action record, the seam is the mismatch between proof metadata and the action it is asserted to attest.

A PoSP proof timestamp that pre-dates the action record it is bound to: any proof-to-action temporal inversion in the audit log is a replay indicator.
The same proof nonce or proof hash appearing more than once across different action records in the PoSP ledger: duplicates signal replay, since each legitimate inference generates a unique proof.
An action record that carries a valid PoSP proof but whose session nonce does not match the session that produced the proof: detecting this requires binding the session identifier into the proof at generation time, then checking that binding at verification.
A gap between the action count recorded in the agent’s application-layer event log and the count of PoSP-verified records: actions present in one log but not the other indicate either missing proofs or injected records.
Verification latency spikes on the PoSP verifier: a batch of replayed proofs verified out of normal chronological order may cause the verifier’s chaining check to stall, producing measurable latency anomalies.
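The first four signals above lend themselves to a simple batch check. The sketch below is one possible shape for such a check, not a description of any ElizaOS tooling: the VerifiedRecord fields, the signal names, and the replay_indicators function are assumptions made for illustration, and the latency signal is left out because it needs verifier telemetry rather than record data.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class VerifiedRecord:
    # Hypothetical record shape; field names are illustrative only.
    action_id: str
    action_time: float   # when the action was recorded (epoch seconds)
    action_session: str  # session that executed the action
    proof_hash: str
    proof_time: float    # when the proof was generated (epoch seconds)
    proof_session: str   # session identifier bound into the proof


def replay_indicators(records, event_log_ids):
    """Return (action_id, signal) pairs for the record-level signals."""
    findings = []
    uses = Counter(r.proof_hash for r in records)
    for r in records:
        if r.proof_time < r.action_time:
            findings.append((r.action_id, "temporal_inversion"))
        if uses[r.proof_hash] > 1:
            findings.append((r.action_id, "duplicate_proof"))
        if r.proof_session != r.action_session:
            findings.append((r.action_id, "session_mismatch"))
    # Compare the PoSP-verified records with the application-layer event log.
    verified_ids = {r.action_id for r in records}
    for missing in sorted(set(event_log_ids) - verified_ids):
        findings.append((missing, "action_without_proof"))
    for extra in sorted(verified_ids - set(event_log_ids)):
        findings.append((extra, "record_without_event"))
    return findings


records = [
    VerifiedRecord("a1", 100.0, "s1", "p1", 101.0, "s1"),
    VerifiedRecord("a2", 200.0, "s2", "p1", 101.0, "s1"),  # replayed proof p1
]
findings = replay_indicators(records, event_log_ids={"a1", "a3"})
```

Both records that share proof p1 are flagged as duplicates, since the check alone cannot tell which use was the original. The other signals on a2 are what single it out as the replay.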

Mitigations

Store PoSP sampling records in a tamper-evident, append-only structure with cryptographic chaining between records, so that no individual record can be replaced without breaking the chain.
Bind each PoSP proof to the specific action it attests (action hash, timestamp, session nonce) so that a proof generated for one action cannot be replayed against another.
Perform PoSP verification through an independent party rather than the agent process that generated the proof; the agent must not be able to self-certify its own outputs.
Maintain a secondary audit trail that does not depend on PoSP data (for example, a separate append-only event log) to provide independent forensic evidence when PoSP integrity is in question.
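The binding mitigation can be sketched as follows. This is an illustration under stated assumptions, not the PoSP construction: an HMAC over the action hash, timestamp, and session nonce stands in for the proof, and issue_bound_proof, verify_bound_proof, and the field names are hypothetical. A deployment that verifies through an independent party would more plausibly use an asymmetric signature so that the verifier cannot itself mint proofs.

```python
import hashlib
import hmac
import json

BOUND_FIELDS = ("inference", "action_hash", "timestamp", "session_nonce")


def action_hash(action: dict) -> str:
    """Hash a canonical JSON encoding of the action."""
    canonical = json.dumps(action, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()


def _tag(key: bytes, binding: dict) -> str:
    body = json.dumps(binding, sort_keys=True).encode()
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def issue_bound_proof(key, inference_digest, action, timestamp, session_nonce):
    """Issue a proof whose tag covers the action hash, timestamp and nonce."""
    binding = {
        "inference": inference_digest,
        "action_hash": action_hash(action),
        "timestamp": timestamp,
        "session_nonce": session_nonce,
    }
    return {**binding, "tag": _tag(key, binding)}


def verify_bound_proof(key, proof, action, session_nonce) -> bool:
    """Accept only if the tag is genuine AND it binds this action and session."""
    binding = {name: proof[name] for name in BOUND_FIELDS}
    return (
        hmac.compare_digest(_tag(key, binding), proof["tag"])
        and proof["action_hash"] == action_hash(action)
        and proof["session_nonce"] == session_nonce
    )


key = b"demo-key"  # hypothetical key, for illustration only
payout = {"type": "transfer", "to": "treasury", "amount": 5}
theft = {"type": "transfer", "to": "attacker", "amount": 500}
proof = issue_bound_proof(key, "abc123", payout, 1700000000, "nonce-1")

print(verify_bound_proof(key, proof, payout, "nonce-1"))  # True
print(verify_bound_proof(key, proof, theft, "nonce-1"))   # False: wrong action
print(verify_bound_proof(key, proof, payout, "nonce-2"))  # False: wrong session
```

Because the tag covers the action hash, an attacker who rewrites the action_hash field to match the fraudulent action invalidates the tag instead.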

Relation to base threat (T1–T17)

T35 extends T8 Repudiation and Untraceability. Where T8 covers the general class of evidence suppression, T35 is the cryptographic-verification-mechanism variant: the attacker does not delete records but instead makes fraudulent records appear verified by replaying valid proofs against them. T23 (Selective Log Manipulation) is the log-layer complement: both threats eliminate the audit trail for malicious actions, but at different layers of the observability stack.

OWASP Top 10 for Agentic Applications 2026

The Agentic Top 10 (ASI01 through ASI10) is a separate practitioner-facing publication that maps onto the master Threats & Mitigations threat numbering. T35 is covered by the following Top 10 entries:

ASI09 Human-Agent Trust Exploitation (primary)

Adversaries exploit the tendency of humans to trust fluent, authoritative-sounding agents: an agent presents plausible justification for a harmful action, the human approves it, and the resulting audit trail reads as deliberate human authorisation. The attack surface is the review step itself: human-in-the-loop oversight becomes the vector when reviewers lack the context, time, or authority to challenge what the agent recommends.

OWASP LLM Top 10: LLM01:2025 LLM05:2025 LLM06:2025 LLM09:2025

ASI08 Cascading Failures (contributing)

A single low-severity fault (a hallucinated value, a corrupted tool output, a poisoned memory entry) propagates across a network of agents that each build on the last agent's output, compounding into system-wide harm that is disproportionate to the original defect. ASI08 is about propagation and amplification, not the fault's origin; the initial trigger may itself be innocuous.

OWASP LLM Top 10: LLM01:2025 LLM04:2025 LLM06:2025

Source: OWASP Top 10 for Agentic Applications 2026 (Dec 2025) · the Top 10 is a compass into the master Threats & Mitigations taxonomy, not a replacement for it.

Design principles at stake

When T35 is present, these security design principles are the ones being violated or tested. Each links to the full principle; the mitigations below are how you restore them.

Defence-in-Depth: A replayed PoSP proof succeeds because the proof is not bound to the specific action it attests. The verification mechanism therefore shares a single point of failure with the action record it is supposed to validate. Relying solely on PoSP verification as the trust signal means the entire trust chain collapses silently once the replay is accepted. Depth here means four independent layers of evidence, each requiring separate compromise: cryptographic chaining of sampling records in an append-only, tamper-evident structure, so that replacing any individual record breaks the chain; binding each proof to its action's hash and session nonce, so that a proof generated for one action cannot verify another; verification by an independent party rather than the agent process that generated the proof; and a secondary append-only event log that does not depend on PoSP data.
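The chaining layer can be illustrated with a minimal hash chain. This is a generic sketch rather than the PoSP ledger format: ChainedLog, GENESIS, and the payload fields are hypothetical names, and the log lives in memory only. In practice the latest hash would also need to be anchored somewhere the agent cannot write, since an attacker with full write access could otherwise recompute the whole chain.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, payload: dict) -> str:
    """Hash the previous entry's hash together with this entry's payload."""
    body = json.dumps(payload, sort_keys=True).encode()
    return hashlib.sha256(prev_hash.encode() + body).hexdigest()


class ChainedLog:
    """Append-only list in which each entry commits to its predecessor."""

    def __init__(self):
        self._entries = []

    def append(self, payload: dict) -> str:
        prev = self._entries[-1]["hash"] if self._entries else GENESIS
        entry = {"payload": payload, "prev": prev, "hash": _digest(prev, payload)}
        self._entries.append(entry)
        return entry["hash"]

    def verify(self) -> bool:
        """Recompute every link; any replaced record breaks the chain."""
        prev = GENESIS
        for entry in self._entries:
            if entry["prev"] != prev:
                return False
            if entry["hash"] != _digest(prev, entry["payload"]):
                return False
            prev = entry["hash"]
        return True


log = ChainedLog()
log.append({"action": "transfer 5 SOL to treasury", "proof": "p1"})
log.append({"action": "rebalance pool", "proof": "p2"})
print(log.verify())  # True

# Simulate an attacker rewriting an earlier record in place.
log._entries[0]["payload"]["action"] = "transfer 500 SOL to attacker"
print(log.verify())  # False
```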

Recommended mitigations

Auto-generated from the mitigation catalog: every mitigation whose coverage map includes T35, sorted by maturity tier (Tier 1 production-canonical first, then Tier 2, then Tier 3 research-stage).

Tier 2 Anomaly isolation (Behavioural anomaly isolation — automatic quarantine on observable drift)

An agent that has been compromised, poisoned, or gone rogue will, in most cases, behave differently from its established baseline. Anomaly isolation acts on that difference: when an agent's behaviour score crosses a configured threshold, the agent is quarantined automatically, with credentials revoked, message-queue access cut, and in-flight actions aborted. Manual revocation cannot match the speed that cascading multi-agent failures demand.

Why it helps: PoSP Manipulation attempts to tamper with Proof of Sampling verification records. Anomaly isolation detects behavioural deviation in the verification-reporting agent (unexpected claim patterns or inconsistency with peer attestations) and quarantines the deviating agent before it can suppress or falsify PoSP outcomes.
Tier 2 Split actor (Separation of actor and recorder — different identities for action and audit)

An agent that writes its own audit log can omit, alter, or suppress any record of its own actions. This is not a theoretical risk: an attacker who controls the acting identity controls the evidence. Actor/recorder separation is the structural fix. The identity that performs an action and the identity that records it are different principals, with non-overlapping permissions, so no single compromise can both execute and erase.

Why it helps: PoSP Manipulation depends on an attacker being able to falsify or suppress verification records produced by the PoSP-reporting agent. Actor/recorder separation ensures the agent that submits PoSP claims cannot write to its own attestation log: the recorder identity captures every claim independently, making retroactive falsification of the verification record structurally infeasible.

Red-team pivot: MITRE ATLAS techniques

MITRE ATLAS catalogues adversary techniques against AI systems. Where this OWASP threat has an attacker-perspective counterpart, the ATLAS technique is shown below. That is what a red team would actually be doing on the wire. Use this for detection-signal anchoring, threat-hunting hypotheses, and IR runbooks. Source: mitre-atlas/atlas-data v5.6.0.

AML.T0081 Modify AI Agent Configuration

Adversary alters an agent's configuration (system prompt, tool list, allowed actions, persona) to change its behaviour without retraining.

AML.T0067 LLM Trusted Output Components Manipulation

Adversary manipulates the structured parts of an LLM response (citations, tool-call arguments, approved-action markup) that downstream systems treat as trusted.

Agentic angle: Structured outputs are exactly what agent frameworks parse to decide what to execute. Undermining the structure undermines every safety check downstream.

References

OWASP MAS Threat Modelling Guide v1.0 (April 2025) §4 ElizaOS — Layer 5 Evaluation and Observability.
