EVIDENCE TRAIL

Per-agent trust scoring — historical-reliability-weighted peer trust

Verbatim excerpts from the upstream sources cited on the mitigation page, with what each source does and does not prove. OWASP Agentic AI v1.1 Playbook 6 is the primary upstream basis — it names "agent trust scoring" verbatim. The EigenTrust algorithm (Kamvar et al. 2003) is the academic foundation for history-weighted distributed reputation scoring.

Last cross-checked against upstream sources: 2026-05-29 · 7 references

References

Each entry shows what the source supports and what it does not prove.

Reference 1

v1.1 · published December 2025

OWASP Agentic AI — Threats & Mitigations v1.1

§Playbook 6: Securing Multi-Agent Communication & Trust Mechanisms — Step 1: Secure AI-to-AI Communication Channels (Proactive)

"Deploy agent trust scoring to evaluate reliability of multi-agent transactions."

Supports: Verbatim call to deploy per-agent trust scoring as a proactive control in multi-agent systems. Closest upstream wording match for this control's core mechanism.

Does not prove: States the intent without prescribing scoring method, decay rate, or threshold values. Helmwart adds the three-factor score function and tiered acceptance thresholds.
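A minimal sketch of what a three-factor score with tiered acceptance thresholds could look like. The upstream source prescribes none of this: the factor names, the weights, the threshold values, and the `ACCEPT` and `REJECT` labels are hypothetical, chosen only for illustration. `ESCALATE_HITL` is the one decision name taken from this page.

```python
from dataclasses import dataclass

# Hypothetical weights and thresholds; no source on this page publishes values.
WEIGHTS = {"reliability": 0.5, "consistency": 0.3, "correctness": 0.2}
ACCEPT_AT = 0.75     # at or above: accept the peer's message
ESCALATE_AT = 0.40   # at or above (but below ACCEPT_AT): send to human review


@dataclass(frozen=True)
class Factors:
    reliability: float  # assumed factor: share of past interactions that validated
    consistency: float  # assumed factor: agreement of messages with prior context
    correctness: float  # assumed factor: share of outcomes labelled correct


def _clamp(x: float) -> float:
    """Keep a factor inside [0, 1] so the weighted sum stays in [0, 1]."""
    return min(1.0, max(0.0, x))


def trust_score(f: Factors) -> float:
    """Weighted sum of the three factors; the weights sum to 1."""
    return (WEIGHTS["reliability"] * _clamp(f.reliability)
            + WEIGHTS["consistency"] * _clamp(f.consistency)
            + WEIGHTS["correctness"] * _clamp(f.correctness))


def decide(score: float) -> str:
    """Map a score to one of three tiers."""
    if score >= ACCEPT_AT:
        return "ACCEPT"
    if score >= ESCALATE_AT:
        return "ESCALATE_HITL"
    return "REJECT"
```

Under these assumed values, `decide(trust_score(Factors(0.9, 0.5, 0.2)))` lands in the middle tier, because the weighted sum is about 0.64.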


Reference 2

v1.1 · published December 2025

OWASP Agentic AI — Threats & Mitigations v1.1

§Playbook 6 — Step 3: Enforce Multi-Agent Trust & Decision Security (Detective)

"Detect deviations from trust scores and agent reliability, including propagation events across MAS. Flag AI agents with sudden trust score drops due to repeated validation failures or unauthorized actions."

Supports: Names score-deviation detection and sudden-score-drop flagging as explicit detective controls — the direct upstream basis for the detection signals listed in this mitigation.

Does not prove: Describes what to detect, not how to compute the score or what action to take when a threshold is crossed. Helmwart specifies the score-to-routing decision mapping.
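One way the "sudden trust score drop" signal might be detected, sketched under stated assumptions: each agent's recent scores are kept in a short sliding window, and a new score is flagged when it falls more than a fixed margin below the window mean. The class name, window size, and margin are hypothetical and do not come from OWASP or from this mitigation.

```python
from collections import defaultdict, deque


class DropDetector:
    """Flags a score that falls sharply below an agent's recent average."""

    def __init__(self, window: int = 5, max_drop: float = 0.2):
        self.max_drop = max_drop  # hypothetical margin, not an upstream value
        # One bounded history per agent; old scores fall off automatically.
        self.history = defaultdict(lambda: deque(maxlen=window))

    def observe(self, agent_id: str, score: float) -> bool:
        """Record a score and return True if it counts as a sudden drop."""
        past = self.history[agent_id]
        # With no history there is no baseline, so nothing is flagged.
        flagged = bool(past) and (sum(past) / len(past) - score) > self.max_drop
        past.append(score)
        return flagged


detector = DropDetector()
flags = [detector.observe("agent-a", s) for s in (0.90, 0.90, 0.88, 0.50)]
```

Here only the last observation is flagged: the window mean before it is about 0.89, so the fall to 0.50 exceeds the 0.2 margin.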


Reference 3

v1.1 · published December 2025

OWASP Agentic AI — Threats & Mitigations v1.1

§T10 Overwhelming Human in the Loop — Mitigation (Step 5: Manage HITL Workload)

"Use AI trust scoring to prioritize HITL review queues based on risk level."

Supports: Establishes trust scoring as the input signal that should drive the human-review queue — matching the ESCALATE_HITL path in this control's acceptance-decision model.

Does not prove: The T10 framing addresses reviewer-overload reduction, not peer-reliability weighting per se. Adjacent rationale; the queue-priority use is one application of the score, not its definition.
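A sketch of score-driven review ordering, assuming that a lower trust score stands in for a higher risk level. That mapping is an assumption of this example; the OWASP text says only "based on risk level". `ReviewQueue` and the item identifiers are hypothetical.

```python
import heapq
import itertools


class ReviewQueue:
    """Human-review queue that serves the lowest-trust items first."""

    def __init__(self):
        self._heap = []
        # Monotonic counter breaks ties so equal scores keep arrival order.
        self._counter = itertools.count()

    def push(self, item_id: str, trust_score: float) -> None:
        heapq.heappush(self._heap, (trust_score, next(self._counter), item_id))

    def pop(self) -> str:
        """Return the item whose sender currently has the lowest score."""
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


queue = ReviewQueue()
queue.push("msg-1", 0.62)
queue.push("msg-2", 0.41)
queue.push("msg-3", 0.55)
order = [queue.pop() for _ in range(len(queue))]
```

The reviewer sees `msg-2` first, then `msg-3`, then `msg-1`.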


Reference 4

v1.1 · published December 2025

OWASP Agentic AI — Threats & Mitigations v1.1

§T12 Agent Communication Poisoning — Description

"Agent Communication Poisoning occurs when attackers manipulate inter-agent communication channels to inject false information, misdirect decision-making, and corrupt shared knowledge within multi-agent AI systems."

Supports: Establishes T12, which this control addresses alongside T13, as a primary threat scenario. The attack vectors named in the excerpt (false-information injection, misdirected decision-making, and corruption of shared knowledge) are the behavioural signals that drive score decay.

Does not prove: T12/T13 mitigations in v1.1 name cryptographic controls and consensus verification as the primary countermeasures; trust scoring is named in Playbook 6 rather than in the per-threat mitigation block.


Reference 5

v1.0 · published 2025

OWASP Multi-Agentic System Threat Modeling Guide v1.0

§Overview — Trust, Bias, and Adversarial Exploitation (Key Challenges in MAS)

"Trust mechanisms can be exploited by malicious agents impersonating trusted actors or introducing subtle biases over time."

Supports: Identifies trust-mechanism exploitation as a first-class attack vector in multi-agent systems and names gradual bias introduction — the slow-degradation pattern that historical-reliability weighting is designed to detect.

Does not prove: Names the problem; does not prescribe trust scoring as the solution. The guide's Layer 7 Agent Ecosystem section cross-references T12 and T13 without adding a specific scoring recommendation.


Reference 6

Proc. WWW 2003 · ACM 10.1145/775152.775242

Kamvar, Schlosser & Garcia-Molina — "The EigenTrust Algorithm for Reputation Management in P2P Networks" (WWW 2003)

Abstract

"We describe an algorithm to decrease the number of downloads of inauthentic files in a peer-to-peer file-sharing network that assigns each peer a unique global trust value, based on the peer's history of uploads."

Supports: Academic origin of distributed, history-weighted per-peer trust scoring. The design principle — global trust value derived from transitive aggregation of local satisfaction/dissatisfaction histories — is the direct conceptual ancestor of this control's three-factor score function.

Does not prove: Designed for P2P file-sharing networks, not LLM-based agents. Does not address message-content consistency, outcome-correctness labelling, or the agentic-system deployment context. Helmwart adapts the reputation-accumulation principle into an agent-observable context.
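For orientation, a compact sketch of the basic EigenTrust iteration as described by Kamvar et al.: local ratings are clipped at zero and row-normalised, then the global trust vector is iterated as t ← (1 − a)·Cᵀt + a·p, where p is the distribution over pre-trusted peers. This is the centralised, non-distributed form; the function name, parameter defaults, and sample ratings are illustrative, and it is not Helmwart's score function.

```python
def eigentrust(local, pretrusted, alpha=0.15, iters=50):
    """Basic (centralised) EigenTrust power iteration.

    local[i][j] is peer i's raw rating of peer j (may be negative).
    pretrusted is a non-empty set of peer indices trusted a priori.
    """
    n = len(local)
    # p: uniform distribution over the pre-trusted peers.
    p = [1.0 / len(pretrusted) if i in pretrusted else 0.0 for i in range(n)]

    # c[i][j]: normalised local trust; a peer with no positive ratings
    # defers to the pre-trusted distribution.
    c = []
    for i in range(n):
        row = [max(v, 0.0) for v in local[i]]
        total = sum(row)
        c.append([v / total for v in row] if total > 0 else p[:])

    t = p[:]  # start from the pre-trusted distribution
    for _ in range(iters):
        t = [(1 - alpha) * sum(c[i][j] * t[i] for i in range(n)) + alpha * p[j]
             for j in range(n)]
    return t


# Peers 0 and 1 rate each other well; nobody rates peer 2.
ratings = [
    [0, 5, 0],
    [5, 0, 0],
    [2, 2, 0],
]
trust = eigentrust(ratings, pretrusted={0})
```

Because no peer assigns peer 2 any positive rating and it is not pre-trusted, its global trust value stays at zero, while the values for all peers still sum to 1.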


Reference 7

Published July 2024

NIST AI 600-1 — Generative AI Profile (NIST AI RMF)

MEASURE 2.7 header. The related action MS-2.7-001 explicitly applies security evaluation to autonomous-agent risks.

"MEASURE 2.7: AI system security and resilience – as identified in the MAP function – are evaluated and documented."

Supports: Calls for ongoing security and resilience evaluation of deployed AI systems. The MDX's independentEvidence field cites MEASURE-2.7 as the NIST basis for continuous trust assessment; the related action MS-2.7-001 names autonomous agents as an explicit evaluation target.

Does not prove: MEASURE 2.7 covers broad security-and-resilience evaluation (provenance, breach detection, watermarking) and does not prescribe per-peer reputation scoring or define what a trust score is. It is the umbrella measure, not the scoring specification.
