T47: Rogue MCP Server in Ecosystem

Definition

An attacker deploys a malicious MCP server that masquerades as a legitimate server in the MCP ecosystem, providing seemingly valid but actually harmful services or data. Agents that connect to the rogue server are compromised: they receive manipulated data, have their credentials stolen, or are directed to take harmful actions. The attack targets the trust model of MCP: agents trust registered servers to behave as advertised.

What it looks like in practice

An attacker registers an MCP server in a public or community registry, naming it to closely resemble a well-known financial data provider. The server implements the MCP protocol correctly and returns plausible-looking financial data for most queries, passing any superficial spot-check. However, for specific high-value queries, it returns manipulated data designed to steer connected agents toward attacker-controlled investment positions. Agents that discover the server through the registry connect to it autonomously and incorporate its responses into their reasoning chains without independently verifying the data’s provenance.

In a credential-theft variant, the rogue server's handshake flow includes a custom authentication step that asks the connecting MCP client for its credentials, claiming to verify client identity. The attacker then uses the harvested credentials for T40 (MCP Client Impersonation) against legitimate servers.

Why it’s dangerous in multi-agent context

Agents connect to MCP servers autonomously and act immediately on the data and tool responses they receive. When a rogue server is registered in the ecosystem, every agent that connects to it inherits a compromised data source, with no human in the connection loop. Because agents chain decisions across multiple tool calls, a single rogue server can corrupt an entire reasoning sequence before any individual step triggers an alert. The cross-layer scenario “Rogue Server + Insufficient Logging” (T47 + T44) in the MAS Guide demonstrates how the absence of client-side logging makes this attack structurally invisible: the agent connects, receives manipulated data, and acts on it with no record of which server was connected to or what data was returned.

Detection signals

A rogue server typically masquerades well enough to pass a cursory check, so detection relies on provenance signals (certificate identity, registry metadata, and behavioural drift) rather than inspecting response content alone.

The TLS certificate presented by the MCP server at connection time not matching the expected certificate fingerprint pinned for that server’s registered identity. Reject the connection and fire an alert before any data is exchanged.
A newly registered server in the community registry whose domain was registered within the past 30 days or whose publisher identity has no prior history in the registry. Surface domain-age and publisher-history metadata alongside registry search results.
An agent connecting to a server whose hostname or URL was not present in the agent’s previously established server allowlist. Any first-seen server connection should require explicit human approval before the agent proceeds.
A response for a resource the agent has previously retrieved from a trusted server, where the semantic similarity between the new response and the trusted server's response falls below a defined threshold. Cross-validate high-stakes responses against a known-good reference source.
The MCP client log showing a server requesting client credentials during a handshake step that the MCP specification does not define as an authentication phase. Flag any out-of-protocol credential request as a harvesting attempt.
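The first and third signals can be enforced as one gate that runs before the agent exchanges any MCP messages. The following is a minimal sketch, not a reference implementation. It assumes the client keeps a hypothetical pin store that maps each approved server URL to the SHA-256 fingerprint of that server's DER-encoded leaf certificate. The names `gate_connection`, `cert_fingerprint` and `fetch_der_cert` are invented for this example. Only `ssl.get_server_certificate`, `ssl.PEM_cert_to_DER_cert`, `hashlib` and `hmac.compare_digest` are standard-library calls.

```python
import hashlib
import hmac
import ssl


def fetch_der_cert(host: str, port: int = 443) -> bytes:
    """Fetch the leaf certificate a server presents (no chain validation here)."""
    pem = ssl.get_server_certificate((host, port))
    return ssl.PEM_cert_to_DER_cert(pem)


def cert_fingerprint(der_cert: bytes) -> str:
    """Lowercase hex SHA-256 over the DER-encoded certificate."""
    return hashlib.sha256(der_cert).hexdigest()


def gate_connection(url: str, der_cert: bytes, pins: dict[str, str]) -> str:
    """Decide whether the agent may proceed, before any data is exchanged.

    `pins` is a hypothetical store: server URL -> pinned certificate fingerprint.
    """
    pinned = pins.get(url)
    if pinned is None:
        # First-seen server: not on the established allowlist, so a human decides.
        return "needs_human_approval"
    if not hmac.compare_digest(cert_fingerprint(der_cert), pinned):
        # Presented certificate does not match the pinned identity.
        return "reject_and_alert"
    return "allow"
```

Pinning the leaf certificate means that a legitimate certificate rotation also trips the alert, so the pin store needs an out-of-band update path. Pinning the public key instead avoids this problem, but the certificate must then be parsed with a third-party library.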

Mitigations

Maintain a curated server registry with cryptographic provenance verification; require publisher identity attestation before a server can be listed.
Enforce server identity pinning in the agent’s connection flow: verify the server’s certificate against a known-good identity before sending any data or credentials.
Log which MCP servers the agent connects to, and which resources and tools it invokes from each, in a tamper-evident client-side audit trail.
Implement anomaly detection on server response content: flag responses that deviate significantly from a server’s historical baseline for human review before the agent acts on them.
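The client-side audit trail can be sketched as a hash-chained log. Each entry records the server, the MCP method, the tool or resource name, a digest of the returned payload and the hash of the previous entry, so editing or deleting an earlier entry causes verification to fail. This is a minimal sketch under stated assumptions: the `AuditLog` class and its field names are hypothetical. A hash chain is only tamper-evident if its latest hash (`head()`) is periodically copied to a store that the agent host cannot rewrite. Without that, an attacker with write access can rebuild the whole chain.

```python
import hashlib
import json
import time
from typing import Optional

GENESIS = "0" * 64  # prev_hash of the first entry


class AuditLog:
    """Append-only, hash-chained record of MCP server connections and calls."""

    def __init__(self) -> None:
        self.entries: list[dict] = []

    @staticmethod
    def _hash(record: dict) -> str:
        body = json.dumps(record, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(body.encode()).hexdigest()

    def append(self, server: str, kind: str, name: str, payload: bytes,
               ts: Optional[float] = None) -> dict:
        record = {
            "ts": time.time() if ts is None else ts,
            "server": server,
            "kind": kind,  # e.g. "connect", "tools/call", "resources/read"
            "name": name,  # tool or resource identifier
            "payload_sha256": hashlib.sha256(payload).hexdigest(),
            "prev_hash": self.head(),
        }
        record["entry_hash"] = self._hash(record)  # hash covers every other field
        self.entries.append(record)
        return record

    def verify(self) -> bool:
        """Recompute the chain; any edited, reordered or deleted entry fails."""
        prev = GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "entry_hash"}
            if body["prev_hash"] != prev or self._hash(body) != entry["entry_hash"]:
                return False
            prev = entry["entry_hash"]
        return True

    def head(self) -> str:
        """Latest hash; export this regularly to an external, append-only store."""
        return self.entries[-1]["entry_hash"] if self.entries else GENESIS
```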

Relation to base threat (T1–T17)

T47 extends T17 Supply Chain Compromise. T17 addresses the broad class of supply-chain attacks on agent components (frameworks, plugins, libraries). T47 is the ecosystem-level variant aimed at MCP servers: the MCP server registry is the supply chain, and a malicious server that passes registration is the compromised component. T44 (Insufficient Logging in MCP Server / Client) is the structural enabler: without client-side logging of server connections, rogue server usage is untraceable after the fact.

OWASP Top 10 for Agentic Applications 2026

The Agentic Top 10 (ASI01 through ASI10) is a separate practitioner-facing publication that maps onto the master Threats & Mitigations threat numbering. T47 is covered by the following Top 10 entries:

ASI04 Agentic Supply Chain Vulnerabilities primary

Third-party components that agents depend on (models, MCP servers, plug-ins, datasets, peer-agent descriptors, and update channels) may be malicious, compromised after approval, or tampered with in transit. Unlike traditional software supply-chain risk, this is a live exposure: in every new session, the agent fetches and trusts components whose state may have changed since they were last reviewed.

OWASP LLM Top 10: LLM03:2025

ASI10 Rogue Agents primary

A rogue agent is one whose behavioural objective has drifted from its authorised purpose, yet its identity still checks out, its actions remain inside its permissions, and its logs look clean. Divergence may originate from prompt injection, supply-chain tampering, or goal hijack; ASI10 names what happens after divergence begins: sustained, covert operation toward an attacker's goal with no single action that trips an alarm.

OWASP LLM Top 10: LLM02:2025, LLM09:2025

Source: OWASP Top 10 for Agentic Applications 2026 (Dec 2025) · the Top 10 is a compass into the master Threats & Mitigations taxonomy, not a replacement for it.

Design principles at stake

When T47 is present, these security design principles are the ones being violated or tested. The mitigations below are how you restore them.

Defence-in-Depth An agent connects to MCP servers autonomously and acts immediately on what they return: there is no human in the connection loop to notice that the server it reached is not the one it intended. Depth means that the trust established at registration is not the only check. Server identity pinning verifies the server's certificate against a known-good identity before any data or credentials are sent. Anomaly detection on response content flags deviations from a server's historical baseline for human review before the agent acts. Tamper-evident client-side logging records every server connection, so T44's logging gap cannot make a rogue server's operation permanently untraceable. Because the layers are independent, the rogue server's manipulation reaches the agent's action chain only if every one of them fails.
Default / Implicit Deny The attack succeeds because agents auto-discover and trust registered servers, connecting to a name that resembles a legitimate provider without independently verifying its identity or cryptographic provenance. Applied to the MCP connection flow, default-deny means an agent may connect only to servers on a curated signed manifest whose hash is re-verified at the start of every session. Any server not on that list is denied, however legitimate its registry entry looks. Egress allow-lists enforce the same boundary at the network layer, so a rogue server that somehow passes registry checks still cannot receive data unless its endpoint is explicitly permitted.
Supply-chain Security The MCP server registry is the supply chain, and a malicious server that passes registration is the compromised component: the attack exploits exactly the trust that the ecosystem extends to listed servers. Supply-chain security requires that registration not be a one-time event. It combines cryptographic provenance verification and publisher identity attestation before listing, version pinning by content hash, and runtime re-verification in each session. Together, these mean that a server which turns malicious after adoption is detected when its signature no longer matches the approved record. In the credential-theft variant, the rogue server's handshake requests client credentials under the pretence of verification. Server identity pinning before any credentials are sent defeats this, so the supply-chain integrity check doubles as the credential-theft countermeasure.
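The default-deny rule in the second principle reduces to two checks at session start: verify the manifest, then require an exact match against it. The following sketch assumes a hypothetical manifest format, a JSON object with a `servers` list. It uses HMAC-SHA256 from the Python standard library as a stand-in for the publisher signature. A real deployment would more likely use an asymmetric signature, so that agents hold only a verification key. All function names are illustrative.

```python
import hashlib
import hmac
import json


def _canonical(manifest: dict) -> bytes:
    """Deterministic serialisation so the same manifest always hashes the same."""
    return json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode()


def sign_manifest(manifest: dict, key: bytes) -> str:
    # Stand-in for a publisher signature: HMAC-SHA256 with a shared key.
    return hmac.new(key, _canonical(manifest), hashlib.sha256).hexdigest()


def load_allowlist(manifest: dict, signature: str, key: bytes) -> frozenset:
    """Verify the manifest at session start; any failure denies every server."""
    if not hmac.compare_digest(sign_manifest(manifest, key), signature):
        raise PermissionError("manifest signature invalid: denying all MCP servers")
    return frozenset(manifest.get("servers", []))


def may_connect(server_url: str, allowlist: frozenset) -> bool:
    # Default deny: only exact matches on the signed list are permitted,
    # so a look-alike name or URL is rejected however plausible it is.
    return server_url in allowlist
```

Exact string matching is deliberate. A look-alike server such as the one in the financial-data example never matches, regardless of how convincing its registry entry is.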

Recommended mitigations

Auto-generated from the mitigation catalog: every mitigation whose coverage map includes T47, sorted by maturity tier (Tier 1 production-canonical first, then Tier 2, then Tier 3 research-stage).

Tier 2 MCP server attestation: cryptographic proof of server identity and binary integrity

An MCP client connecting to a server has no built-in way to verify that the server at a given address is the expected workload or that its binary has not been replaced. An attacker who can intercept or substitute the server exploits that gap directly. MCP server attestation closes it by requiring the server to present cryptographic proof of two properties before the connection proceeds: that it holds a valid workload identity bound to a trusted certificate, and that its binary matches a signed hash recorded at build time.

why it helps Rogue MCP server substitution is the threat in which an attacker replaces a legitimate server in the MCP ecosystem, either at the network layer or in a registry, so that clients resolve to the attacker-controlled binary. A client that verifies the server's SVID (SPIFFE Verifiable Identity Document, the workload identity credential) and binary hash before connecting will reject the substituted server. The substitute cannot produce attestation material issued for the original binary.
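
A client-side check of the two attestation properties could look like the following sketch. It assumes that the TLS layer has already validated the server's X.509-SVID against the SPIFFE trust bundle. It also assumes that the measured binary hash reaches the client through some attestation channel. Both of those steps are out of scope here. The `AttestationEvidence` type, `check_attestation`, and the example trust domain and workload path are hypothetical.

```python
import hmac
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class AttestationEvidence:
    spiffe_id: str      # URI SAN taken from the server's (already validated) X.509-SVID
    binary_sha256: str  # measured hash of the running server binary (assumed channel)


def check_attestation(evidence: AttestationEvidence, trust_domain: str,
                      workload_path: str, signed_build_hashes: frozenset) -> bool:
    """Both properties must hold before the MCP connection proceeds.

    signed_build_hashes is assumed to come from a verified, signed build record.
    """
    uri = urlparse(evidence.spiffe_id)  # spiffe://<trust-domain>/<workload-path>
    identity_ok = (uri.scheme == "spiffe"
                   and uri.netloc == trust_domain
                   and uri.path == workload_path)
    binary_ok = any(hmac.compare_digest(evidence.binary_sha256, h)
                    for h in signed_build_hashes)
    return identity_ok and binary_ok
```
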
Tier 2 Trust score: per-agent trust scoring, a behavioural reputation for inter-agent message acceptance

In a multi-agent system, each agent routes decisions based on what its peers report. If a peer's behaviour becomes unreliable or adversarial, agents that keep treating it with full authority will propagate whatever errors or manipulations that peer introduces. Per-agent trust scoring addresses this by maintaining a continuously updated reputation score for every peer, derived from observed behaviour, and using that score to determine how much authority each incoming message carries.

why it helps A rogue MCP server injecting manipulated tool responses produces output that diverges from trusted peers or known-good baselines. That consistency gap drives the server's trust score down, and score decay is the automated response that reduces its influence until attestation is re-verified.
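The following is a minimal sketch of score decay combined with a re-attestation gate. It assumes an exponentially weighted moving average as the scoring rule. The class name, the smoothing factor and the 0.7/0.3 thresholds are illustrative assumptions, not values from the mitigation catalogue. When the score falls below the review threshold, the peer is quarantined. Consistent behaviour alone cannot lift the quarantine; only `reattest()`, called after attestation is re-verified, can.

```python
class TrustScore:
    """Exponentially weighted reputation for one peer agent or MCP server."""

    FULL = 0.7    # at or above: messages carry full authority (assumed threshold)
    REVIEW = 0.3  # below: quarantine until re-attestation (assumed threshold)

    def __init__(self, initial: float = 0.5, alpha: float = 0.2) -> None:
        self.initial = initial
        self.alpha = alpha  # weight given to the newest observation
        self.score = initial
        self.quarantined = False

    def observe(self, consistent: bool) -> None:
        """Fold in one cross-check against trusted peers or a known-good baseline."""
        if self.quarantined:
            return  # no recovery through behaviour alone
        target = 1.0 if consistent else 0.0
        self.score = (1 - self.alpha) * self.score + self.alpha * target
        if self.score < self.REVIEW:
            self.quarantined = True

    def reattest(self) -> None:
        """Call only after the peer's attestation has been re-verified out of band."""
        self.score = self.initial
        self.quarantined = False

    def authority(self) -> str:
        if self.quarantined:
            return "quarantine"
        return "full" if self.score >= self.FULL else "review"
```

Locking the score on quarantine matters. Otherwise a rogue server could rebuild its reputation with a run of honest answers before sending its next manipulated one.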

Multi-agent variants: OWASP MAS Guide

The OWASP MAS Threat Modelling Guide v1.0 catalogues one named multi-agent variant of T47, anchored to specific MAESTRO layers. It is a concrete attack pattern that emerges when this threat compounds across agents.

CL Multi-Agent Trust Collapse via Rogue MCP + A2A extends T47, T30, T13

A rogue MCP server (T47) issues forged tool responses; downstream agents accept them on the basis of A2A delegated trust (T30); a rogue orchestrator agent (T13) amplifies the forged results across the MAS. All three conditions must co-occur for the cascade to trigger.

Source: OWASP MAS Threat Modelling Guide v1.0, §2 Overview of MAESTRO Framework — Extended Threat Scenarios + Cross-Layer table.

Red-team pivot: MITRE ATLAS techniques

MITRE ATLAS catalogues adversary techniques against AI systems. Where this OWASP threat has attacker-perspective counterparts, the corresponding ATLAS techniques are listed below. They describe what a red team would actually be doing on the wire. Use them for detection-signal anchoring, threat-hunting hypotheses, and IR runbooks. Source: mitre-atlas/atlas-data v5.6.0.

AML.T0074 Masquerading

Adversary disguises an artefact (file name, agent card, MCP server) so it appears legitimate to humans or agents that route trust by name.

AML.T0110 AI Agent Tool Poisoning

Adversary achieves persistence by compromising tools integrated into an agent's environment, altering parameters, descriptions, or logic to redirect agent behaviour.

Agentic angle: Poisoned MCP tools are invisible to the agent; every tool call silently executes attacker logic while appearing to return normal results.

AML.T0109 AI Supply Chain Rug Pull

Adversary publishes legitimate AI components to gain adoption, then replaces them with a malicious variant, exploiting the trust established before the switch.

Agentic angle: Trusted MCP servers or model registries used by agents are high-value rug-pull targets because agents fetch and execute without further human review.

AML.T0058 Publish Poisoned Models

Adversary publishes a model (to HuggingFace, an internal registry, or an MCP server) that contains a backdoor or biased behaviour activated at runtime.
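For the rug-pull pattern (AML.T0109), one client-side countermeasure is to hash the tool definitions that a server advertises in its `tools/list` response when the server is approved, and then re-check them in every session. The sketch assumes that the approved hashes live in a hypothetical `approved` map. It hashes only `name`, `description` and `inputSchema`, because those fields shape how the agent chooses and calls each tool. The check detects changed definitions, not changed server logic. It therefore complements attestation and response-baseline checks rather than replacing them.

```python
import hashlib
import json


def tool_definitions_hash(tools: list) -> str:
    """Order-independent hash of the behaviour-steering fields of tools/list."""
    canonical = sorted(
        (
            {
                "name": t["name"],
                "description": t.get("description", ""),
                "inputSchema": t.get("inputSchema", {}),
            }
            for t in tools
        ),
        key=lambda t: t["name"],
    )
    body = json.dumps(canonical, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(body.encode()).hexdigest()


def session_recheck(server_url: str, tools: list, approved: dict) -> str:
    """Compare this session's tool definitions with the snapshot approved at review."""
    expected = approved.get(server_url)
    if expected is None:
        return "unapproved"
    if tool_definitions_hash(tools) != expected:
        # Definitions changed since review: possible rug pull; hold for re-approval.
        return "changed"
    return "ok"
```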

References

OWASP MAS Threat Modelling Guide v1.0 (April 2025) §5 Anthropic MCP — Layer 7 Agent Ecosystem.

Sources

OWASP-MAS-Guide · 1.0 (Apr 2025) · §5 Anthropic MCP — Layer 7 Agent Ecosystem