MITIGATION · m-vector-acl
Permission-aware vector retrieval — ACLs at the retrieval boundary
A vector store returns results by embedding-space proximity, not by who is asking. Without a per-principal filter applied before similarity ranking, a query from tenant A can surface tenant B's vectors if the embeddings are close enough. Vector ACL closes that gap: every retrieval call is scoped to the requesting principal's namespace or payload partition before the store ranks any results, so cross-principal hits are structurally impossible rather than merely unlikely.
At a glance
TL;DR
- Embedding-space proximity is not a security boundary. A vector store without per-principal filtering will return tenant B's data to tenant A if the query embedding is close enough, with no error or signal.
- Apply the ACL inside the retrieval call itself: scope every query to the requesting principal's namespace or payload partition before similarity ranking runs, not as a post-filter on results.
- Tag every write with the producing principal at ingest time. A vector that enters the store without a principal tag cannot be correctly scoped at retrieval and there is no retroactive fix.
- This control addresses the retrieval boundary. Pair it with write-side controls (m-shared-memory-acl) to guard against same-principal poisoning, which retrieval-time ACL alone cannot prevent.
How it behaves
What it is
A vector store retrieves documents by ranking stored vectors by their similarity to a query embedding, either exactly or approximately through an index such as HNSW, then returning the closest matches. That ranking is a mathematical operation over the index; it has no awareness of who is asking or what they are permitted to read. In a multitenant RAG system, this property is a vulnerability: if tenant A's query embedding happens to be geometrically close to tenant B's stored vectors, the store will return tenant B's documents. Embedding-space proximity is not a security boundary.
The control is straightforward: scope every retrieval call to the requesting principal before the similarity ranking runs. In practice this means tagging every vector at ingest time with the producing principal's identifier, then constructing every query with a mandatory filter that restricts the search space to the requesting principal's namespace or payload partition. The store evaluates the filter first; only vectors that pass it are ranked by similarity. A vector in a different namespace is never ranked, never returned, and never visible to the querying principal, regardless of how similar its embedding is.
This must be applied at the retrieval call itself, not as a post-filter on results. A post-filter allows the store to retrieve cross-principal vectors and then discard them at the application layer, which still exposes those vectors to the calling code and to any logging that sits between the store and the filter. The filter belongs inside the query, before any results are produced.
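The difference between filtering before ranking and filtering after it can be illustrated with a minimal in-memory sketch. It is not any particular store's implementation: StoredVector, cosine, and scoped_search are hypothetical names, and a real store would apply the filter inside its own index rather than in a Python list comprehension.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredVector:
    doc_id: str
    principal_id: str  # tag written at ingest time (hypothetical field name)
    embedding: tuple


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def scoped_search(store, query, principal_id, top_k=3):
    """Restrict the search space to one principal, then rank what remains."""
    if not principal_id:
        raise ValueError("retrieval requires a verified principal_id")
    # The filter runs first: other principals' vectors are never scored.
    candidates = [v for v in store if v.principal_id == principal_id]
    ranked = sorted(candidates, key=lambda v: cosine(query, v.embedding), reverse=True)
    return ranked[:top_k]


store = [
    StoredVector("a-1", "tenant-a", (1.0, 0.0)),
    StoredVector("b-1", "tenant-b", (0.99, 0.01)),  # nearly identical to the query
    StoredVector("a-2", "tenant-a", (0.0, 1.0)),
]

results = scoped_search(store, (1.0, 0.0), "tenant-a")
print([v.doc_id for v in results])  # ['a-1', 'a-2']
```

Tenant B's vector is almost identical to the query, yet it never reaches the ranking step, so it cannot appear in results, logs, or intermediate application state.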
Three implementation layers work together:
- Tenant and namespace partitioning. Every write to the vector store is tagged with the producing principal; every read filters on the requesting principal. Production primitives: Pinecone namespaces, Weaviate multi-tenancy, Qdrant payload filter with group_id, pgvector row-level security.
- Metadata ACLs at query time. Layer a metadata filter on every query (tenant_id == request.tenant, owner == request.user_id). pgvector users get this via PostgreSQL row-level security policies that evaluate current_setting('app.tenant_id') before returning any row.
- Policy engine for cross-cutting rules. Where access depends on attributes beyond tenant identity (sensitivity classification, regulatory boundary, role), use OPA or a cloud policy engine to make the access decision before the retrieval call is issued.
Detection signals
- Cross-principal retrieval denials per query period. A sustained or rising count indicates a misconfigured filter, a missing principal tag on ingested vectors, or an active attempt to probe partition boundaries.
- Retrieval success rate per tenant. A sudden drop for a specific tenant points to a namespace misconfiguration that is silently excluding that tenant's own vectors.
Threats it covers
- T1 Shared Memory Poisoning names the scenario where a vector written into one tenant's partition is later retrieved by a different tenant through embedding-space proximity. Per-principal namespace partitioning makes that retrieval structurally impossible: a query scoped to the requesting principal's namespace cannot return vectors written to a different namespace, regardless of similarity score.
- T18 RAG Input Manipulation describes an attacker injecting adversarial vectors into a shared retrieval corpus to influence a target agent's context. Scoping every query to the requesting principal's namespace means the attacker's injected vectors must reside in that same namespace to reach the agent, which requires write access to the target's namespace rather than to any shared partition.
- T27 involves malicious embeddings written to one namespace being retrieved by agents operating in a different namespace. Retrieval-boundary namespace isolation prevents cross-namespace reads by construction: an agent querying its own namespace cannot receive vectors written to another namespace, regardless of embedding proximity.
- T28 RAG Data Exfiltration relies on a principal retrieving vectors from a partition they should not access. Per-principal ACL at the retrieval boundary prevents this structurally: a query is scoped to the requesting principal's namespace before similarity ranking, so vectors from other namespaces are never ranked or returned, even when the query embedding is highly similar to their content.
- T49 involves a semantically drifted or poisoned corpus spreading its influence across retrieval results for multiple principals. Namespace isolation contains that drift within the affected namespace: a drifted corpus in one namespace cannot propagate to other principals' retrieval results through embedding-space proximity.
Principle coverage
Defence-in-Depth stage: Prevent. It advances:
- Microsegmentation. Microsegmentation partitions a shared resource so that components in one segment cannot reach those in another. Vector ACL applies that principle to the retrieval layer: each principal's vectors are held in a separate namespace or shard, and the retrieval boundary enforces the segment wall so that a query from one principal cannot traverse into another's partition.
- Memory & RAG Integrity. Memory integrity requires that what an agent retrieves is what the correct principal stored. Vector ACL enforces that at the retrieval boundary: by scoping every query to the requesting principal's namespace before similarity ranking, the store cannot return vectors written by a different principal, regardless of embedding proximity.
- Data Minimization & Privacy. Data minimization limits what an agent retrieves to what its task legitimately requires. Vector ACL applies that principle at the retrieval boundary: each query is scoped to the requesting principal's namespace, so the agent never receives vectors belonging to other principals that it has no legitimate reason to access.
- Least Common Mechanism. Least Common Mechanism limits how much shared infrastructure is available across principals. Vector ACL reduces sharing at the retrieval layer by partitioning the vector index into per-principal namespaces or shards, so each principal's retrieval path operates on a private segment rather than a common pool of ranked results.
Design & governance principles (open design, economy of mechanism, accountability, …) are architectural, not advanced by a single placed control.
Implementation options
Choose the primitive that matches your vector store. All four options below are verified against current upstream documentation.
Pinecone namespaces
Pinecone namespaces partition an index into isolated segments; queries scoped to a namespace cannot return vectors from other namespaces. Layer a per-query metadata filter ($eq on principal_id) to scope individual principals within a namespace.
Why choose it: Namespace isolation is enforced by the Pinecone API: a query issued to namespace A cannot return vectors from namespace B. Metadata filtering adds fine-grained ACL within a namespace. Pinecone has no native RBAC; the filter must be constructed from the verified principal identity in application code before the query is dispatched.
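One way to keep the namespace and the metadata filter from being forgotten is to build the query arguments in a single helper. The sketch below assumes one namespace per tenant and a principal_id metadata field written at ingest; build_pinecone_query is a hypothetical helper name, while vector, top_k, namespace, filter, and include_metadata are keyword arguments of the Pinecone Python client's index.query.

```python
def build_pinecone_query(embedding, tenant_id, principal_id, top_k=5):
    """Return keyword arguments for index.query, always scoped to one principal.

    Assumptions (not from Pinecone itself): the namespace is named after the
    tenant, and every vector carries a `principal_id` metadata field.
    """
    if not tenant_id or not principal_id:
        raise ValueError("tenant_id and principal_id must come from a verified identity")
    return {
        "vector": list(embedding),
        "top_k": top_k,
        "namespace": tenant_id,  # hard partition enforced by the API
        "filter": {"principal_id": {"$eq": principal_id}},  # ACL within the namespace
        "include_metadata": True,
    }


query_kwargs = build_pinecone_query([0.1, 0.2, 0.3], "tenant-a", "user-42")
print(query_kwargs["namespace"], query_kwargs["filter"])
```

With a real client the call would be index.query(**query_kwargs). Routing every read through one helper means the identity check and the filter shape live in one place instead of at every call site.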
Weaviate multi-tenancy
Weaviate multi-tenancy stores each tenant on a separate shard. Data in one tenant shard is not accessible from another. All CRUD and search operations require an explicit .with_tenant("tenantA") call; there is no path to issue a cross-tenant query through the standard API.
Why choose it: Shard separation is architectural, not application-level: the retrieval path is scoped before any query planner runs. This provides stronger structural isolation than a payload filter in a shared collection. The cost is per-tenant shard overhead, which is material at thousands of tenants.
Qdrant payload filter
Qdrant recommends payload-based partitioning within a single collection. Ingest tags each point with a group_id payload field; every query wraps a must: [{key: "group_id", match: {value: tenantId}}] filter clause. Qdrant v1.16.0+ supports tiered multitenancy with dedicated shards for high-volume tenants.
Why choose it: Application-level isolation only: the filter must be constructed and passed by the calling code. Qdrant has no native ACL or RBAC; the isolation guarantee is only as strong as the application's discipline in always constructing the correct filter from a verified principal identity. For large tenants, set payload_m: 16 and m: 0 in HNSW config to build per-group indexes.
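Because the guarantee rests on the calling code, it may help to make the tenant clause impossible to omit. The sketch below builds the filter as a plain dictionary in the JSON shape Qdrant's filter API accepts; tenant_scoped_filter is a hypothetical helper, and any caller-supplied filter is nested as a second must condition so that it can only narrow the tenant's results, never widen them.

```python
def tenant_scoped_filter(tenant_id, extra_filter=None):
    """Build a Qdrant filter whose first `must` clause pins group_id to the tenant.

    `extra_filter` is an optional caller-supplied filter (same JSON shape).
    It is nested as an additional `must` condition, so it is ANDed with the
    tenant clause and cannot replace it.
    """
    if not tenant_id:
        raise ValueError("tenant_id must come from a verified principal identity")
    must = [{"key": "group_id", "match": {"value": tenant_id}}]
    if extra_filter:
        must.append(extra_filter)
    return {"must": must}


plain = tenant_scoped_filter("tenant-a")
narrowed = tenant_scoped_filter(
    "tenant-a",
    extra_filter={"should": [{"key": "lang", "match": {"value": "en"}}]},
)
print(plain)
print(narrowed)
```

The returned dictionary would be sent as the filter field of a search or query request. Keeping the tenant clause at the top level also makes it easy to assert on in tests and request logs.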
pgvector + PostgreSQL RLS
For teams using pgvector as their embedding store, PostgreSQL row-level security enforces tenant isolation at the database engine level. Enable RLS on the embeddings table, create a policy with USING (tenant_id = current_setting('app.tenant_id')::uuid), and set app.tenant_id with SET LOCAL at the start of each agent transaction. SET does not accept bind parameters, so when the tenant ID is passed as a query parameter use SELECT set_config('app.tenant_id', $1, true), which has the same transaction-local effect.
Why choose it: RLS is enforced by the database engine, not application code, making it the most robust form of retrieval-time isolation for SQL-backed vector stores. Key requirement: use ALTER TABLE embeddings FORCE ROW LEVEL SECURITY so the policy also applies to the table owner, who bypasses RLS by default. Superusers and roles with the BYPASSRLS attribute still bypass it, so the agent must connect as a role that has neither.
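A rough sketch of how this could be wired from Python follows. It assumes a psycopg-style DB-API connection and an embeddings table with id, content, tenant_id, and embedding columns; the policy name tenant_isolation, the column names other than tenant_id, and the search_as_tenant helper are all hypothetical. It uses set_config with is_local = true as a parameterizable stand-in for SET LOCAL.

```python
import uuid

# One-time setup, run by a migration rather than by the agent.
RLS_SETUP_SQL = """
ALTER TABLE embeddings ENABLE ROW LEVEL SECURITY;
ALTER TABLE embeddings FORCE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON embeddings
    USING (tenant_id = current_setting('app.tenant_id')::uuid);
"""


def search_as_tenant(conn, tenant_id, query_embedding, top_k=5):
    """Run a similarity search inside a transaction scoped to one tenant."""
    tenant = str(uuid.UUID(str(tenant_id)))  # raises ValueError if malformed
    vector_literal = "[" + ",".join(str(float(x)) for x in query_embedding) + "]"
    with conn.cursor() as cur:
        # Transaction-local setting: same effect as SET LOCAL, but accepts a parameter.
        cur.execute("SELECT set_config('app.tenant_id', %s, true)", (tenant,))
        # No tenant predicate here: the RLS policy restricts the visible rows.
        cur.execute(
            "SELECT id, content FROM embeddings "
            "ORDER BY embedding <=> %s::vector LIMIT %s",
            (vector_literal, top_k),
        )
        rows = cur.fetchall()
    conn.commit()  # ends the transaction, discarding the tenant setting
    return rows
```

The search statement carries no tenant predicate of its own, which is the point of the design: a code path that forgets the WHERE clause still cannot read another tenant's rows, provided the tenant setting was applied in the same transaction.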
Trade-offs
- Latency impact is low: namespace and shard scoping is sub-millisecond; a payload filter on an indexed field adds 1-5 ms. RLS policy evaluation is negligible at typical agent query rates.
- Dev effort is medium. The primitives are available in every major store, but correctly tagging every write path with the right principal is where teams fail. A system with multiple ingestion pipelines (API, bulk import, streaming events, agent self-write, scheduled sync) needs each path audited independently; one untagged path produces silently unprotected vectors.
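One possible way to reduce the per-pipeline audit burden is to route every ingestion path through a single write function that refuses untagged vectors. The sketch below is an in-memory stand-in, not a real store client: TaggedVectorStore, its upsert signature, and the source labels are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TaggedVectorStore:
    """In-memory stand-in for a store wrapper that every write path must use."""
    records: list = field(default_factory=list)
    rejected: list = field(default_factory=list)

    def upsert(self, doc_id, embedding, principal_id, source):
        # Reject the write instead of storing a vector that cannot be scoped later.
        if not isinstance(principal_id, str) or not principal_id.strip():
            self.rejected.append((source, doc_id))
            raise ValueError(f"write from {source!r} rejected: missing principal tag")
        self.records.append({
            "doc_id": doc_id,
            "embedding": list(embedding),
            "principal_id": principal_id,
            "source": source,  # which pipeline produced the write, for audits
        })


vector_store = TaggedVectorStore()
vector_store.upsert("doc-1", [0.1, 0.2], "tenant-a", source="api")
try:
    vector_store.upsert("doc-2", [0.3, 0.4], None, source="bulk-import")
except ValueError as err:
    print(err)  # write from 'bulk-import' rejected: missing principal tag
```

Recording the source alongside each rejection gives a direct list of pipelines that still need fixing, rather than leaving untagged vectors to be discovered at retrieval time.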
When NOT to use
- Single-tenant stores. If the vector store holds one principal's data exclusively, there is no cross-principal exposure risk; protect the store with network-level access control and API authentication instead.
- When sensitivity varies within a single tenant's partition (for example, confidential records alongside public content in the same namespace). Per-tenant namespacing alone is insufficient here; you need attribute-based retrieval filtering on a sensitivity classification field.
Limitations
- ACL-at-retrieval defends against cross-principal exposure. It does not defend against same-principal poisoning: a malicious vector written into the correct namespace by the correct principal is fully retrievable. Pair with write-side validation (m-shared-memory-acl).
- Misconfiguration is silent. When the filter is absent or incorrect, the agent receives results without any error signal. The only reliable detection mechanism is an offline retrieval audit that checks whether any successful retrieval crossed a principal boundary.
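An offline retrieval audit of the kind described above could look roughly like the following. The log format (query_id, principal_id, returned_ids) and the owner_of lookup are assumptions made for illustration; a real audit would read them from the retrieval logs and from the store's ingest metadata.

```python
def audit_retrievals(log_entries, owner_of):
    """Return (query_id, doc_id, owner) for every result that crossed a boundary.

    `owner_of` maps doc_id to the principal tagged at ingest. A missing entry
    (owner is None) is also reported, since an untagged vector cannot be scoped.
    """
    violations = []
    for entry in log_entries:
        for doc_id in entry["returned_ids"]:
            owner = owner_of.get(doc_id)
            if owner != entry["principal_id"]:
                violations.append((entry["query_id"], doc_id, owner))
    return violations


owner_of = {"a-1": "tenant-a", "a-2": "tenant-a", "b-1": "tenant-b"}
log_entries = [
    {"query_id": "q1", "principal_id": "tenant-a", "returned_ids": ["a-1", "a-2"]},
    {"query_id": "q2", "principal_id": "tenant-a", "returned_ids": ["a-1", "b-1"]},
    {"query_id": "q3", "principal_id": "tenant-b", "returned_ids": ["x-9"]},
]

violations = audit_retrievals(log_entries, owner_of)
print(violations)  # [('q2', 'b-1', 'tenant-b'), ('q3', 'x-9', None)]
```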
Maturity tier reasoning
- Tier 2 (real-composable). Every major vector store ships the required primitives; PostgreSQL RLS has shipped since version 9.5 (2016). OWASP LLM08:2025 and NIST AI 600-1 MS-2.6 name this control class explicitly.
- The gap keeping this from Tier 1 is operational: no industry-standard tooling audits whether every ingestion path tagged vectors correctly. Expect promotion as vendors ship default-on multi-tenancy with mandatory principal tagging at ingest.
Last verified against upstream docs: 2026-05-30.