Post-filtering vector search leaks data

You stand up multimodal search, it works, and you put real users on it. Now user A's query must never return user B's documents. The obvious move is to retrieve, then filter the results in your app before rendering. That move leaks data, breaks pagination, and gets worse as your index grows.

I wired OpenFGA up to Mixpeek to do the check inside retrieval instead. Same query, two users, two different correctly-scoped result sets, no filtering in the client. The whole thing is a forkable repo you can run end to end. Here's what I learned building it.

Why post-filtering is a data leak

Say a user asks for 20 results. You run the vector search, get 20 hits, drop the 7 the user isn't allowed to see, and return 13. You just leaked two things. First, the short page tells the user that hidden documents match their query. Second, any total count or score distribution you report was computed over the unfiltered set, so it describes documents the user can't see.

To get a full page of 20 you have to over-fetch, say the top 200, then filter down. Now pick the next page. Offset 20 into which set, the filtered one or the raw one? If a grant changes between requests, your cursor points at the wrong row. Pagination over a post-filtered result set is quietly broken, and the bug only shows up once someone has enough documents to paginate.
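Both failure modes are easy to reproduce with a toy ranked list standing in for a vector index. The following is a minimal sketch that assumes made-up doc_N ids and a made-up rule that every third document is private. A plain top-20 post-filter returns the short page of 13. Paging by raw offset after over-fetching hands back rows the user already saw on page 1.

```python
# Toy model of post-filtering: ranked hits from a vector index, then an
# app-side permission filter. Ids and the access rule are hypothetical.
ranked_hits = [f"doc_{i}" for i in range(200)]               # raw top-200, best first
allowed = {h for i, h in enumerate(ranked_hits) if i % 3}    # every 3rd doc is private


def post_filter(hits, allowed_ids):
    """Drop hits the user may not see, preserving rank order."""
    return [h for h in hits if h in allowed_ids]


# Failure 1: ask for 20, filter, and the page comes back short.
page_size = 20
short_page = post_filter(ranked_hits[:page_size], allowed)

# Failure 2: over-fetch to fill page 1, then page 2 uses a raw offset of 20.
page_1 = post_filter(ranked_hits, allowed)[:page_size]
page_2_naive = post_filter(ranked_hits[page_size : 2 * page_size], allowed)
repeated = [h for h in page_2_naive if h in page_1]

print(len(short_page))   # 13
print(len(repeated))     # 7
```

The seven repeats are the "offset into which set" problem. Page 1 consumed raw rows 0 through 29 to fill 20 slots, so a raw offset of 20 starts in the middle of what the user has already seen.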

And it's slow. The filter runs in your app, so every candidate crosses the wire before you throw most of it away. The fix is to move the authorization decision into retrieval, so the index never hands back rows the user can't see.
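For contrast, here is the same toy setup with the permission set applied inside a hypothetical search function, before ranking and pagination. This sketches the principle only. It does not show how Mixpeek or any index implements it; a real engine pushes the id constraint into the index instead of scanning a list. It does show why both pages come back full and disjoint: offsets now index one consistent, already-authorized list.

```python
# Toy contrast: authorization applied inside retrieval, before top-k and
# pagination. `search` and the scoring are hypothetical stand-ins for an index.
corpus = {f"doc_{i}": 1.0 - i / 1000 for i in range(200)}    # id -> similarity score
allowed = {d for i, d in enumerate(corpus) if i % 3}         # same made-up access rule


def search(scores, allowed_ids, offset, limit):
    """Rank only the documents the caller may see, then paginate."""
    visible = [d for d in scores if d in allowed_ids]          # filter first
    visible.sort(key=lambda d: scores[d], reverse=True)        # then rank
    return visible[offset : offset + limit]                    # then page


page_1 = search(corpus, allowed, offset=0, limit=20)
page_2 = search(corpus, allowed, offset=20, limit=20)
```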

OpenFGA as the source of truth

Most teams shipping enterprise search already model authorization somewhere: roles, groups, folder inheritance, sharing. Increasingly that lives in OpenFGA, the Zanzibar-style ReBAC engine that came out of Auth0/Okta and is now a CNCF project. The last thing you want is to re-encode all of that in your search layer and keep the two copies in sync.

So don't. Mixpeek's external authorization mode treats your OpenFGA as the single source of truth and acts as a relying party: at retrieval time it asks OpenFGA what the acting user can see and filters the candidates server-side, before anything leaves the box. Your permission model stays in one place. Search just reads it.

The contract is one convention: each Mixpeek document is an OpenFGA object whose id equals the Mixpeek document_id. By default, Mixpeek checks the viewer relation on the document object type.

model
  schema 1.1

type user

type group
  relations
    define member: [user]

type document
  relations
    # grant to a user, to user:* (public), or to a group's members
    define viewer: [user, user:*, group#member]

The whole authorization model. Groups give you inheritance for free.

A grant is then just a tuple: document:doc_abc123#viewer@user:alice. Use user:* for public access, or grant viewer to group:eng#member and every member of the group inherits access. Add or remove a member and access follows, with no per-document re-grant. OpenFGA resolves the whole graph; Mixpeek only relays the yes/no.
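To make the resolution rule concrete, here is a toy in-memory evaluator for just this model. It assumes tuples are plain (object, relation, user) strings. OpenFGA is not implemented this way. The sketch only shows why a group grant follows membership changes without anyone touching the document's own tuple.

```python
# Toy evaluator for the three-type model above. Tuples are
# (object, relation, user) strings; this is an illustration, not OpenFGA.
tuples = {
    ("group:eng", "member", "user:alice"),
    ("group:eng", "member", "user:bob"),
    ("document:doc_alice_1", "viewer", "user:alice"),
    ("document:doc_shared", "viewer", "group:eng#member"),
    ("document:doc_public", "viewer", "user:*"),
}


def can_view(user: str, document: str) -> bool:
    """Resolve document#viewer: direct user, public wildcard, or group member."""
    for obj, rel, subject in tuples:
        if obj != document or rel != "viewer":
            continue
        if subject in (user, "user:*"):
            return True                              # direct grant or public
        if subject.endswith("#member"):              # userset: group:<id>#member
            group = subject.rsplit("#", 1)[0]
            if (group, "member", user) in tuples:
                return True
    return False


print(can_view("user:bob", "document:doc_shared"))   # True: bob is in group:eng
tuples.discard(("group:eng", "member", "user:bob"))
print(can_view("user:bob", "document:doc_shared"))   # False: no document tuple changed
```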

The localhost trap

Here's the thing that will waste your afternoon if nobody tells you. Mixpeek's API is hosted at api.mixpeek.com. Your freshly-started OpenFGA is at http://localhost:8080. The hosted API cannot reach your laptop. Point the namespace's authorization config at a localhost URL and every query comes back empty, because the system is fail-closed: it couldn't reach OpenFGA, so it returns the safe (empty) set and says nothing is wrong.

The fix is a two-minute tunnel that gives OpenFGA a public https URL:

cloudflared tunnel --url http://localhost:8080
#  -> https://random-words.trycloudflare.com

Run OpenFGA with a preshared key (openfga run --authn-method=preshared --authn-preshared-keys=<key>) so the tunnel isn't an open door, and store that same key as a Mixpeek organization secret. You reference the secret by name in the config; the token is never stored in the namespace. This single gotcha is the difference between a repo that works when forked and one that frustrates everyone, so the example calls it out up front.
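The fail-closed behavior behind the localhost trap fits in a few lines. This is a minimal sketch, not Mixpeek's implementation. fetch_allowed_ids is a hypothetical stand-in for the OpenFGA call. An unreachable server produces an empty list and no error reaches the caller, which is exactly why a localhost URL looks like "no results".

```python
# Sketch of fail-closed filtering. `fetch_allowed_ids` is a hypothetical
# callable that asks the authorization service which ids a user may see.
from typing import Callable, Iterable


def authorized_results(
    candidates: Iterable[str],
    fetch_allowed_ids: Callable[[], set[str]],
) -> list[str]:
    try:
        allowed = fetch_allowed_ids()
    except OSError:
        # Unreachable authz service: return the safe set (nothing) instead of
        # unfiltered results. The caller sees an empty list, not an error.
        return []
    return [c for c in candidates if c in allowed]


def unreachable() -> set[str]:
    # Simulates a hosted API trying to reach http://localhost:8080.
    raise ConnectionRefusedError("cannot reach OpenFGA")


print(authorized_results(["doc_a", "doc_b"], unreachable))         # []
print(authorized_results(["doc_a", "doc_b"], lambda: {"doc_a"}))   # ['doc_a']
```

The sketch only treats network errors (OSError) as a reason to fail closed. A production version would likely fail closed on malformed responses too.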

Wiring it to Mixpeek

Two pieces. First, every end-user gets a user-scoped API key whose principal_id matches their OpenFGA subject. alice's key carries principal_id=alice, which maps to the subject user:alice. Her searches use her key, so Mixpeek knows who is asking.

Second, opt the namespace in by PATCHing its authorization block:

{
  "infrastructure": {
    "authorization": {
      "enabled": true,
      "provider": "openfga",
      "api_url": "https://<your-tunnel>.trycloudflare.com",
      "store_id": "01J0X...",
      "relation": "viewer",
      "object_type": "document",
      "mode": "pull_list_objects",
      "api_token_secret_ref": "openfga_token"
    }
  }
}

That's the whole integration surface. No query changes. Existing retrievers, saved or ad-hoc, start enforcing immediately.
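If you script the PATCH, a small guard can catch the localhost trap before it turns into silently empty results. This is a minimal sketch. authorization_patch and is_private_host are hypothetical helpers. The field names are copied from the JSON above. Requiring https is this sketch's own assumption. The endpoint you PATCH is not shown here.

```python
# Hypothetical helper: builds the authorization block shown above and rejects
# URLs a hosted API cannot reach. Requiring https is this sketch's assumption.
import ipaddress
import json
from urllib.parse import urlparse


def is_private_host(host: str) -> bool:
    """True for localhost names and for IP literals that are not globally routable."""
    if host == "localhost" or host.endswith(".localhost"):
        return True
    try:
        return not ipaddress.ip_address(host).is_global
    except ValueError:
        return False  # a DNS name such as *.trycloudflare.com; treated as public


def authorization_patch(api_url: str, store_id: str, secret_name: str) -> dict:
    parsed = urlparse(api_url)
    if parsed.scheme != "https" or is_private_host(parsed.hostname or ""):
        raise ValueError(f"{api_url} is not a public https URL; tunnel OpenFGA first")
    return {
        "infrastructure": {
            "authorization": {
                "enabled": True,
                "provider": "openfga",
                "api_url": api_url,
                "store_id": store_id,
                "relation": "viewer",
                "object_type": "document",
                "mode": "pull_list_objects",
                "api_token_secret_ref": secret_name,  # secret name, never the token
            }
        }
    }


body = authorization_patch(
    "https://random-words.trycloudflare.com", "YOUR_STORE_ID", "openfga_token"
)
print(json.dumps(body, indent=2))
```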

alice and bob run the same query

The example seeds six images: two private to alice, two private to bob, one shared with the eng group (alice and bob are both members), and one public. After ingestion it writes one viewer tuple per document, then asks OpenFGA directly what each user can see:

$ curl ... /stores/$S/list-objects -d '{"type":"document","relation":"viewer","user":"user:alice"}'
{"objects":["document:doc_alice_1","document:doc_alice_2","document:doc_shared","document:doc_public"]}

$ curl ... /stores/$S/list-objects -d '{"type":"document","relation":"viewer","user":"user:bob"}'
{"objects":["document:doc_public","document:doc_bob_1","document:doc_bob_2","document:doc_shared"]}

$ curl ... /stores/$S/check -d '{"tuple_key":{"user":"user:bob","relation":"viewer","object":"document:doc_alice_1"}}'
{"allowed":false,"resolution":""}

Real output from the example's OpenFGA store.
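The elided curl calls above correspond to OpenFGA's HTTP API: POST /stores/{store_id}/list-objects and POST /stores/{store_id}/check. Below is a minimal standard-library sketch that builds those requests and sends the preshared key as a Bearer token. The base URL, store id, and key are placeholders. The requests are constructed but not sent.

```python
# Build OpenFGA ListObjects and Check requests with the standard library.
# BASE_URL, STORE_ID and PRESHARED_KEY are placeholders, not real values.
import json
import urllib.request

BASE_URL = "https://random-words.trycloudflare.com"
STORE_ID = "YOUR_STORE_ID"
PRESHARED_KEY = "YOUR_PRESHARED_KEY"


def _post(path: str, body: dict) -> urllib.request.Request:
    return urllib.request.Request(
        f"{BASE_URL}/stores/{STORE_ID}/{path}",
        data=json.dumps(body).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {PRESHARED_KEY}",
        },
        method="POST",
    )


def list_objects_request(principal_id: str) -> urllib.request.Request:
    # A key with principal_id "alice" maps to the OpenFGA subject "user:alice".
    return _post("list-objects", {"type": "document", "relation": "viewer",
                                  "user": f"user:{principal_id}"})


def check_request(principal_id: str, document_id: str) -> urllib.request.Request:
    return _post("check", {"tuple_key": {"user": f"user:{principal_id}",
                                         "relation": "viewer",
                                         "object": f"document:{document_id}"}})


req = check_request("bob", "doc_alice_1")
# Sending it: json.load(urllib.request.urlopen(req))["allowed"]
```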

alice resolves to her two private docs plus the shared and public ones. bob gets his two plus shared and public. bob asking for alice's private document is a flat no. Now run the actual search with each user's key:

Query: "a photo"  (same for both users, no client-side filtering)

  alice  sees 4 docs: ['doc_alice_1', 'doc_alice_2', 'doc_shared', 'doc_public']
  bob    sees 4 docs: ['doc_bob_1', 'doc_bob_2', 'doc_shared', 'doc_public']

  shared/public (both see): 2 docs
  alice-only (private):     2 docs
  bob-only (private):       2 docs

PASS - server-side, fail-closed authorization is working.

Same retriever, same query, two scoped result sets.

The demo script does zero filtering: every drop happened inside Mixpeek, against the live OpenFGA decision. The client never sees a document it isn't allowed to see, so there's nothing to leak and nothing to paginate incorrectly.
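The summary lines in that output are plain set arithmetic over the two result lists. Here is a sketch of that check. It assumes the demo script compares ids roughly like this; the repo's actual code may differ.

```python
# Recompute the demo's breakdown from the two users' result ids.
alice = {"doc_alice_1", "doc_alice_2", "doc_shared", "doc_public"}
bob = {"doc_bob_1", "doc_bob_2", "doc_shared", "doc_public"}

both = alice & bob          # shared + public
alice_only = alice - bob    # alice's private docs
bob_only = bob - alice      # bob's private docs

# The property that matters: neither user received the other's private docs.
assert not (alice & {"doc_bob_1", "doc_bob_2"})
assert not (bob & {"doc_alice_1", "doc_alice_2"})
print(len(both), len(alice_only), len(bob_only))   # 2 2 2
```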

What changes at production scale

The demo uses pull_list_objects: ask OpenFGA for the user's accessible set, then pre-filter the search to those ids. That's exact and cheap when each user can see at most about a thousand documents. Past that, listing every accessible object gets expensive, so you switch strategies:

auto (default) uses ListObjects when the accessible set is small, otherwise falls back to post-filtering. Start here.
pull_batch_check runs the search first, then BatchChecks the candidates and drops the unauthorized ones. Scales to users who can see a lot.
push subscribes to your OpenFGA changelog and projects grants into an indexed field, so filtering happens in-index with the lowest latency. The tradeoff is eventual consistency: a grant or revocation takes effect only after the change has been projected.
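One way to read the auto rule is as a size check on the ListObjects result. The sketch below is an interpretation, not Mixpeek's code. choose_strategy is hypothetical. The 1000-object threshold is an assumption chosen to mirror the thousand-document rule of thumb above. The idea of treating a result at the cap as possibly truncated is also an assumption.

```python
# Hypothetical reading of the `auto` strategy choice. The threshold and the
# idea of treating a capped ListObjects result as "too large" are assumptions.
LIST_OBJECTS_CAP = 1000


def choose_strategy(accessible_ids: list[str], cap: int = LIST_OBJECTS_CAP) -> str:
    """Pre-filter on small, complete sets; otherwise post-filter with batch checks."""
    if len(accessible_ids) < cap:
        return "pull_list_objects"   # exact id list, push it into the search
    return "pull_batch_check"        # list may be truncated: search, then check


print(choose_strategy(["doc_public", "doc_shared"]))          # pull_list_objects
print(choose_strategy([f"doc_{i}" for i in range(1000)]))    # pull_batch_check
```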

Post-filtering means a page could come back short, so Mixpeek over-fetches by over_fetch_factor (2x by default) to keep pages full after drops. And it stays fail-closed throughout: if OpenFGA is unreachable, you get fewer results, never unauthorized ones.
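The following is a minimal sketch of an over-fetching post-filter with a fail-closed batch check. search and batch_check are hypothetical callables; the real BatchCheck is an OpenFGA API call. Only over_fetch_factor and its 2x default come from the text.

```python
# Post-filtering with over-fetch. `search(k)` returns the top-k ids and
# `batch_check(ids)` returns the subset the user may view; both hypothetical.
import math
from typing import Callable


def post_filtered_page(
    search: Callable[[int], list[str]],
    batch_check: Callable[[list[str]], set[str]],
    limit: int,
    over_fetch_factor: float = 2.0,
) -> list[str]:
    candidates = search(math.ceil(limit * over_fetch_factor))  # extra to absorb drops
    try:
        allowed = batch_check(candidates)
    except OSError:
        return []  # authorization unreachable: fail closed, never unfiltered
    return [c for c in candidates if c in allowed][:limit]


index = [f"doc_{i}" for i in range(100)]
allowed_ids = {d for i, d in enumerate(index) if i % 3}  # toy rule: 1 in 3 denied


def toy_search(k: int) -> list[str]:
    return index[:k]


def toy_check(ids: list[str]) -> set[str]:
    return {d for d in ids if d in allowed_ids}


print(len(post_filtered_page(toy_search, toy_check, limit=20)))                       # 20
print(len(post_filtered_page(toy_search, toy_check, limit=20, over_fetch_factor=1)))  # 13
```

A fixed factor only keeps pages full while enough candidates survive. With 2x, if more than half the candidates are denied, the page still comes back short.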

One real wrinkle worth flagging: a document's OpenFGA object id has to equal its Mixpeek document_id, which only exists after ingestion. So you can't pre-write those tuples. The example writes them once the ingestion batch completes, mapping each returned id to its grant. In a real system you'd write and revoke these as documents are created and shared, the same place you already update OpenFGA today.
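Below is a minimal sketch of that post-ingestion step. It builds request bodies for OpenFGA's POST /stores/{store_id}/write. The input shapes are hypothetical. One is a mapping from your own source files to the document ids ingestion returned. The other is a mapping from source files to subjects. The filenames are made up too, so check your ingestion response for the real fields. OpenFGA also caps the number of tuples per Write request (100 by default, configurable per server), so the sketch chunks.

```python
# Build OpenFGA Write bodies once ingestion has assigned document ids.
# `source_to_document_id` and `grants` are hypothetical inputs for illustration.
from typing import Iterator

MAX_TUPLES_PER_WRITE = 100  # OpenFGA's default limit; your server may differ


def write_bodies(
    source_to_document_id: dict[str, str],
    grants: dict[str, list[str]],
) -> Iterator[dict]:
    tuple_keys = [
        {"user": subject, "relation": "viewer", "object": f"document:{doc_id}"}
        for source, doc_id in source_to_document_id.items()
        for subject in grants.get(source, [])
    ]
    for start in range(0, len(tuple_keys), MAX_TUPLES_PER_WRITE):
        yield {"writes": {"tuple_keys": tuple_keys[start : start + MAX_TUPLES_PER_WRITE]}}


ingested = {"alice_1.jpg": "doc_alice_1", "shared.jpg": "doc_shared", "public.jpg": "doc_public"}
grants = {
    "alice_1.jpg": ["user:alice"],
    "shared.jpg": ["group:eng#member"],
    "public.jpg": ["user:*"],
}
bodies = list(write_bodies(ingested, grants))
```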

The general pattern

Permission-aware retrieval is the thing blocking a lot of AI search from shipping to enterprise, and post-filtering in app code is the wrong answer for the same reason it's wrong for SQL: the engine that fetches the data should enforce who can see it. Keeping authorization in OpenFGA and making the search layer a relying party means one source of truth, enforced where the rows actually live.

The repo runs the whole thing on a free Mixpeek key and a local OpenFGA. It uses SigLIP for the image embeddings, but the authorization layer is identical for text, video, or audio. If you're building multimodal search for anyone who cares who sees what, start with the check in the right place.