Why Payload Size Is an Agent Problem
Vector search usually gets discussed as a ranking problem: embed the query, find nearest neighbors, rerank, return results. That is only half of the system an agent experiences.
An agent does not consume a vector score. It consumes the payload attached to each result. That payload might include transcript text, OCR spans, image URLs, keyframe thumbnails, bounding boxes, model confidence, speaker labels, tenant metadata, access policy, and source lineage.
If the retriever returns too little, the agent cannot answer or cite evidence. If it returns too much, the agent burns context, leaks irrelevant fields into the prompt, increases latency, and makes tool output harder to reason over.
Payload projection is the retrieval-layer control that decides which fields come back for a query.
This matters most for unstructured content because each result can be a dense evidence object:
For agents, retrieval is not only "top 10 nearest vectors." It is "top 10 evidence packets that fit the task."
What Payload Projection Means
Payload projection is field selection for retrieval results. The query still ranks over the indexed representation, but the response returns only the fields the caller asks for.
The basic shape:
{
  "query": "customer asked for a refund after an outage",
  "top_k": 10,
  "select_fields": [
    "source_uri",
    "text",
    "speaker",
    "start_ms",
    "end_ms"
  ]
}
The ranking engine can still use vectors, sparse terms, filters, and metadata. Projection controls the output payload.
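As a minimal sketch of what field selection does to one result, assuming payloads are plain dictionaries: the `project` helper and the sample payload below (including the `asr_model` and `embedding` values) are illustrative and not part of any particular library.

```python
from typing import Any


def project(payload: dict[str, Any], select_fields: list[str]) -> dict[str, Any]:
    """Return only the requested fields, in the requested order.

    Fields that are absent from the payload are skipped rather than
    filled with placeholders.
    """
    return {name: payload[name] for name in select_fields if name in payload}


# Hypothetical stored payload for one transcript span.
full_payload = {
    "source_uri": "s3://support-calls/2026/06/09/call_481.wav",
    "text": "I would like a refund because the outage affected our launch",
    "speaker": "customer",
    "start_ms": 822180,
    "end_ms": 826920,
    "asr_model": "example-asr-model",  # internal detail the agent does not need
    "embedding": [0.12, -0.03, 0.88],  # rank field, meaningless to an LLM
}

compact = project(
    full_payload, ["source_uri", "text", "speaker", "start_ms", "end_ms"]
)
print(list(compact))
```

Ranking can still read `embedding`; the caller only ever sees the five selected fields.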
A useful way to separate the concerns:
| Concern | Question it answers | Example |
| Ranking | Which items are most relevant? | Vector score, BM25 score, reranker score |
| Filtering | Which items are allowed? | customer_id, date, content_type, policy label |
| Projection | Which fields should come back? | source_uri, span text, timestamp, thumbnail |
| Expansion | What should be fetched after selection? | full transcript window, full image, full PDF page |
Projection vs. Filtering vs. Reranking
These operations are often confused because they all appear near the query.
Filtering changes the candidate set. If an agent asks for "refund calls from enterprise customers last week," filters should restrict the search to enterprise accounts and the date window before ranking.
Reranking changes the order. A first-stage vector search may retrieve 200 candidate spans; a cross-encoder or late-interaction model then rescores the top candidates to improve precision.
Projection changes the returned fields. The retriever may rank over hidden internal fields and still return only a compact evidence envelope.
For example, a video search system may rank over:
But the agent may only need:
That output is smaller, cleaner, and easier to cite.
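The three operations can be sketched as separate stages of one toy search function. This is an illustration under simplifying assumptions, not a real retriever: documents are in-memory dictionaries, first-stage ranking is a dot product, and the `rerank_score` callable stands in for a cross-encoder. All names and sample values are hypothetical.

```python
from typing import Any, Callable

Doc = dict[str, Any]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def search(
    docs: list[Doc],
    query_vec: list[float],
    allow: Callable[[Doc], bool],
    rerank_score: Callable[[Doc], float],
    select_fields: list[str],
    first_stage_k: int = 200,
    top_k: int = 10,
) -> list[Doc]:
    # Filtering: changes the candidate set before any ranking happens.
    candidates = [d for d in docs if allow(d)]
    # First-stage ranking: cheap vector similarity over the allowed candidates.
    first_stage = sorted(
        candidates, key=lambda d: dot(d["vector"], query_vec), reverse=True
    )[:first_stage_k]
    # Reranking: changes the order of the surviving candidates.
    reranked = sorted(first_stage, key=rerank_score, reverse=True)[:top_k]
    # Projection: changes the returned fields, not the set or the order.
    return [{f: d[f] for f in select_fields if f in d} for d in reranked]


docs = [
    {"id": "a", "tier": "enterprise", "vector": [1.0, 0.0], "text": "refund after outage", "rerank": 0.2},
    {"id": "b", "tier": "enterprise", "vector": [0.9, 0.1], "text": "outage refund request", "rerank": 0.9},
    {"id": "c", "tier": "free", "vector": [1.0, 0.0], "text": "refund", "rerank": 1.0},
]

hits = search(
    docs,
    query_vec=[1.0, 0.0],
    allow=lambda d: d["tier"] == "enterprise",
    rerank_score=lambda d: d["rerank"],
    select_fields=["id", "text"],
)
print(hits)
```

Document `c` never reaches ranking because the filter removes it, `b` moves ahead of `a` only at the rerank stage, and neither `vector` nor `rerank` appears in the output.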
The Evidence Envelope Pattern
Agents work best when retrievers return structured evidence envelopes instead of raw database rows.
{
  "id": "call_481:822180:826920",
  "score": 0.84,
  "evidence": {
    "text": "I would like a refund because the outage affected our launch",
    "source_uri": "s3://support-calls/2026/06/09/call_481.wav",
    "start_ms": 822180,
    "end_ms": 826920,
    "speaker": "customer"
  },
  "expand": {
    "clip_uri": "mixpeek://clips/call_481/822180-826920",
    "nearby_context_uri": "mixpeek://spans/call_481/819000-830000"
  }
}
The envelope separates immediate evidence from expansion handles.
Immediate evidence is what the agent needs to answer now. Expansion handles let the agent fetch more if needed. This is the same idea behind good tool design: return enough structured output to act, but do not dump the entire object graph into the model context.
The envelope should usually contain five field classes.
| Field class | Purpose | Examples |
| Identity | Let the system deduplicate and trace results | id, namespace, object_id, span_id |
| Citation | Let a human verify the answer | source_uri, page, start_ms, end_ms, keyframe_url |
| Evidence | Let the model answer | text, caption, ocr_excerpt, object_label |
| Confidence | Let the model handle uncertainty | score, model_confidence, speaker_overlap |
| Expansion | Let the agent fetch more | clip_uri, page_image_uri, full_payload_uri |
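One way to build such an envelope from a stored row is sketched below. The grouping of field names into `IMMEDIATE_FIELDS` and `EXPANSION_FIELDS` is an assumption chosen to reproduce the example envelope shown earlier; a real schema would define its own lists, and `to_envelope` is a hypothetical helper.

```python
from typing import Any

# Assumed field groupings; adjust to the actual schema.
IMMEDIATE_FIELDS = ("text", "source_uri", "start_ms", "end_ms", "speaker")
EXPANSION_FIELDS = ("clip_uri", "nearby_context_uri")


def to_envelope(row_id: str, score: float, payload: dict[str, Any]) -> dict[str, Any]:
    """Split a stored payload into immediate evidence and expansion handles.

    Anything not listed in either group (vectors, model names, and so on)
    is left out of the envelope entirely.
    """
    return {
        "id": row_id,
        "score": score,
        "evidence": {k: payload[k] for k in IMMEDIATE_FIELDS if k in payload},
        "expand": {k: payload[k] for k in EXPANSION_FIELDS if k in payload},
    }


stored = {
    "text": "I would like a refund because the outage affected our launch",
    "source_uri": "s3://support-calls/2026/06/09/call_481.wav",
    "start_ms": 822180,
    "end_ms": 826920,
    "speaker": "customer",
    "clip_uri": "mixpeek://clips/call_481/822180-826920",
    "nearby_context_uri": "mixpeek://spans/call_481/819000-830000",
    "embedding": [0.12, -0.03, 0.88],  # hypothetical rank field, never returned
}

envelope = to_envelope("call_481:822180:826920", 0.84, stored)
print(envelope["evidence"]["speaker"], list(envelope["expand"]))
```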
Field Classes for Multimodal Retrieval
A practical schema separates fields by how often they should appear in agent responses.
1. Rank Fields
Rank fields are used by the retriever but usually not returned.
Examples:
These fields can be large and meaningless to an LLM. They should stay inside the retrieval engine unless the caller is debugging.
2. Cite Fields
Cite fields are small and almost always useful.
Examples:
For agents that search media, cite fields are not optional. They turn a generated answer into inspectable evidence.
3. Answer Fields
Answer fields are compact natural-language fields the model can reason over.
Examples:
These fields should be clean text. Avoid embedding JSON blobs, timestamps, and unrelated metadata in the text sent to the model.
4. Governance Fields
Governance fields tell the agent whether it is allowed to use or reveal the evidence.
Examples:
Some governance fields should be used for filtering but not returned to the model. Others should be returned so the tool caller can enforce policy outside the model.
5. Expansion Fields
Expansion fields are handles, not full data dumps.
Examples:
The first retrieval call should return handles. A second tool call can fetch the larger payload only when the agent needs it.
Late Materialization
Late materialization is the database pattern behind efficient projection.
In a naive system, the search engine loads full payloads for every candidate, ranks them, and then returns a subset. For unstructured data, those payloads can be large: thumbnails, transcripts, OCR blocks, JSON metadata, and nested feature objects.
Late materialization delays full payload fetch until after ranking.
query
-> search index returns candidate IDs and scores
-> reranker narrows candidates
-> projection fetches selected fields only
-> response returns compact evidence envelopes
This has three benefits:
1. Less data moves across the retrieval path.
2. Large fields are fetched only when they are actually needed.
3. Agent context receives a predictable payload shape.
Late materialization is especially valuable when the vector index sits near object storage. You can keep large source objects and payload blobs in cheap storage, while the hot query path returns only the fields required by the current tool call.
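The flow above can be sketched with a toy payload store that counts how many field values it reads. Everything here is illustrative: `PayloadStore`, `late_materialized_search`, and the sample rows are hypothetical, and the reranker step is reduced to keeping the highest-scoring hits.

```python
from typing import Any

Hit = tuple[str, float]  # (row id, score) as returned by the index


class PayloadStore:
    """Toy stand-in for payload storage that counts field reads."""

    def __init__(self, rows: dict[str, dict[str, Any]]) -> None:
        self._rows = rows
        self.fields_read = 0

    def fetch(self, row_id: str, fields: list[str]) -> dict[str, Any]:
        row = self._rows[row_id]
        selected = {f: row[f] for f in fields if f in row}
        self.fields_read += len(selected)
        return selected


def late_materialized_search(
    hits: list[Hit], store: PayloadStore, select_fields: list[str], top_k: int
) -> list[dict[str, Any]]:
    # Narrow on IDs and scores first; no payload has been loaded yet.
    top = sorted(hits, key=lambda h: h[1], reverse=True)[:top_k]
    # Only now touch storage, and only for the selected fields of the winners.
    return [
        {"id": row_id, "score": score, **store.fetch(row_id, select_fields)}
        for row_id, score in top
    ]


rows = {
    f"span_{i}": {
        "text": f"span text {i}",
        "source_uri": f"s3://example-bucket/call_{i}.wav",
        "thumbnail": "x" * 10_000,  # large field that should stay in storage
    }
    for i in range(5)
}
hits = [(f"span_{i}", 1.0 - 0.1 * i) for i in range(5)]

store = PayloadStore(rows)
results = late_materialized_search(hits, store, ["text", "source_uri"], top_k=2)
print(store.fields_read)
```

With five candidates, two survivors, and two selected fields, the store reads four field values and never touches a thumbnail.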
Context Budget Math
Projection is often a bigger context win than another prompt rewrite.
Assume a support-call search returns 20 transcript spans. Each full payload has:
That is about 450 tokens per result, or 9,000 tokens for 20 results.
If the agent only needs text, speaker, timestamp, source URI, and score, each result might be 90 tokens. The same 20 results become about 1,800 tokens.
That difference changes the retrieval plan:
Context engineering is not only prompt design. It starts at the retrieval payload.
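The arithmetic from this section, written out as a short sketch. The 450 and 90 token figures are the estimates used above; treating 9,000 tokens as a fixed budget for retrieval output is an added assumption for illustration.

```python
results = 20
full_tokens_per_result = 450     # estimate for a full payload
compact_tokens_per_result = 90   # estimate for a projected payload

full_total = results * full_tokens_per_result        # 20 * 450
compact_total = results * compact_tokens_per_result  # 20 * 90
saved = full_total - compact_total
reduction = saved / full_total

# Assumed budget: how many compact results fit where 20 full ones did?
budget = full_total
compact_results_in_budget = budget // compact_tokens_per_result

print(full_total, compact_total, saved, f"{reduction:.0%}", compact_results_in_budget)
# prints: 9000 1800 7200 80% 100
```

Under these estimates, the same budget that held 20 full payloads holds 100 projected ones, or the agent keeps 20 results and frees 7,200 tokens for reasoning.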
Query-Time Projection for Agents
Agents should request fields based on the task.
A question-answering task needs compact answer fields:
{
  "select_fields": [
    "source_uri",
    "text",
    "start_ms",
    "end_ms",
    "speaker",
    "score"
  ]
}
A visual inspection task needs media handles:
{
  "select_fields": [
    "source_uri",
    "keyframe_url",
    "caption",
    "objects",
    "timestamp_ms",
    "score"
  ]
}
A compliance task needs governance and provenance:
{
  "select_fields": [
    "source_uri",
    "policy_label",
    "evidence_text",
    "model_id",
    "extractor_version",
    "confidence",
    "review_uri"
  ]
}
A debugging task may need internal fields:
{
  "select_fields": [
    "id",
    "score",
    "vector_score",
    "bm25_score",
    "rerank_score",
    "payload_size_bytes"
  ]
}
The agent should not use one universal payload shape for every query. Different tools can expose different safe projections.
Tool Design Pattern
A retrieval tool can make projection explicit in its schema.
{
  "name": "search_media_evidence",
  "description": "Search indexed video, audio, image, and document evidence. Returns compact citeable spans and expansion handles.",
  "input_schema": {
    "type": "object",
    "properties": {
      "query": {"type": "string"},
      "content_type": {"type": "string", "enum": ["video", "audio", "image", "document", "any"]},
      "top_k": {"type": "integer", "minimum": 1, "maximum": 50},
      "projection": {
        "type": "string",
        "enum": ["answer", "visual", "compliance", "debug"]
      }
    },
    "required": ["query"]
  }
}
The tool can map projection presets to field lists.
| Preset | Fields |
| answer | source_uri, text, start_ms, end_ms, speaker, score |
| visual | source_uri, keyframe_url, caption, objects, timestamp_ms, score |
| compliance | source_uri, policy_label, evidence_text, model_id, extractor_version, confidence, review_uri |
| debug | score components, payload size, index partition, model version |
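A minimal sketch of the preset lookup inside such a tool. The field lists are copied from the `select_fields` examples in the Query-Time Projection section; the `resolve_projection` helper and the choice of `answer` as the default preset are assumptions.

```python
PROJECTION_PRESETS: dict[str, list[str]] = {
    "answer": ["source_uri", "text", "start_ms", "end_ms", "speaker", "score"],
    "visual": ["source_uri", "keyframe_url", "caption", "objects", "timestamp_ms", "score"],
    "compliance": [
        "source_uri", "policy_label", "evidence_text", "model_id",
        "extractor_version", "confidence", "review_uri",
    ],
    "debug": [
        "id", "score", "vector_score", "bm25_score",
        "rerank_score", "payload_size_bytes",
    ],
}


def resolve_projection(preset: str = "answer") -> list[str]:
    """Map a preset name from the tool input to a concrete field list.

    Assumed behavior: default to the compact answer preset and reject
    unknown names instead of silently returning full payloads.
    """
    try:
        return list(PROJECTION_PRESETS[preset])  # copy so callers cannot mutate the preset
    except KeyError:
        raise ValueError(f"unknown projection preset: {preset!r}") from None


print(resolve_projection("visual"))
```

Keeping the mapping server-side means the model chooses among a few named shapes rather than composing arbitrary field lists.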
Failure Modes
Returning full payloads by default. This makes prototyping easy and production agents noisy. Default to compact evidence.
Dropping citation fields. If the projected result omits source URI, timestamp, page, or bounding box, the agent cannot produce verifiable answers.
Embedding metadata into answer text. Text like "[00:13:42] speaker=customer policy=internal" pollutes embeddings and model context. Keep clean answer text and structured metadata separate.
Using projection as authorization. Projection can hide fields from a response, but it is not access control. Authorization must happen before retrieval and before expansion.
Returning fields the agent cannot interpret. Raw vectors, model logits, and large nested feature blobs are useful for debugging but poor answer context.
No expansion path. If the first result is compact but there is no way to fetch the source clip, page image, or full transcript, the agent gets stuck.
One projection for every tool. Search, compliance review, visual QA, and debugging need different payload shapes.
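To illustrate the point about authorization, here is a sketch in which the access check runs before projection and again before expansion. The `tenant_id` field, the `authorize` rule, and both helper functions are hypothetical; a real system would apply its own policy model.

```python
from typing import Any

Row = dict[str, Any]


def authorize(row: Row, caller_tenant: str) -> bool:
    # Hypothetical policy: a caller may only see rows from its own tenant.
    return row.get("tenant_id") == caller_tenant


def search_for_caller(rows: list[Row], caller_tenant: str, select_fields: list[str]) -> list[Row]:
    # Access control decides which rows exist for this caller.
    visible = [r for r in rows if authorize(r, caller_tenant)]
    # Projection only trims fields on rows the caller may already see.
    return [{f: r[f] for f in select_fields if f in r} for r in visible]


def expand(rows_by_id: dict[str, Row], row_id: str, caller_tenant: str) -> str:
    # Expansion handles are re-checked; holding an id is not permission.
    row = rows_by_id[row_id]
    if not authorize(row, caller_tenant):
        raise PermissionError(f"{caller_tenant} may not expand {row_id}")
    return row["clip_uri"]


rows = [
    {"id": "r1", "tenant_id": "acme", "text": "refund request", "clip_uri": "mixpeek://clips/r1"},
    {"id": "r2", "tenant_id": "globex", "text": "billing dispute", "clip_uri": "mixpeek://clips/r2"},
]
rows_by_id = {r["id"]: r for r in rows}

print(search_for_caller(rows, "acme", ["id", "text"]))
```

Dropping `tenant_id` from the response hides the field, but only the `authorize` check keeps the second row out of the result.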
Evaluation
Evaluate projection separately from ranking.
Ranking asks: did the retriever find the right evidence?
Projection asks: did the response include the right fields, and only the right fields, for the task?
Useful metrics:
| Metric | What it measures |
| Citation completeness | Percentage of results with source handle and time/page/box when needed |
| Payload bytes per result | Network and serialization cost |
| Prompt tokens per result | Context budget cost |
| Expansion rate | How often agents need a second fetch |
| Answer success with projection | Whether compact fields still let the model answer correctly |
| Leakage rate | Whether irrelevant or policy-sensitive fields are returned |
| Debug sufficiency | Whether debugging projections expose enough scoring information |
1. Ask the agent a question requiring media evidence.
2. Require citations in the final answer.
3. Check that every cited answer maps to a returned source handle.
4. Check that compact projection succeeds without full payloads.
5. Run the same task with larger projections and compare answer quality, latency, and token use.
The goal is not the smallest possible payload. The goal is the smallest payload that lets the agent answer and cite correctly.
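Three of the metrics above can be computed directly from returned results. The sketch below assumes results are flat dictionaries; the required citation fields and the allowed-field set are example choices for a transcript task, and the function names are hypothetical.

```python
import json
from typing import Any

Result = dict[str, Any]


def citation_completeness(
    results: list[Result],
    required: tuple[str, ...] = ("source_uri", "start_ms", "end_ms"),
) -> float:
    """Fraction of results that carry every required citation field."""
    if not results:
        return 0.0
    complete = sum(all(r.get(f) is not None for f in required) for r in results)
    return complete / len(results)


def leakage_rate(results: list[Result], allowed_fields: set[str]) -> float:
    """Fraction of results that return at least one field outside the projection."""
    if not results:
        return 0.0
    leaky = sum(any(f not in allowed_fields for f in r) for r in results)
    return leaky / len(results)


def mean_payload_bytes(results: list[Result]) -> float:
    """Average serialized size per result, as a proxy for network cost."""
    if not results:
        return 0.0
    return sum(len(json.dumps(r).encode("utf-8")) for r in results) / len(results)


base = {"source_uri": "s3://example-bucket/call.wav", "start_ms": 0, "end_ms": 1000, "text": "hello"}
results = [
    dict(base),
    dict(base),
    {**base, "asr_model": "example-asr-model"},                # leaks an internal field
    {k: v for k, v in base.items() if k != "start_ms"},        # missing a citation field
]
allowed = {"source_uri", "start_ms", "end_ms", "text"}

print(citation_completeness(results), leakage_rate(results, allowed))
# prints: 0.75 0.25
```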
Mixpeek MVS Example
In MVS, store clean searchable fields and structured citation metadata.
from mixpeek import Mixpeek

mx = Mixpeek(api_key="YOUR_API_KEY")

# span_embedding is the embedding vector for this transcript span, computed upstream.
mx.mvs.upsert(
    namespace="support-call-memory",
    vectors=[
        {
            "id": "call_481:822180:826920:bge_m3",
            "values": span_embedding,
            "metadata": {
                # Cite and answer fields
                "source_uri": "s3://support-calls/2026/06/09/call_481.wav",
                "text": "I would like a refund because the outage affected our launch",
                "speaker": "customer",
                "start_ms": 822180,
                "end_ms": 826920,
                # Filter and lineage fields
                "language": "en-US",
                "asr_model": "nvidia/nemotron-3.5-asr-streaming-0.6b",
                "aligner_model": "Qwen/Qwen3-ForcedAligner-0.6B",
                # Expansion handle
                "clip_uri": "mixpeek://clips/call_481/822180-826920"
            }
        }
    ]
)
Then query with a compact projection for the agent answer:
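# query_embedding is the embedding of the agent's query, computed upstream.
results = mx.mvs.search_dense(
    namespace="support-call-memory",
    vector=query_embedding,
    top_k=20,
    # Filter on language without returning it to the agent.
    filter={
        "language": {"$eq": "en-US"}
    },
    # Return only cite fields, answer fields, and one expansion handle.
    select_fields=[
        "source_uri",
        "text",
        "speaker",
        "start_ms",
        "end_ms",
        "clip_uri"
    ]
)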
For a visual evidence namespace, project only the fields the visual agent needs:
results = mx.mvs.search_dense(
    namespace="video-scene-memory",
    vector=query_embedding,
    top_k=10,
    select_fields=[
        "source_uri",
        "keyframe_url",
        "caption",
        "objects",
        "start_ms",
        "end_ms"
    ]
)
The first call answers "what evidence should the model read?" The expansion handle answers "where can the system fetch more if needed?"
Design Checklist
Key Takeaways
1. Payload projection is the retrieval contract between a vector store and an agent.
2. Ranking decides which results matter. Projection decides what evidence the agent sees.
3. Cite fields are mandatory for media agents. Without source handles, timestamps, pages, or boxes, answers cannot be verified.
4. Late materialization keeps large unstructured payloads out of the hot path until the agent actually needs them.
5. Good projection reduces prompt tokens, network cost, and irrelevant context without reducing answer quality.
6. The best default is a compact evidence envelope with expansion handles.