Why Agent Search Needs Adaptive Indexing
Human search traffic is repetitive. Users type short queries, click results, reformulate, and leave behind stable query patterns.
Agent search traffic is less predictable. An agent may start with a broad semantic query, add filters after seeing partial results, issue a lexical query for an exact phrase, inspect citations, expand a time window, and then search again with a narrower budget.
That pattern matters for unstructured content:
One index rarely serves every query shape well. Dense vector search is useful for semantic recall, BM25 is useful for exact terms, payload indexes are useful for filters, and rerankers are useful for precision. Adaptive indexing is the process of watching real retrieval traffic, identifying the fields and query shapes that matter, and building the right indexes over time.
The goal is not to index everything. The goal is to make the hot paths fast, citeable, and cheap without making the storage layer impossible to operate.
The Three Index Families
Most retrieval systems for agent memory use three index families.
| Index family | What it answers | Typical fields |
| --- | --- | --- |
| Vector index | What is semantically similar? | embeddings for text, images, audio, video scenes |
| Lexical index | What contains this exact phrase or token pattern? | transcript text, OCR text, titles, captions |
| Payload index | Which records match structured constraints? | tenant_id, object_type, created_at, speaker, camera_id, policy_label |
Vector indexes are good when the query is conceptual: "a customer sounds angry about a delayed shipment" or "a frame where someone opens a laptop."
Lexical indexes are good when the query contains exact evidence: a part number, an error string, a SKU, a quoted sentence, or a person's name.
Payload indexes are good when the agent must constrain search: only this customer, only last week, only videos, only English transcripts, only scenes with policy label "needs_review."
An agentic retrieval system needs all three because agents do not only ask semantic questions. They ask bounded, cited, tool-like questions.
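As a rough illustration of how one query can touch several families at once, the sketch below guesses which families a request would use. The function name `index_families` and both regex heuristics are hypothetical and are not part of any engine described here.

```python
import re
from typing import Optional

# Hypothetical heuristics: quoted phrases or ID-like tokens suggest exact evidence.
_QUOTED = re.compile(r'"[^"]+"')
_ID_LIKE = re.compile(r"\b(?=[\w-]*\d)(?=[\w-]*[A-Za-z])[\w-]{4,}\b")

def index_families(query: str, filters: Optional[dict] = None) -> list:
    """Return the index families a query would likely touch."""
    families = []
    if filters:
        # Structured constraints are served by payload indexes.
        families.append("payload")
    if _QUOTED.search(query) or _ID_LIKE.search(query):
        # Exact phrases, part numbers, and SKUs are served by the lexical index.
        families.append("lexical")
    if _QUOTED.sub("", query).strip():
        # Any free text outside quotes is treated as a semantic query.
        families.append("vector")
    return families

print(index_families('error "connection reset" on SKU-4471', {"tenant_id": "acme"}))
```

A production router would more likely rely on explicit tool parameters than on string heuristics, but the point stands: a single agent request can legitimately need all three families.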
Query Logs Are Training Data for the Storage Layer
Adaptive indexing begins with query logs. These logs should not just store the raw query string. They should describe the shape of the work the storage layer performed.
A useful retrieval log includes:
| Field | Why it matters |
| --- | --- |
| namespace | Reveals tenant and workload skew |
| query_type | Distinguishes dense, sparse, hybrid, filter-only, and rerank workloads |
| filters | Shows which metadata fields are actually used |
| projected_fields | Shows what payloads agents ask for |
| top_k and candidate_k | Shows recall and rerank pressure |
| latency breakdown | Separates parse, filter, search, rerank, materialization |
| bytes returned | Shows projection and payload pressure |
| result count | Reveals over-selective filters and empty searches |
| index_hit | Shows whether a useful index was used |
| fallback_path | Shows when the engine had to scan or degrade |
Example query-shape record:
{
  "namespace": "media-archive",
  "query_type": "hybrid",
  "filters": {
    "object_type": "video",
    "created_at": {"$gte": "2026-06-01"},
    "policy_label": "approved"
  },
  "projected_fields": ["source_uri", "start_ms", "end_ms", "caption"],
  "candidate_k": 200,
  "top_k": 20,
  "latency_ms": {
    "filter": 42,
    "vector": 81,
    "bm25": 27,
    "rerank": 133,
    "materialize": 9
  },
  "bytes_returned": 18432,
  "index_hit": ["object_type", "created_at"],
  "fallback_path": null
}
Once logs look like this, index decisions can be grounded in evidence instead of guesswork.
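A minimal sketch of turning such records into evidence, assuming each log entry is a dict with the `query_type` and `filters` keys shown in the example record. The helper names `query_shape` and `filter_field_usage` are illustrative.

```python
from collections import Counter

def query_shape(record: dict) -> tuple:
    """Reduce a log record to a hashable shape: query type plus sorted filter fields."""
    return (record["query_type"], tuple(sorted(record.get("filters", {}))))

def filter_field_usage(records: list) -> Counter:
    """Count how often each filter field appears across log records."""
    usage = Counter()
    for record in records:
        usage.update(record.get("filters", {}).keys())
    return usage

# Illustrative log records; filter values are ignored, only field names matter.
logs = [
    {"query_type": "hybrid", "filters": {"object_type": "video", "policy_label": "approved"}},
    {"query_type": "hybrid", "filters": {"policy_label": "approved", "object_type": "image"}},
    {"query_type": "dense", "filters": {"created_at": {"$gte": "2026-06-01"}}},
]

shapes = Counter(query_shape(r) for r in logs)
usage = filter_field_usage(logs)
print(shapes.most_common(1))
print(usage.most_common())
```

Grouping by shape rather than by query text means two queries with different wording but the same filter fields count toward the same index decision.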
Slow Query Diagnosis
Before building an index, identify which part of the query is slow.
The common latency components are:
| Component | Common cause |
| --- | --- |
| Parse and planning | Complex filters or many branches |
| Candidate generation | Large vector search, cold shard, high top_k |
| Filter evaluation | Unindexed fields, low-selectivity predicates, nested payload scans |
| Lexical search | Large posting lists, phrase queries, fuzzy matching |
| Reranking | Too many query-candidate pairs |
| Materialization | Loading large payloads after ranking |
| Projection | Returning too many fields or large nested payloads |
For agents, many slow queries come from combinations:
The fix may be a new index, but it may also be better routing, lower candidate_k, a projection preset, or a two-step agent tool.
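One way to start the diagnosis is to find which stage dominates a trace. The sketch below reuses the `latency_ms` numbers from the example query-shape record and assumes the stages run sequentially, which may not hold if vector and BM25 search run in parallel. The function name `dominant_component` is illustrative.

```python
def dominant_component(latency_ms: dict) -> tuple:
    """Return (component, share_of_total) for the slowest stage in a trace."""
    total = sum(latency_ms.values())
    if total == 0:
        return (None, 0.0)
    name = max(latency_ms, key=latency_ms.get)
    return (name, latency_ms[name] / total)

# Latency breakdown taken from the example query-shape record.
breakdown = {"filter": 42, "vector": 81, "bm25": 27, "rerank": 133, "materialize": 9}
component, share = dominant_component(breakdown)
print(component, round(share, 2))
```

For this trace the reranker dominates, so a new payload index would help little; lowering candidate_k is the more plausible first fix.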
When to Build a Payload Index
A payload index is worth building when a field is both common in filters and selective enough to reduce work.
Good payload index candidates:
Weak candidates:
A practical rule:
Build a payload index when a field appears in a meaningful share of slow queries and the filtered subset is much smaller than the namespace.
Example:
| Field | Query frequency | Selectivity | Decision |
| --- | ---: | ---: | --- |
| object_type | high | medium | Index |
| created_at | high | high for recent windows | Index |
| speaker | medium | medium | Index if transcript search is hot |
| random_trace_id | low | high | Do not index unless exact lookup is common |
| caption | high | low as a filter | Use lexical and vector search instead |
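The practical rule can be sketched as a small predicate. The thresholds below (10% of slow queries, at most 20% of the namespace matched) are placeholder assumptions, not recommendations from any engine, and `FieldStats` is a hypothetical container for numbers derived from query logs.

```python
from dataclasses import dataclass

@dataclass
class FieldStats:
    field: str
    slow_query_share: float  # fraction of slow queries that filter on this field
    matched_fraction: float  # typical share of the namespace that passes the filter

def should_index(stats: FieldStats, min_share: float = 0.10,
                 max_matched: float = 0.20) -> bool:
    """Index only fields that are both common in slow queries and selective."""
    return (stats.slow_query_share >= min_share
            and stats.matched_fraction <= max_matched)

candidates = [
    FieldStats("created_at", slow_query_share=0.40, matched_fraction=0.05),
    FieldStats("random_trace_id", slow_query_share=0.01, matched_fraction=0.0001),
    FieldStats("caption", slow_query_share=0.50, matched_fraction=0.90),
]
decisions = {c.field: should_index(c) for c in candidates}
print(decisions)
```

Note that a lower `matched_fraction` corresponds to "high" selectivity in the table: the filter keeps fewer records, so the index removes more work.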
Filter-First vs. Vector-First Routing
Once indexes exist, the engine still has to choose how to use them.
There are two common plans.
Filter-first: apply payload filters first, then vector search inside the filtered subset.
Use this when:
Vector-first: retrieve semantic candidates first, then apply filters.
Use this when:
Hybrid plans combine both:
1. Use payload indexes to find allowed or likely partitions.
2. Run dense vector search in those partitions.
3. Run BM25 over exact text fields.
4. Merge candidates with reciprocal rank fusion or weighted scoring.
5. Rerank a bounded candidate set.
6. Project compact evidence fields.
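The merge in step 4 can be sketched with reciprocal rank fusion, which scores each candidate by the sum of 1 / (k + rank) over the rankings it appears in. The constant k = 60 is a commonly used default, and the span IDs are made up for illustration.

```python
def reciprocal_rank_fusion(rankings: list, k: int = 60) -> list:
    """Fuse ranked ID lists; each list contributes 1 / (k + rank) per document."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score, breaking ties by ID for deterministic output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

dense = ["span_a", "span_b", "span_c"]  # hypothetical dense vector results
bm25 = ["span_b", "span_d", "span_a"]   # hypothetical BM25 results
fused = reciprocal_rank_fusion([dense, bm25])
print(fused)
```

Because the fusion uses ranks rather than raw scores, it does not require dense similarity and BM25 scores to be on a comparable scale.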
The routing decision should be observable. When a query is slow, engineers need to see whether the planner chose filter-first, vector-first, lexical-first, or a fallback scan.
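A minimal sketch of an observable routing decision follows. The selection criteria are assumptions made for illustration (filter-first only when the filter is indexed and estimated to keep at most 10% of the namespace), and `choose_plan` is a hypothetical name. The important part is that the plan and its reason are logged.

```python
import logging

logger = logging.getLogger("retrieval.planner")

def choose_plan(matched_fraction, filter_indexed, max_fraction=0.10):
    """Pick a routing plan and record why, so slow queries can be traced.

    matched_fraction: estimated share of the namespace passing the filters,
    or None when the query has no filters. The threshold is illustrative.
    """
    if matched_fraction is None:
        plan, reason = "vector_first", "no filters"
    elif filter_indexed and matched_fraction <= max_fraction:
        plan, reason = "filter_first", "indexed and selective filter"
    elif filter_indexed:
        plan, reason = "vector_first", "filter too broad to shrink the search"
    else:
        plan, reason = "vector_first", "filter field not indexed; post-filter candidates"
    decision = {"plan": plan, "reason": reason, "matched_fraction": matched_fraction}
    logger.info("routing decision: %s", decision)
    return decision

print(choose_plan(0.02, filter_indexed=True))
```

Writing the same decision dict into the query log lets slow traces be grouped by chosen plan.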
The Adaptive Loop
Adaptive indexing should be a controlled loop, not an automatic index explosion.
The loop:
1. Observe query logs and slow traces.
2. Group slow queries by shape, not only by text.
3. Estimate index benefit using frequency, selectivity, and latency savings.
4. Propose an index with a specific field, type, and namespace.
5. Build the index in the background.
6. Warm or validate it with representative queries.
7. Route a small share of traffic through it.
8. Compare latency, recall, empty-result rate, and cost.
9. Promote, keep warming, or retire it.
The key is step 8. An index is not successful because it built. It is successful because real queries became faster or more reliable without hurting recall or cost.
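The comparison and the promote, keep warming, or retire decision might look like the sketch below. The metric names and every threshold are assumptions, and cost is left out for brevity. `control` holds metrics for traffic without the new index and `candidate` for the share routed through it.

```python
def judge_index(control: dict, candidate: dict, min_latency_gain: float = 0.10,
                max_recall_drop: float = 0.01,
                max_empty_rate_rise: float = 0.02) -> str:
    """Return 'promote', 'keep_warming', or 'retire' for a trial index."""
    latency_gain = 1.0 - candidate["p95_ms"] / control["p95_ms"]
    recall_drop = control["recall"] - candidate["recall"]
    empty_rise = candidate["empty_rate"] - control["empty_rate"]
    # Quality regressions disqualify the index regardless of speed.
    if recall_drop > max_recall_drop or empty_rise > max_empty_rate_rise:
        return "retire"
    if latency_gain >= min_latency_gain:
        return "promote"
    return "keep_warming"

control = {"p95_ms": 400, "recall": 0.92, "empty_rate": 0.05}
faster = {"p95_ms": 300, "recall": 0.92, "empty_rate": 0.05}
print(judge_index(control, faster))
```

Checking recall and empty-result rate before latency keeps a fast but lossy index from being promoted.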
Agent Tool Design
Agents should not be asked to know index internals. They should express intent through tool parameters.
Example tool shape:
{
  "name": "search_media_evidence",
  "description": "Search video, audio, image, and document evidence with filters and compact citations.",
  "input_schema": {
    "type": "object",
    "properties": {
      "query": {"type": "string"},
      "content_type": {"type": "string", "enum": ["video", "audio", "image", "document", "any"]},
      "time_window": {"type": "string"},
      "policy_label": {"type": "string"},
      "top_k": {"type": "integer", "minimum": 1, "maximum": 50},
      "projection": {"type": "string", "enum": ["answer", "visual", "compliance", "debug"]},
      "latency_budget_ms": {"type": "integer"}
    },
    "required": ["query"]
  }
}
The planner can translate this into indexes and routing:
The agent sees a stable tool. The retrieval system evolves underneath it.
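As a sketch of that translation layer, the function below maps the tool input onto filters, candidate counts, and projected fields. The preset field lists, the 300 ms rerank cutoff, and the 10x candidate multiplier are all assumptions; `time_window` is omitted because the schema leaves its format open.

```python
# Hypothetical projection presets; real presets would be defined per deployment.
PROJECTION_PRESETS = {
    "answer": ["source_uri", "text", "start_ms", "end_ms"],
    "visual": ["source_uri", "caption", "start_ms", "end_ms"],
    "compliance": ["source_uri", "policy_label", "speaker", "text"],
    "debug": ["source_uri", "text", "caption", "policy_label", "start_ms", "end_ms"],
}

def plan_from_tool_input(tool_input: dict) -> dict:
    """Translate agent-facing parameters into an internal retrieval plan."""
    filters = {}
    content_type = tool_input.get("content_type", "any")
    if content_type != "any":
        filters["object_type"] = {"$eq": content_type}
    if "policy_label" in tool_input:
        filters["policy_label"] = {"$eq": tool_input["policy_label"]}
    # Clamp top_k to the bounds declared in the tool schema.
    top_k = min(max(int(tool_input.get("top_k", 10)), 1), 50)
    budget = tool_input.get("latency_budget_ms")
    # Skip reranking when the latency budget is tight (cutoff is illustrative).
    rerank = budget is None or budget >= 300
    return {
        "query": tool_input["query"],
        "filters": filters,
        "top_k": top_k,
        "candidate_k": top_k * 10 if rerank else top_k,
        "rerank": rerank,
        "select_fields": PROJECTION_PRESETS[tool_input.get("projection", "answer")],
    }

plan = plan_from_tool_input(
    {"query": "refund after outage", "content_type": "video", "top_k": 80,
     "latency_budget_ms": 150}
)
print(plan)
```

New indexes or routing rules change only this function; the tool schema the agent sees stays the same.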
Failure Modes
Indexing every field. This creates write amplification and operational noise. Index hot fields, not all fields.
Ignoring low-result queries. Empty results may mean the filter is too selective, the wrong modality was searched, or the planner applied filters too early.
Treating projection as indexing. Projection reduces returned payload size. It does not make filtering faster unless the filter field is indexed.
Applying rerankers too broadly. Cross-encoder reranking is powerful, but reranking 2,000 candidates often hides a bad first-stage plan.
Missing tenant skew. A field may be hot for one tenant and irrelevant for another. Shared averages hide expensive outliers.
No retirement policy. Old indexes consume space and slow writes. If query logs show no benefit, retire them.
No authorization boundary. Payload indexes can accelerate filtering, but access control must still be enforced before results are returned.
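For the authorization point, a final check can run on every result set regardless of which index produced it. This sketch assumes each hit carries a `tenant_id` in its metadata and that the caller's allowed tenants come from a trusted source; `enforce_access` is a hypothetical helper.

```python
def enforce_access(results: list, allowed_tenants) -> list:
    """Drop any hit whose tenant is not in the caller's allowed set.

    Hits without a tenant_id are dropped too, so the check fails closed.
    """
    allowed = set(allowed_tenants)
    return [r for r in results
            if r.get("metadata", {}).get("tenant_id") in allowed]

hits = [
    {"id": "a", "metadata": {"tenant_id": "acme"}},
    {"id": "b", "metadata": {"tenant_id": "globex"}},
    {"id": "c", "metadata": {}},
]
visible = enforce_access(hits, {"acme"})
print([r["id"] for r in visible])
```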
Evaluation
Evaluate adaptive indexing at the query-shape level.
| Metric | What it tells you |
| --- | --- |
| p50 and p95 latency by query shape | Whether hot agent paths are improving |
| index hit rate | Whether queries use intended indexes |
| filter selectivity | Whether payload indexes reduce candidate work |
| candidate recall | Whether routing keeps enough relevant evidence |
| empty-result rate | Whether filters or routing are too strict |
| rerank candidate count | Whether first-stage search is bounded |
| bytes returned | Whether projection is controlling payload size |
| citation success | Whether agents can verify answers |
| build cost and storage overhead | Whether the index is worth keeping |
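A minimal sketch of computing the first few metrics per query shape, assuming each log row carries a precomputed `shape` label, a `total_ms` latency, and a `result_count`. These three field names are illustrative. Percentiles use the nearest-rank method.

```python
import math
from collections import defaultdict

def percentile(sorted_values: list, p: float):
    """Nearest-rank percentile of an ascending, non-empty list."""
    rank = max(1, math.ceil(p / 100 * len(sorted_values)))
    return sorted_values[rank - 1]

def shape_metrics(rows: list) -> dict:
    """Compute p50, p95, and empty-result rate for each query shape."""
    by_shape = defaultdict(list)
    for row in rows:
        by_shape[row["shape"]].append(row)
    metrics = {}
    for shape, group in by_shape.items():
        latencies = sorted(r["total_ms"] for r in group)
        empty = sum(1 for r in group if r["result_count"] == 0)
        metrics[shape] = {
            "p50_ms": percentile(latencies, 50),
            "p95_ms": percentile(latencies, 95),
            "empty_rate": empty / len(group),
        }
    return metrics

rows = [
    {"shape": "hybrid+object_type", "total_ms": ms, "result_count": n}
    for ms, n in [(10, 5), (20, 3), (30, 0), (40, 8), (100, 2)]
]
metrics = shape_metrics(rows)
print(metrics)
```

Reporting per shape keeps one slow but rare shape from being averaged away by a fast, common one.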
Mixpeek MVS Example
Suppose a support video agent searches call recordings. Each transcript span is stored as a vector with structured payload fields.
from mixpeek import Mixpeek

mx = Mixpeek(api_key="YOUR_API_KEY")

# span_embedding is assumed to be a precomputed embedding for this transcript span.
mx.mvs.upsert(
    namespace="support-video",
    vectors=[
        {
            "id": "call_481:822180:826920:bge_m3",
            "values": span_embedding,
            "metadata": {
                "source_uri": "s3://support-video/2026/06/09/call_481.mp4",
                "object_type": "video",
                "speaker": "customer",
                "language": "en-US",
                "policy_label": "approved",
                "start_ms": 822180,
                "end_ms": 826920,
                "text": "I want a refund because the outage affected our launch"
            }
        }
    ]
)
After query logs show repeated filters on object_type, policy_label, and start_ms ranges, build payload indexes for those fields.
# One payload index per hot filter field.
mx.mvs.create_payload_index(
    namespace="support-video",
    field="object_type",
    field_type="keyword"
)
mx.mvs.create_payload_index(
    namespace="support-video",
    field="policy_label",
    field_type="keyword"
)
mx.mvs.create_payload_index(
    namespace="support-video",
    field="start_ms",
    field_type="integer"
)
Then search with filters and compact projection.
# query_embedding is assumed to come from the same model as the stored vectors.
results = mx.mvs.search_dense(
    namespace="support-video",
    vector=query_embedding,
    top_k=20,
    filter={
        "object_type": {"$eq": "video"},
        "policy_label": {"$eq": "approved"},
        "start_ms": {"$gte": 600000}
    },
    select_fields=[
        "source_uri",
        "speaker",
        "text",
        "start_ms",
        "end_ms"
    ]
)
The agent receives compact cited evidence. The storage layer uses payload indexes to avoid scanning unrelated records. If logs later show that speaker is a frequent slow filter, add it. If it is rarely used, leave it unindexed.
Managed Mixpeek Example
Managed Mixpeek is the right path when the system should extract the features before indexing.
For video, that means the pipeline can produce:
Adaptive indexing still applies, but the fields come from the extraction pipeline instead of an upstream application.
from mixpeek import Mixpeek

mx = Mixpeek(api_key="YOUR_API_KEY")

# Create a collection whose extractors produce the searchable features.
collection = mx.collections.create(
    namespace="training-video",
    collection_id="field-training",
    extractors=[
        {"extractor_type": "transcription"},
        {"extractor_type": "video_describer"},
        {"extractor_type": "ocr"}
    ]
)

# Upload a raw video for the pipeline to process.
mx.buckets.upload(
    namespace="training-video",
    bucket_id="raw-training",
    file_path="forklift-safety.mp4"
)
Use MVS when you already have embeddings and metadata. Use Managed when you want the extraction, indexing, and retrieval system together.
Design Checklist
Key Takeaways
1. Adaptive indexing is a retrieval feedback loop, not a one-time schema decision.
2. Query logs should capture the work the storage layer performed, not just the text the user typed.
3. Vector, lexical, and payload indexes answer different questions. Agent search needs all three.
4. Payload indexes are most valuable when filters are frequent, selective, and repeated.
5. Routing matters as much as indexing. Filter-first and vector-first plans serve different query shapes.
6. The best outcome is not the most indexes. It is faster, cheaper, citeable evidence for the agent.