Why Freshness Is an Agent Problem
An AI agent that ingests unstructured content has a strict expectation that human search products usually do not: it wants to retrieve what it just stored. A perception agent transcribes a meeting, then immediately asks "what did the CFO say about Q3 guidance." A research agent ingests a PDF, then queries it in the same reasoning chain. A monitoring agent indexes a new camera clip, then checks whether a similar event happened in the last minute.
In all of these cases the content was created seconds ago. If the search index has not absorbed it yet, the agent gets a wrong answer that looks confident. It does not see an error. It sees an empty result set or stale neighbors, and it reasons forward from incomplete evidence. Index freshness is the property that decides whether the agent can see, hear, and search what it just produced.
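Freshness is not free. The data structures that make approximate nearest neighbor search fast (graphs, inverted lists, quantization codebooks) are expensive to mutate. The whole engineering problem is reconciling two opposing forces: the agent needs new content to be visible immediately, and the index needs to stay in a read-optimized shape that is costly to change.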
This guide explains how production systems resolve that tension, and what each design choice costs an agent.
Defining the Freshness Metrics
Before tuning anything, name the quantities you are trading.
A system that claims "real-time indexing" has made a specific choice on each of these. There is no design that maximizes all four at once.
The Core Pattern: LSM-Style Segments
The dominant architecture for fresh vector search borrows from log-structured merge trees, the same idea behind RocksDB, Cassandra, and Lucene.
The index is not one monolithic structure. It is a set of segments:
1. Writable segment (memtable). A small, in-memory structure that accepts new vectors with cheap inserts. New content lands here first and becomes queryable almost immediately.
2. Sealed segments. When the writable segment reaches a size or age threshold, it is sealed (made immutable) and a new writable segment opens. Sealed segments are optimized for read performance.
3. Large base segments. Background processes merge many sealed segments into fewer large ones, rebuilding the ANN structure for better recall and lower per-query overhead.
A query fans out across every segment, gathers top-k candidates from each, and merges the results:
query(q, k):
    candidates = []
    for segment in all_segments:  # writable + sealed + base
        candidates += segment.search(q, k)
    return top_k(merge(candidates), k)
This is why freshness is achievable at all. The writable segment is tiny, so even a brute-force or lightly-indexed scan over it is fast, and it makes brand-new content visible without touching the large optimized segments. The large segments carry the bulk of the corpus and are rebuilt rarely.
The cost is query fan-out: more segments means more sub-searches to merge. A system that never compacts ends up with thousands of tiny segments and slow queries. A system that compacts too aggressively spends all its CPU rebuilding. Compaction policy is the dial between freshness and steady-state query cost.
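The segment pattern above can be sketched in a few lines of Python. This is a minimal illustration, not any particular store's implementation: the class names, the `seal_threshold` parameter, and the brute-force scan standing in for a real ANN structure are all assumptions made for the example.

```python
import heapq
import math


class Segment:
    """A bag of (id, vector) pairs searched by exact scan (stand-in for an ANN index)."""

    def __init__(self):
        self.items = []
        self.sealed = False

    def add(self, item_id, vector):
        if self.sealed:
            raise RuntimeError("sealed segments are immutable")
        self.items.append((item_id, vector))

    def search(self, query, k):
        # Returns up to k (distance, id) pairs, nearest first.
        scored = ((math.dist(query, vec), item_id) for item_id, vec in self.items)
        return heapq.nsmallest(k, scored)


class SegmentedIndex:
    def __init__(self, seal_threshold=4):
        self.seal_threshold = seal_threshold  # hypothetical size threshold
        self.writable = Segment()
        self.sealed = []

    def insert(self, item_id, vector):
        # New content lands in the writable segment and is queryable at once.
        self.writable.add(item_id, vector)
        if len(self.writable.items) >= self.seal_threshold:
            self.writable.sealed = True
            self.sealed.append(self.writable)
            self.writable = Segment()

    def compact(self):
        # Merge every sealed segment into one to cut query fan-out.
        merged = Segment()
        for seg in self.sealed:
            merged.items.extend(seg.items)
        merged.sealed = True
        self.sealed = [merged] if merged.items else []

    def query(self, q, k):
        # Fan out across all segments, then merge the per-segment top-k lists.
        candidates = []
        for seg in [self.writable, *self.sealed]:
            candidates.extend(seg.search(q, k))
        return [item_id for _, item_id in heapq.nsmallest(k, candidates)]
```

In this sketch the number of sub-searches per query is `1 + len(index.sealed)`, which is the fan-out that `compact()` reduces. A real system would also rebuild the ANN structure during the merge rather than concatenating lists.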
Incremental Inserts in HNSW
The graph-based index HNSW (Hierarchical Navigable Small World) is naturally insert-friendly, which is why it dominates fresh-search workloads. Inserting a vector does not require a rebuild:
1. Assign the new node a random maximum layer (drawn from an exponential distribution, so most nodes live only on the bottom layer).
2. Greedily descend from the top entry point to find the nearest neighbors at each layer.
3. At each layer up to the node's max, connect it to its `M` closest neighbors and add back-links.
4. Prune over-full neighbor lists using the heuristic that keeps diverse, navigable connections rather than just the closest ones.
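Step 1 is the easiest to make concrete. The sketch below samples a node's maximum layer as the floor of an exponentially distributed value. The normalization `1 / ln(M)` is the default suggested in the original HNSW paper and is an assumption here, since the text above does not fix it; the function name is illustrative.

```python
import math
import random
from collections import Counter


def sample_max_layer(m, rng):
    """Draw a node's maximum layer as floor(-ln(U) * mL), with mL = 1 / ln(m)."""
    level_mult = 1.0 / math.log(m)
    u = 1.0 - rng.random()  # uniform in (0, 1], so log(u) is always defined
    return int(-math.log(u) * level_mult)


# Empirical layer histogram for M = 16.
rng = random.Random(0)
layers = Counter(sample_max_layer(16, rng) for _ in range(100_000))
for layer in sorted(layers):
    print(layer, layers[layer])
```

Under this normalization a node stays on layer 0 with probability `1 - 1/M`, so the upper layers remain sparse and the descent in step 2 touches only a handful of nodes per layer.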
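The cost is roughly \(O(M \cdot \log n)\) per insert, which is cheap enough for streaming. But two slow problems accumulate under churn: entry-point drift and graph degradation, which is why periodic re-optimization still matters.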
The Delete Problem and Tombstones
Deletes are far harder than inserts in graph indexes. Removing a node tears holes in the navigation graph: its neighbors lose a hop they relied on, and the greedy search can get stranded. Physically repairing the graph on every delete is too expensive for high-churn workloads.
The near-universal answer is the tombstone, a soft delete:
1. Mark the vector as deleted with a flag in its payload or a deleted-id bitmap. Leave it physically in the graph.
2. At query time, retrieve candidates from the ANN structure as usual, then filter out any tombstoned ids before returning results to the agent.
3. During background compaction, physically drop tombstoned vectors when the segment is rebuilt, reclaiming memory and removing them from the graph for good.
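The three steps above can be sketched as follows. This is an illustrative toy, assuming a brute-force scan in place of the graph and a Python set in place of a deleted-id bitmap; the class and method names are hypothetical.

```python
import heapq
import math


class TombstonedIndex:
    def __init__(self):
        self.vectors = {}        # id -> vector, including soft-deleted entries
        self.tombstones = set()  # stand-in for a deleted-id bitmap

    def insert(self, item_id, vector):
        self.vectors[item_id] = vector
        self.tombstones.discard(item_id)

    def delete(self, item_id):
        # Step 1: O(1) soft delete; the vector stays in the structure.
        if item_id in self.vectors:
            self.tombstones.add(item_id)

    def _ann_search(self, query, n):
        # Brute-force stand-in for the ANN structure; it still returns tombstoned ids.
        scored = ((math.dist(query, v), i) for i, v in self.vectors.items())
        return [i for _, i in heapq.nsmallest(n, scored)]

    def search(self, query, k):
        # Step 2: over-retrieve, then filter tombstoned ids before returning.
        fetch = k + len(self.tombstones)
        candidates = self._ann_search(query, fetch)
        live = [i for i in candidates if i not in self.tombstones]
        return live[:k]

    def tombstone_ratio(self):
        return len(self.tombstones) / len(self.vectors) if self.vectors else 0.0

    def compact(self):
        # Step 3: physically drop deleted vectors and reclaim memory.
        for item_id in self.tombstones:
            del self.vectors[item_id]
        self.tombstones.clear()
```

Fetching `k + len(self.tombstones)` candidates is the worst-case bound that keeps this exact scan correct. Production systems more often use a fixed over-fetch multiplier, which is cheaper but can return fewer than `k` live results when the tombstone ratio is high.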
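Tombstones make deletes \(O(1)\) and keep the graph intact, but they create two non-intuitive costs an agent operator must understand: over-retrieval, because tombstoned ids are filtered out only after the ANN search and the index must fetch extra candidates to return k live results, and stale memory, because deleted vectors stay resident until compaction purges them.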
Background Compaction and Streaming Merge
Compaction is the janitor that pays down the debt that inserts and tombstones accumulate. It runs off the query path and does three jobs:
1. Merge small segments into larger ones to cut query fan-out.
2. Purge tombstones by physically rebuilding without the deleted vectors.
3. Re-optimize the graph so recent inserts get clean, navigable links.
For disk-resident indexes the canonical design is a streaming merge (popularized by FreshDiskANN): new vectors go into a small in-memory delta graph for instant visibility, deletes are recorded as tombstones, and a background process periodically folds the delta and the tombstones into the large on-disk graph. The agent always queries the union of the on-disk graph and the in-memory delta, so it sees fresh content immediately while the expensive merge happens asynchronously. In-place update schemes such as SPFresh push this further by patching the existing structure rather than rebuilding whole partitions, trading implementation complexity for lower write amplification at billion scale.
The operational lesson: compaction is not a tuning detail you can ignore. If it falls behind, freshness, recall, latency, and memory all degrade together. Monitor the segment count and tombstone ratio the way you monitor disk space.
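One way to act on that advice is a small health check that turns segment count and tombstone ratio into compaction triggers. The sketch below is hypothetical: the `SegmentStats` fields and both default thresholds are placeholders chosen for illustration, not recommended values.

```python
from dataclasses import dataclass


@dataclass
class SegmentStats:
    live: int        # vectors still visible to queries
    tombstoned: int  # soft-deleted vectors awaiting purge


def compaction_reasons(segments, max_segments=8, max_tombstone_ratio=0.2):
    """Return the reasons compaction should run now (empty list means healthy)."""
    reasons = []
    if len(segments) > max_segments:
        reasons.append("segment_count")
    total = sum(s.live + s.tombstoned for s in segments)
    dead = sum(s.tombstoned for s in segments)
    if total and dead / total > max_tombstone_ratio:
        reasons.append("tombstone_ratio")
    return reasons
```

Exporting the two underlying numbers as metrics, and alerting when `compaction_reasons` stays non-empty for longer than a compaction cycle, gives an early signal that background work is falling behind.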
The Sparse and Multimodal Wrinkle
Freshness is not just a dense-vector concern. Agents over unstructured content usually run hybrid retrieval, and each index type has its own freshness story:
The takeaway for agent perception: define freshness per query shape, not per item. "The video is indexed" is meaningless if the agent's next question hits a modality that has not finished.
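A per-query-shape readiness check can be as simple as a set comparison. In this sketch the modality names and both function names are assumptions made for illustration; how a real system reports which modalities are indexed will differ.

```python
def ready_for_query(indexed_modalities, query_modalities):
    """True only if every modality the query touches has been indexed."""
    return set(query_modalities) <= set(indexed_modalities)


def missing_modalities(indexed_modalities, query_modalities):
    """List the modalities the query needs that are not indexed yet."""
    return sorted(set(query_modalities) - set(indexed_modalities))


# A clip whose visual and OCR features are indexed but whose transcript is not.
indexed = {"visual", "ocr"}
print(ready_for_query(indexed, ["visual"]))
print(missing_modalities(indexed, ["visual", "transcript"]))
```

The same object is "fresh" for a visual similarity query and "not fresh" for a question that depends on the transcript, which is why a single indexed flag per item is not enough.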
Freshness Strategies and Their Tradeoffs
| Strategy | Visibility lag | Recall on new data | Cost driver |
| --- | --- | --- | --- |
| Full periodic rebuild | Minutes to hours | Excellent after rebuild | Wasted recompute, high write amplification |
| Writable memtable + sealed segments | Seconds | Good, slightly lower on newest | Query fan-out, compaction CPU |
| HNSW incremental insert | Sub-second | Good, drifts under distribution shift | Graph degradation, periodic re-optimize |
| In-memory delta + streaming merge | Sub-second | Good | Background merge IO, memory for delta |
| Tombstone-only deletes | N/A (deletes) | Degrades with churn | Over-retrieval, stale memory |
How This Applies to Mixpeek
When an agent ingests content through Mixpeek, the object flows through extraction (embeddings, transcripts, OCR, detections) and into the underlying vector store (MVS). The freshness contract is what determines whether a retriever can immediately find the new object across every modality it was decomposed into.
from mixpeek import Mixpeek

client = Mixpeek(api_key="YOUR_KEY")

# 1. An agent ingests a new clip. Extraction produces multiple
#    feature types (visual, transcript, OCR) into the collection.
obj = client.ingest(
    namespace="agent-memory",
    bucket_id="session-clips",
    blobs=[{"type": "video", "url": "s3://bucket/clip-2719.mp4"}],
)

# 2. Before querying, confirm the object reached an indexed state.
#    Treat freshness as a query-shape property, not a single boolean:
#    poll status rather than assuming the next read will see it.
status = client.objects.get(namespace="agent-memory", object_id=obj["object_id"])
# status reflects extraction + indexing progress per feature type

# 3. Once indexed, the retriever sees the new content alongside the
#    rest of the corpus. The fan-out across writable and base segments
#    is handled by the store, not the agent.
results = client.retrievers.execute(
    namespace="agent-memory",
    retriever_id="hybrid_search",
    inputs={"query": "what did the CFO say about Q3 guidance"},
    filters={"AND": [
        {"field": "created_at", "operator": "gte", "value": "2026-06-19T00:00:00Z"}
    ]},
)
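The "poll status" step in the example can be wrapped in a generic helper. The sketch below is not part of any SDK: `wait_until` and its parameters are hypothetical, and the `check` callable is whatever predicate the agent builds around its own status lookup. The `sleep` and `clock` arguments are injectable so the loop can be tested without real delays.

```python
import time


def wait_until(check, timeout_s=30.0, initial_delay_s=0.2, max_delay_s=5.0,
               sleep=time.sleep, clock=time.monotonic):
    """Poll check() with exponential backoff until it returns True or time runs out."""
    deadline = clock() + timeout_s
    delay = initial_delay_s
    while True:
        if check():
            return True
        if clock() + delay > deadline:
            return False  # the next wait would overshoot the deadline
        sleep(delay)
        delay = min(delay * 2, max_delay_s)
```

An agent would call `wait_until(lambda: <status says the needed feature types are indexed>)` before issuing the retrieval, and treat a `False` return as "evidence incomplete" rather than as an empty result.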
The agent-relevant design rules that fall out of everything above:
Key Takeaways
1. Freshness is whether an agent can retrieve what it just ingested, and it is the metric that most directly governs whether the agent reasons over complete evidence.
2. The standard solution is an LSM-style segment architecture: a tiny writable segment for instant visibility, sealed segments for reads, and large base segments rebuilt by background compaction.
3. HNSW makes inserts cheap but suffers entry-point drift and graph degradation under churn, so periodic re-optimization still matters.
4. Deletes use tombstones for \(O(1)\) soft removal, at the cost of over-retrieval and stale memory until compaction physically purges them.
5. Compaction and streaming merge are the load-bearing background work. If they fall behind, freshness, recall, latency, and memory all degrade together.
6. Freshness is per query shape, not per item. A multimodal object is only searchable once every modality the agent might query has been indexed, and hybrid retrieval is only as fresh as its least-fresh sub-index.