    Retrieval
    19 min read
    Updated 2026-06-19

    Index Freshness and Incremental Updates: How Just-Ingested Content Becomes Searchable

    When an agent ingests a video, document, or audio clip, can it retrieve that content one second later? This guide explains the mechanics of index freshness for unstructured search: LSM-style segment architecture, HNSW incremental inserts, tombstone deletes, background compaction, and the freshness-versus-recall tradeoffs that decide whether an agent can see what it just stored.

    Index Freshness
    Incremental Indexing
    HNSW
    Tombstones
    Compaction
    Vector Search
    Agent Memory

    Why Freshness Is an Agent Problem



    An AI agent that ingests unstructured content has a strict expectation that human search products usually do not: it wants to retrieve what it just stored. A perception agent transcribes a meeting, then immediately asks "what did the CFO say about Q3 guidance." A research agent ingests a PDF, then queries it in the same reasoning chain. A monitoring agent indexes a new camera clip, then checks whether a similar event happened in the last minute.

    In all of these cases the content was created seconds ago. If the search index has not absorbed it yet, the agent gets a wrong answer that looks confident. It does not see an error. It sees an empty result set or stale neighbors, and it reasons forward from incomplete evidence. Index freshness is the property that decides whether the agent can see, hear, and search what it just produced.

    Freshness is not free. The data structures that make approximate nearest neighbor search fast (graphs, inverted lists, quantization codebooks) are expensive to mutate. The whole engineering problem is reconciling two opposing forces:

  1. Read efficiency wants a large, well-optimized, immutable index.
  2. Write freshness wants every new vector visible immediately, with no rebuild.


    This guide explains how production systems resolve that tension, and what each design choice costs an agent.

    Defining the Freshness Metrics



    Before tuning anything, name the quantities you are trading.

  1. Indexing latency (visibility lag): the time from "content accepted" to "content returned by a query that should match it." This is the number agents care about most.
  2. Ingest throughput: how many items per second the system can absorb without falling behind.
  3. Query recall: the fraction of true neighbors returned. Freshness tricks often degrade recall on the most recent data first.
  4. Write amplification: how many times a single ingested vector gets physically rewritten before it reaches its final resting structure. High write amplification burns CPU and IO and, for GPU-extracted embeddings, can quietly re-pay extraction cost if a pipeline re-derives features during a rebuild.


    A system that claims "real-time indexing" has made a specific choice on each of these. There is no design that maximizes all four at once.
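
    To make these quantities concrete, here is a minimal sketch of how three of them could be computed from ingest records. The `IngestRecord` fields and the sample numbers are hypothetical, invented for illustration; a real system would derive them from its own write-ahead log, query probes, and compaction counters.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class IngestRecord:
    accepted_at: float       # seconds, when the write was acknowledged
    first_visible_at: float  # seconds, when a matching query first returned it
    physical_writes: int     # times the vector was written (initial + merges)


def visibility_lags(records):
    """Per-item indexing latency: acknowledged -> first returned by a query."""
    return [r.first_visible_at - r.accepted_at for r in records]


def write_amplification(records):
    """Average number of physical writes per logically ingested vector."""
    return sum(r.physical_writes for r in records) / len(records)


def recall_at_k(returned_ids, true_neighbor_ids):
    """Fraction of the true neighbors that the index actually returned."""
    truth = set(true_neighbor_ids)
    return len(truth & set(returned_ids)) / len(truth)


# Made-up sample data, for illustration only.
records = [
    IngestRecord(0.0, 0.5, 3),
    IngestRecord(1.0, 1.25, 3),
    IngestRecord(2.0, 4.0, 4),
]
lags = visibility_lags(records)
print("median lag:", median(lags), "worst lag:", max(lags))
print("write amplification:", write_amplification(records))
print("recall:", recall_at_k([1, 2, 9], [1, 2, 3, 4]))
```

    Reporting the worst-case lag alongside the median matters for agents: a single slow item is enough to make a read-after-write query come back empty.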

    The Core Pattern: LSM-Style Segments



    The dominant architecture for fresh vector search borrows from log-structured merge trees, the same idea behind RocksDB, Cassandra, and Lucene.

    The index is not one monolithic structure. It is a set of segments:

    1. Writable segment (memtable). A small, in-memory structure that accepts new vectors with cheap inserts. New content lands here first and becomes queryable almost immediately.
    2. Sealed segments. When the writable segment reaches a size or age threshold, it is sealed (made immutable) and a new writable segment opens. Sealed segments are optimized for read performance.
    3. Large base segments. Background processes merge many sealed segments into fewer large ones, rebuilding the ANN structure for better recall and lower per-query overhead.

    A query fans out across every segment, gathers top-k candidates from each, and merges the results:

    query(q, k):
        candidates = []
        for segment in all_segments:        # writable + sealed + base
            candidates += segment.search(q, k)
        return top_k(merge(candidates), k)
    


    This is why freshness is achievable at all. The writable segment is tiny, so even a brute-force or lightly-indexed scan over it is fast, and it makes brand-new content visible without touching the large optimized segments. The large segments carry the bulk of the corpus and are rebuilt rarely.
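
    The fan-out pattern can be sketched as a toy in-memory index. This is an illustration under simplifying assumptions, not how any particular store implements it: every segment is searched by brute force, sealing is triggered by item count only, and the class names and `seal_threshold` parameter are invented for the example.

```python
import heapq


def l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class Segment:
    """A bag of (id, vector) pairs searched by brute force."""

    def __init__(self):
        self.items = []
        self.sealed = False

    def search(self, q, k):
        scored = ((l2(q, vec), vid) for vid, vec in self.items)
        return heapq.nsmallest(k, scored)


class SegmentedIndex:
    def __init__(self, seal_threshold=2):
        self.seal_threshold = seal_threshold
        self.writable = Segment()
        self.sealed = []

    def insert(self, vid, vec):
        # New content is queryable as soon as it is appended here.
        self.writable.items.append((vid, vec))
        if len(self.writable.items) >= self.seal_threshold:
            self.writable.sealed = True
            self.sealed.append(self.writable)
            self.writable = Segment()

    def query(self, q, k):
        candidates = []
        for segment in self.sealed + [self.writable]:
            candidates.extend(segment.search(q, k))
        # Merge per-segment top-k lists into a global top-k.
        return [vid for _, vid in heapq.nsmallest(k, candidates)]


index = SegmentedIndex(seal_threshold=2)
index.insert("a", [0.0, 0.0])
index.insert("b", [5.0, 5.0])   # reaching the threshold seals the first segment
index.insert("c", [0.1, 0.0])   # lands in the new writable segment
print(index.query([0.0, 0.0], k=2))
```

    The just-inserted vector `c` is returned even though it sits in a different segment from `a`, which is the whole point of searching the union.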

    The cost is query fan-out: more segments means more sub-searches to merge. A system that never compacts ends up with thousands of tiny segments and slow queries. A system that compacts too aggressively spends all its CPU rebuilding. Compaction policy is the dial between freshness and steady-state query cost.

    Incremental Inserts in HNSW



    The graph-based index HNSW (Hierarchical Navigable Small World) is naturally insert-friendly, which is why it dominates fresh-search workloads. Inserting a vector does not require a rebuild:

    1. Assign the new node a random maximum layer (drawn from an exponential distribution, so most nodes live only on the bottom layer).
    2. Greedily descend from the top entry point to find the nearest neighbors at each layer.
    3. At each layer up to the node's max, connect it to its `M` closest neighbors and add back-links.
    4. Prune over-full neighbor lists using the heuristic that keeps diverse, navigable connections rather than just the closest ones.
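
    Step 1 is the easiest to show in isolation. The sketch below draws layers as floor(-ln(U) * mL) with mL = 1 / ln(M), the normalization commonly used with HNSW; the value `M = 16`, the sample size, and the seed are arbitrary choices for the demonstration.

```python
import math
import random
from collections import Counter


def random_level(M, rng):
    """Draw a node's maximum layer from an exponentially decaying distribution."""
    m_l = 1.0 / math.log(M)
    u = 1.0 - rng.random()            # in (0, 1], so log(u) is always defined
    return int(-math.log(u) * m_l)    # floor for non-negative values


rng = random.Random(0)
levels = Counter(random_level(16, rng) for _ in range(100_000))
for level in sorted(levels):
    print(f"layer {level}: {levels[level]} nodes")
```

    With this normalization the chance of reaching layer 1 or above is 1/M, so with `M = 16` roughly 94% of nodes stay on the bottom layer. That is why most inserts only touch layer 0, and also why the few early nodes that land in the upper layers have an outsized influence on navigation.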

    The cost is roughly \(O(M \cdot \log n)\) per insert, which is cheap enough for streaming. But two problems accumulate slowly over time:

  1. Entry-point drift. Early inserts shape the upper-layer graph. As the distribution of ingested content shifts (a new camera angle, a new document language, a new product category), the upper layers can become poorly representative, hurting recall on recent data. This is one reason periodic full rebuilds still matter even with incremental inserts.
  2. Graph degradation under churn. Heavy insert-and-delete cycles fragment the neighbor lists and leave dangling or suboptimal links. Navigability degrades silently. The fix is background re-optimization, not a runtime flag.


    The Delete Problem and Tombstones



    Deletes are far harder than inserts in graph indexes. Removing a node tears holes in the navigation graph: its neighbors lose a hop they relied on, and the greedy search can get stranded. Physically repairing the graph on every delete is too expensive for high-churn workloads.

    The near-universal answer is the tombstone, a soft delete:

    1. Mark the vector as deleted with a flag in its payload or a deleted-id bitmap. Leave it physically in the graph.
    2. At query time, retrieve candidates from the ANN structure as usual, then filter out any tombstoned ids before returning results to the agent.
    3. During background compaction, physically drop tombstoned vectors when the segment is rebuilt, reclaiming memory and removing them from the graph for good.

    Tombstones make deletes \(O(1)\) and keep the graph intact, but they create two non-intuitive costs an agent operator must understand:

  1. Over-retrieval. If a segment is 40% tombstoned, a query for top-10 must fetch far more than 10 raw candidates to survive filtering. Systems compensate by searching with a larger `ef` or a wider candidate pool, which raises latency. A "deleted" document that you cannot see in results is still costing you query work until compaction runs.
  2. Stale memory and recall drift. Tombstoned vectors still occupy RAM and still participate in graph navigation, so a heavily-churned index can be large and slow even though its logical size is small. This is exactly the kind of surprise that looks like "search got slow for no reason" until you check the tombstone ratio.


    Background Compaction and Streaming Merge
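
    The over-retrieval arithmetic can be sketched in a few lines. This assumes tombstones are spread evenly through the candidate list, which is an idealization: deletes that cluster near the query need a larger margin, which is what the hypothetical `safety` multiplier is for. Both function names are invented for this example.

```python
import math


def fetch_size(k, tombstone_ratio, safety=1.0):
    """Raw candidates to request so that about k survive tombstone filtering."""
    if not 0.0 <= tombstone_ratio < 1.0:
        raise ValueError("tombstone_ratio must be in [0, 1)")
    return math.ceil(k * safety / (1.0 - tombstone_ratio))


def filter_tombstones(candidate_ids, deleted_ids, k):
    """Drop soft-deleted ids, preserving rank order, and keep the top k."""
    live = [cid for cid in candidate_ids if cid not in deleted_ids]
    return live[:k]


deleted = {2, 4, 5, 9}                      # stand-in for a deleted-id bitmap
ranked = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]    # hypothetical ANN output, best first

print(fetch_size(10, 0.40))                       # candidates needed for top-10
print(filter_tombstones(ranked, deleted, k=3))    # live ids after filtering
```

    At a 40% tombstone ratio the top-10 query has to pull 17 raw candidates, so the index does 70% more candidate work for results the agent never sees.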



    Compaction is the janitor that pays down the debt that inserts and tombstones accumulate. It runs off the query path and does three jobs:

    1. Merge small segments into larger ones to cut query fan-out.
    2. Purge tombstones by physically rebuilding without the deleted vectors.
    3. Re-optimize the graph so recent inserts get clean, navigable links.

    For disk-resident indexes the canonical design is a streaming merge (popularized by FreshDiskANN): new vectors go into a small in-memory delta graph for instant visibility, deletes are recorded as tombstones, and a background process periodically folds the delta and the tombstones into the large on-disk graph. The agent always queries the union of the on-disk graph and the in-memory delta, so it sees fresh content immediately while the expensive merge happens asynchronously. In-place update schemes such as SPFresh push this further by patching the existing structure rather than rebuilding whole partitions, trading implementation complexity for lower write amplification at billion scale.

    The operational lesson: compaction is not a tuning detail you can ignore. If it falls behind, freshness, recall, latency, and memory all degrade together. Monitor the segment count and tombstone ratio the way you monitor disk space.
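
    As one way to act on those two signals, the sketch below plans compaction work from per-segment statistics. The thresholds (`max_segments`, `max_tombstone_ratio`, `small`) and the `SegmentStats` shape are assumptions made for illustration; real engines expose different knobs and more elaborate merge policies.

```python
from dataclasses import dataclass


@dataclass
class SegmentStats:
    name: str
    live: int         # vectors still logically present
    tombstoned: int   # soft-deleted vectors awaiting a physical purge

    @property
    def size(self):
        return self.live + self.tombstoned

    @property
    def tombstone_ratio(self):
        return self.tombstoned / self.size if self.size else 0.0


def plan_compaction(segments, max_segments=4, max_tombstone_ratio=0.3, small=1_000):
    """Pick segments to rebuild (purge) and small segments to merge."""
    purge = [s.name for s in segments if s.tombstone_ratio > max_tombstone_ratio]
    merge = []
    if len(segments) > max_segments:  # query fan-out is too high
        merge = [s.name for s in segments if s.size < small and s.name not in purge]
    return purge, merge


segments = [
    SegmentStats("base", live=1_000_000, tombstoned=50_000),
    SegmentStats("s1", live=600, tombstoned=400),
    SegmentStats("s2", live=500, tombstoned=0),
    SegmentStats("s3", live=300, tombstoned=10),
    SegmentStats("s4", live=200, tombstoned=0),
]
purge, merge = plan_compaction(segments)
print("rebuild to purge tombstones:", purge)
print("merge to cut fan-out:", merge)
```

    Keeping the purge and merge decisions separate makes it easier to alert on each one independently, since a high tombstone ratio and a high segment count have different causes.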

    The Sparse and Multimodal Wrinkle



    Freshness is not just a dense-vector concern. Agents over unstructured content usually run hybrid retrieval, and each index type has its own freshness story:

  1. Lexical (BM25) indexes are inverted lists keyed by term. Adding a document updates posting lists and global statistics (document frequency, average document length). A common production failure is a lexical index that does not get rebuilt on a snapshot recovery, leaving a "lexical: true" retriever that silently returns zero documents because its posting lists were never restored. New content existing in the dense index but missing from the sparse index produces hybrid results that are subtly wrong.
  2. Payload and filter indexes must be updated transactionally with the vector, or a filter like "ingested in the last hour" will exclude content that is technically present in the vector index but missing from the filter index.
  3. Multimodal segments make this worse: a single ingested video produces transcript vectors, frame vectors, OCR spans, and object detections, often written to different index structures. Freshness for that item is the slowest of its constituent indexes. The agent does not perceive the video as searchable until every modality it might query has landed.


    The takeaway for agent perception: define freshness per query shape, not per item. "The video is indexed" is meaningless if the agent's next question hits a modality that has not finished.
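
    A minimal sketch of that rule: freshness is evaluated against the set of modalities a query touches, not against the item as a whole. The modality names and the `"indexed"` / `"pending"` status strings are placeholders chosen for the example, not values from any specific API.

```python
def pending_modalities(query_modalities, index_status):
    """Modalities the query needs that have not finished indexing."""
    return sorted(m for m in query_modalities if index_status.get(m) != "indexed")


def searchable_for(query_modalities, index_status):
    """True only if every modality the query touches has landed."""
    return not pending_modalities(query_modalities, index_status)


# Hypothetical per-modality status for one ingested video.
status = {"transcript": "indexed", "frames": "indexed", "ocr": "pending"}

print(searchable_for({"transcript"}, status))             # transcript-only query
print(searchable_for({"transcript", "ocr"}, status))      # query that also needs OCR
print(pending_modalities({"transcript", "ocr"}, status))  # what is gating it
```

    The same object is fresh for one query shape and stale for another, and a modality with no recorded status is treated as not yet indexed rather than silently assumed ready.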

    Freshness Strategies and Their Tradeoffs



    | Strategy | Visibility lag | Recall on new data | Cost driver |
    |---|---|---|---|
    | Full periodic rebuild | Minutes to hours | Excellent after rebuild | Wasted recompute, high write amplification |
    | Writable memtable + sealed segments | Seconds | Good, slightly lower on newest | Query fan-out, compaction CPU |
    | HNSW incremental insert | Sub-second | Good, drifts under distribution shift | Graph degradation, periodic re-optimize |
    | In-memory delta + streaming merge | Sub-second | Good | Background merge IO, memory for delta |
    | Tombstone-only deletes | N/A (deletes) | Degrades with churn | Over-retrieval, stale memory |

    There is no universally correct row. A batch analytics corpus that updates nightly should prefer a periodic rebuild for maximum recall and minimum operational surface. An agent writing to its own memory in a tight loop needs sub-second visibility and must accept compaction overhead and slightly noisier recall on the freshest vectors.

    How This Applies to Mixpeek



    When an agent ingests content through Mixpeek, the object flows through extraction (embeddings, transcripts, OCR, detections) and into the underlying vector store (MVS). The freshness contract is what determines whether a retriever can immediately find the new object across every modality it was decomposed into.

    from mixpeek import Mixpeek

    client = Mixpeek(api_key="YOUR_KEY")

    # 1. An agent ingests a new clip. Extraction produces multiple
    #    feature types (visual, transcript, OCR) into the collection.
    obj = client.ingest(
        namespace="agent-memory",
        bucket_id="session-clips",
        blobs=[{"type": "video", "url": "s3://bucket/clip-2719.mp4"}],
    )

    # 2. Before querying, confirm the object reached an indexed state.
    #    Treat freshness as a query-shape property, not a single boolean:
    #    poll status rather than assuming the next read will see it.
    status = client.objects.get(namespace="agent-memory", object_id=obj["object_id"])
    # status reflects extraction + indexing progress per feature type

    # 3. Once indexed, the retriever sees the new content alongside the
    #    rest of the corpus. The fan-out across writable and base segments
    #    is handled by the store, not the agent.
    results = client.retrievers.execute(
        namespace="agent-memory",
        retriever_id="hybrid_search",
        inputs={"query": "what did the CFO say about Q3 guidance"},
        filters={"AND": [
            {"field": "created_at", "operator": "gte", "value": "2026-06-19T00:00:00Z"}
        ]},
    )


    The agent-relevant design rules that fall out of everything above:

  1. Do not assume read-after-write. An ingest call returning success means accepted, not searchable. Poll object status or design the agent loop to tolerate brief visibility lag, especially for multimodal items where the slowest modality gates freshness.
  2. Watch the tombstone and segment health, not just logical counts. A collection with heavy churn can be slow and memory-hungry even when its logical size is small. Compaction lag is the usual culprit.
  3. Keep dense, sparse, and payload indexes in sync. Hybrid retrievers are only as fresh as their least-fresh sub-index. A restored snapshot that skips the lexical rebuild will return confidently wrong hybrid results.


    Key Takeaways
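
    For the first rule, a generic bounded poll is usually enough. The helper below is a sketch that is deliberately independent of any client library: `is_ready` is any zero-argument callable you supply (for example, one that checks object status), and the function name, delays, and timeout are illustrative defaults rather than recommended values.

```python
import time


def wait_until(is_ready, timeout_s=10.0, initial_delay_s=0.05, max_delay_s=1.0,
               clock=time.monotonic, sleep=time.sleep):
    """Poll is_ready() with exponential backoff; return True if it became ready."""
    deadline = clock() + timeout_s
    delay = initial_delay_s
    while True:
        if is_ready():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False                 # caller decides how to degrade
        sleep(min(delay, remaining))     # never sleep past the deadline
        delay = min(delay * 2, max_delay_s)


# Stand-in readiness check that succeeds on the third poll.
attempts = {"n": 0}


def ready_on_third_poll():
    attempts["n"] += 1
    return attempts["n"] >= 3


print(wait_until(ready_on_third_poll, timeout_s=2.0))
```

    Returning `False` on timeout, rather than raising, lets the agent loop choose between retrying, answering from older evidence, or saying explicitly that the new content is not yet visible.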



    1. Freshness is whether an agent can retrieve what it just ingested, and it is the metric that most directly governs whether the agent reasons over complete evidence.

    2. The standard solution is an LSM-style segment architecture: a tiny writable segment for instant visibility, sealed segments for reads, and large base segments rebuilt by background compaction.

    3. HNSW makes inserts cheap but suffers entry-point drift and graph degradation under churn, so periodic re-optimization still matters.

    4. Deletes use tombstones for \(O(1)\) soft removal, at the cost of over-retrieval and stale memory until compaction physically purges them.

    5. Compaction and streaming merge are the load-bearing background work. If they fall behind, freshness, recall, latency, and memory all degrade together.

    6. Freshness is per query shape, not per item. A multimodal object is only searchable once every modality the agent might query has been indexed, and hybrid retrieval is only as fresh as its least-fresh sub-index.
