What Is Embedding Versioning?

    Embedding Versioning - Strategies for upgrading embedding models without breaking retrieval quality or causing downtime in production vector systems

    Embedding versioning addresses the operational problem of migrating from one embedding model to another. Because vectors from different model versions occupy incompatible spaces, upgrading requires re-encoding all stored data, which is expensive, slow, and risky. Without a versioning strategy, organizations either stay on outdated models or face painful bulk migrations.

    How It Works

    When a new embedding model is released (or an existing model is fine-tuned), the vector space changes. Queries encoded with the new model cannot be meaningfully compared against documents encoded with the old model. Embedding versioning solves this by maintaining parallel indexes, migrating data incrementally, and routing queries to the correct index based on which model version produced the stored vectors. The simplest approach is dual-write: new data goes into both the old and new index, while a background job re-encodes historical data into the new space. Once migration is complete, the old index is retired.
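The dual-write pattern above can be sketched in a few lines. This is a minimal illustration, not a production implementation: `embed_v1`, `embed_v2`, and the dictionary "indexes" are stand-ins for real embedding models and vector databases.

```python
def embed_v1(text: str) -> list[float]:
    # Placeholder for the old model's encoder.
    return [float(len(text)), 0.0]

def embed_v2(text: str) -> list[float]:
    # Placeholder for the new model's encoder (a different vector space).
    return [0.0, float(len(text))]

# Stand-ins for two versioned vector indexes.
index_v1: dict[str, list[float]] = {}
index_v2: dict[str, list[float]] = {}

def write(doc_id: str, text: str) -> None:
    """Dual-write: every new document is encoded into both indexes."""
    index_v1[doc_id] = embed_v1(text)
    index_v2[doc_id] = embed_v2(text)

def backfill(historical_docs: dict[str, str]) -> None:
    """Background job: re-encode historical documents into the new space.

    Skipping already-migrated ids makes the job idempotent, so it can
    be safely resumed after a partial failure.
    """
    for doc_id, text in historical_docs.items():
        if doc_id not in index_v2:
            index_v2[doc_id] = embed_v2(text)
```

Once `backfill` has covered the full corpus, queries can be served entirely from `index_v2` and the old index retired.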

    Technical Details

    Three main strategies exist. Shadow indexing creates a second index alongside the primary one, encodes all new data with both models, and backfills historical data in the background; during the transition, query routing sends searches to both indexes and merges results using reciprocal rank fusion or score normalization, and once the new index reaches full coverage, traffic shifts entirely. Blue-green migration builds the new index completely offline, validates retrieval quality against a test set, and performs an atomic cutover. Progressive rollout re-encodes data in priority order (most-queried documents first) and gradually increases the share of traffic served by the new index. Each strategy trades off migration speed, compute cost, and retrieval continuity.
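Reciprocal rank fusion, mentioned above as one way to merge results from the old and new index, is simple enough to show directly. This sketch uses the standard RRF formula, scoring each document as the sum of 1/(k + rank) across result lists; k = 60 is the commonly used constant.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked result lists (best first) from multiple indexes.

    A document's fused score is the sum of 1 / (k + rank) over every
    list it appears in; documents ranked highly in either index rise
    to the top of the merged list.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Score normalization is an alternative when the two indexes return comparable similarity scores, but RRF needs only ranks, which makes it robust when the old and new models produce scores on different scales.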

    Best Practices

    • Retain raw source data for every embedded document. Re-encoding from originals is the only reliable way to populate a new index.
    • Automate quality benchmarks: before cutting over, compare recall@k and MRR between the old and new models on a golden test set.
    • Use namespace or collection-level versioning so that each model version maps to a distinct, isolated index.
    • Build backfill pipelines that are idempotent and resumable. Large-scale re-encoding jobs will fail partway through.
    • Monitor retrieval quality continuously during migration, not just before and after.
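The benchmark comparison from the second bullet can be sketched as follows. The metric definitions are standard; `search_old`, `search_new`, and the golden-set format are illustrative assumptions, not a specific library's API.

```python
def recall_at_k(results: list[str], relevant: set[str], k: int = 10) -> float:
    """Fraction of relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(results[:k]) & relevant) / len(relevant)

def mrr(results: list[str], relevant: set[str]) -> float:
    """Reciprocal rank of the first relevant result (0 if none found)."""
    for rank, doc_id in enumerate(results, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

def compare_models(golden_set, search_old, search_new, k=10):
    """Average recall@k and MRR for the old vs. new model.

    golden_set: list of (query, relevant_doc_ids) pairs.
    search_old / search_new: callables returning ranked doc ids.
    """
    report = {}
    for name, search in (("old", search_old), ("new", search_new)):
        recalls, mrrs = [], []
        for query, relevant in golden_set:
            results = search(query)
            recalls.append(recall_at_k(results, relevant, k))
            mrrs.append(mrr(results, relevant))
        report[name] = {
            "recall@k": sum(recalls) / len(recalls),
            "mrr": sum(mrrs) / len(mrrs),
        }
    return report
```

Gating the cutover on this report (for example, refusing to shift traffic if the new model's averages regress) turns the benchmark into an automated safety check rather than a one-off manual comparison.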

    Common Pitfalls

    • Mixing vectors from different model versions in the same index. This silently degrades retrieval quality because the coordinate spaces are incompatible.
    • Postponing migration until the old model is deprecated, leaving no time for gradual rollout or quality validation.
    • Underestimating re-encoding cost. A 100M document corpus at $0.0001 per embedding still costs $10,000 and days of compute.
    • Failing to version the query encoder alongside the document encoder. Both sides must use the same model version.
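The last two pitfalls share a fix: record which model version produced each index's vectors, and always encode queries with that same version. A minimal sketch, with hypothetical version names and placeholder encoders:

```python
# Placeholder encoders standing in for two real embedding models.
ENCODERS = {
    "v1": lambda text: [float(len(text)), 0.0],
    "v2": lambda text: [0.0, float(len(text))],
}

# Each index is permanently mapped to the model version that built it.
INDEX_MODEL_VERSION = {
    "docs_v1": "v1",
    "docs_v2": "v2",
}

def encode_query(index_name: str, query: str) -> list[float]:
    """Encode a query with the same model version as the target index,
    so query and document vectors never mix coordinate spaces."""
    version = INDEX_MODEL_VERSION[index_name]
    return ENCODERS[version](query)
```

Making the lookup the only path to a query encoder removes the failure mode entirely: there is no code path that can pair a v2 query with a v1 index.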