Changing embedding models doesn't have to break your index

A vector index encodes every document into a point in a space defined by the embedding model that created it. That's the whole problem. Text search indexes don't have this property — you can swap a tokenizer and rebuild keyword statistics overnight. Vector spaces aren't portable across models. The geometry changes. Distances mean different things. A cosine similarity computed between a document embedded with CLIP and a query embedded with SigLIP is noise.
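A toy simulation shows why cross-model scores are noise. It assumes each "model" is just a random rotation of a shared latent vector, which is far simpler than a real encoder. Even so, it makes the point: a rotation preserves every distance inside its own space, yet comparing a vector from one rotation with a vector from another says almost nothing about the content.

```python
import numpy as np

def random_rotation(rng: np.random.Generator, dim: int) -> np.ndarray:
    """Random orthogonal matrix: QR of a Gaussian matrix, with column signs fixed."""
    q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q * np.sign(np.diag(r))

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
dim = 64

# One piece of content, described by a shared latent vector.
latent = rng.standard_normal(dim)

# Two toy "models": each maps the same latent into its own coordinate system.
old_model = random_rotation(rng, dim)
new_model = random_rotation(rng, dim)

doc_old = old_model @ latent    # document indexed with the old model
query_old = old_model @ latent  # identical text, embedded with the old model
query_new = new_model @ latent  # identical text, embedded with the new model

same_space = cosine(doc_old, query_old)   # same content, same space: 1.0
cross_space = cosine(doc_old, query_new)  # same content, different spaces: near 0
print(f"same space: {same_space:.3f}  cross space: {cross_space:.3f}")
```

The query is literally the same content as the document, and the cross-space score still lands near zero.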

So when a better model ships (and one always does), you're stuck. Every document in your index needs to be re-encoded. While that's happening, new-model queries are scored against a mix of old-model and new-model documents, and recall drops. When you're done, the old index has been overwritten, so there's nothing left to compare quality against before you cut over. If the new model turns out worse, you start over.

The teams that handle this cleanly treat model versions like code versions. You don't migrate code by deleting v1 and overwriting it with v2. You deploy v2 alongside v1, compare, and then cut over. Same principle applies to your index.

Version the model into the index

Every collection is tied to a feature extractor — the component that runs the model and produces embeddings. The extractor has a name and a version, and together they form a Feature URI:

mixpeek://clip_vit_l_14@v1/image_embedding
mixpeek://siglip2-giant@v2/image_embedding

This URI is immutable. A collection built with clip_vit_l_14@v1 will always serve clip_vit_l_14@v1 embeddings. When you move to siglip2-giant@v2, you don't modify the collection; you create a new one. The old collection stays live, and the two embedding spaces coexist without touching each other.
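The URI format maps naturally onto an immutable value in code. The sketch below parses a Feature URI into its parts using a pattern inferred only from the two example URIs (the real grammar may allow more), and freezes the result to mirror the rule that a collection never changes its model. The FeatureURI class and its comparable_with method are hypothetical, not part of any SDK.

```python
import re
from dataclasses import dataclass

# Grammar inferred from the two example URIs; an assumption,
# not Mixpeek's official specification.
_FEATURE_URI = re.compile(
    r"^mixpeek://(?P<extractor>[^@/]+)@(?P<version>[^/]+)/(?P<output>[^/]+)$"
)

@dataclass(frozen=True)  # frozen: a parsed URI cannot be mutated in place
class FeatureURI:
    extractor: str
    version: str
    output: str

    @classmethod
    def parse(cls, uri: str) -> "FeatureURI":
        match = _FEATURE_URI.match(uri)
        if match is None:
            raise ValueError(f"not a feature URI: {uri!r}")
        return cls(**match.groupdict())

    def __str__(self) -> str:
        return f"mixpeek://{self.extractor}@{self.version}/{self.output}"

    def comparable_with(self, other: "FeatureURI") -> bool:
        # Vectors are only comparable when they come from the same
        # extractor at the same version.
        return (self.extractor, self.version) == (other.extractor, other.version)

old = FeatureURI.parse("mixpeek://clip_vit_l_14@v1/image_embedding")
new = FeatureURI.parse("mixpeek://siglip2-giant@v2/image_embedding")
print(old.version, new.extractor, old.comparable_with(new))
```

A guard like comparable_with is a cheap way to refuse, at query time, any search that would compare vectors across model versions.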

The migration workflow

Clone the production collection with the new extractor:

# Clone the production collection, swapping in the new model.
# `client` is assumed to be an already-authenticated Mixpeek SDK client.
client.collections.clone(
    "col_product_images",
    collection_name="col_product_images_v2",
    feature_extractor={
        "feature_extractor_name": "siglip2-giant",
        "version": "v2",
        "input_mappings": {"image": "image_url"},
    }
)

The clone copies collection configuration. It doesn't copy vectors — the model changed, so old vectors aren't valid for the new extractor. Trigger reprocessing on the clone:

# Reembed with the new model — runs async
client.collections.trigger("col_product_images_v2")
# returns batch_id, task_id — production is untouched

While that runs, your production retriever still points at col_product_images. Nothing is broken. Users see no change.
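Reprocessing runs asynchronously and returns a batch_id and task_id, but the snippet above doesn't show how to check progress, and evaluation should wait until re-embedding finishes. Here is a generic polling sketch. The status lookup is passed in as a function because the actual SDK status call isn't shown here, and the terminal status names are hypothetical.

```python
import time
from typing import Callable

# Hypothetical terminal states; check the real task-status values in the API docs.
TERMINAL_STATES = frozenset({"COMPLETED", "FAILED", "CANCELED"})

def wait_for_task(
    get_status: Callable[[str], str],
    task_id: str,
    poll_seconds: float = 30.0,
    timeout_seconds: float = 6 * 3600,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a reprocessing task until it reaches a terminal state.

    get_status is injected so this sketch does not depend on any particular
    SDK call; wire it to whatever task-status endpoint your client exposes.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status(task_id)
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {status!r} after {timeout_seconds}s")
        sleep(poll_seconds)
```

Only a COMPLETED result should unlock the evaluation step; a FAILED or timed-out task leaves production exactly where it was.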

Measure before you cut over

Most migrations skip this. They assume a newer model means better results and cut over. Sometimes that's true. Sometimes the new model scores well on a public leaderboard like MTEB but performs worse on your specific data distribution. The only way to know is to measure.

Evaluations require a curated ground truth dataset — queries plus the documents that should rank in the top results. Run the same dataset against both retrievers:

# Same ground truth, two different retrievers
client.retrievers.evaluations.run("ret_product_v1", dataset_name="golden_queries")
client.retrievers.evaluations.run("ret_product_v2", dataset_name="golden_queries")
# Returns Precision@K, Recall@K, NDCG@K, MRR for each K
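For reference, here is how the four metrics can be computed from a ground-truth set. This sketch assumes binary relevance (a document is either in a query's relevant set or not) and truncates MRR at K; the evaluation service may use graded relevance or slightly different conventions.

```python
import math
from statistics import mean

def precision_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    # Fraction of the top-k results that are relevant
    return sum(doc in relevant for doc in ranked[:k]) / k

def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    # Fraction of all relevant documents that made it into the top k
    return sum(doc in relevant for doc in ranked[:k]) / len(relevant) if relevant else 0.0

def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    # 1 / position of the first relevant document, 0 if none appears
    for position, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / position
    return 0.0

def ndcg_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    # Binary gain of 1 per relevant doc, discounted by log2(position + 1)
    dcg = sum(1.0 / math.log2(pos + 1)
              for pos, doc in enumerate(ranked[:k], start=1) if doc in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 1) for pos in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0

def evaluate(run: dict[str, list[str]], ground_truth: dict[str, set[str]], k: int) -> dict[str, float]:
    """Average each metric over every query in the ground-truth set."""
    per_query = [(run.get(q, []), relevant) for q, relevant in ground_truth.items()]
    return {
        f"precision@{k}": mean(precision_at_k(r, rel, k) for r, rel in per_query),
        f"recall@{k}": mean(recall_at_k(r, rel, k) for r, rel in per_query),
        f"ndcg@{k}": mean(ndcg_at_k(r, rel, k) for r, rel in per_query),
        f"mrr@{k}": mean(reciprocal_rank(r[:k], rel) for r, rel in per_query),
    }
```

Running evaluate once with each retriever's rankings against the same ground truth gives directly comparable numbers, which is the point of holding the dataset fixed.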

If you have interaction history, benchmarks give a more realistic picture. They replay real sessions (the queries users actually ran and the documents they clicked) and score both retrievers against that observed behavior:

client.retrievers.benchmarks.create(
    benchmark_name="siglip_vs_clip",
    baseline_retriever_id="ret_product_v1",
    candidate_retriever_ids=["ret_product_v2"],
    session_count=500,
)
# "bench_abc123" stands in for the benchmark ID returned by create()
client.retrievers.benchmarks.execute("bench_abc123")
# Returns precision@10, MRR, NDCG, latency — baseline vs candidate with deltas
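Conceptually, a session replay re-runs logged queries and checks where the documents users actually clicked land in each retriever's ranking. The sketch below is a local approximation under stated assumptions: the search functions are plain callables (stand-ins for calls to ret_product_v1 and ret_product_v2), and each session is scored by the reciprocal rank of its first clicked document. How the hosted benchmark samples sessions or defines its metrics may differ.

```python
import time
from dataclasses import dataclass
from statistics import mean
from typing import Callable

@dataclass(frozen=True)
class Session:
    query: str
    clicked: frozenset[str]  # documents the user actually clicked

def replay(sessions: list[Session], search: Callable[[str], list[str]], k: int = 10) -> dict[str, float]:
    """Re-run logged queries against one retriever and score it against real clicks."""
    reciprocal_ranks, latencies_ms = [], []
    for session in sessions:
        start = time.perf_counter()
        ranked = search(session.query)[:k]
        latencies_ms.append((time.perf_counter() - start) * 1000)
        first_click = next(
            (pos for pos, doc in enumerate(ranked, start=1) if doc in session.clicked), None
        )
        reciprocal_ranks.append(1.0 / first_click if first_click else 0.0)
    return {f"mrr@{k}": mean(reciprocal_ranks), "avg_latency_ms": mean(latencies_ms)}

def compare(sessions: list[Session], baseline: Callable[[str], list[str]],
            candidate: Callable[[str], list[str]], k: int = 10) -> dict[str, tuple[float, float]]:
    # Same sessions, two retrievers: (baseline, candidate) per metric
    base, cand = replay(sessions, baseline, k), replay(sessions, candidate, k)
    return {metric: (base[metric], cand[metric]) for metric in base}
```

Because both retrievers see exactly the same sessions, any difference in the paired numbers comes from the retrievers, not from the traffic.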

Metric	v1 (CLIP)	v2 (SigLIP2)	Relative change
Precision@10	0.72	0.78	+8.3%
MRR	0.81	0.85	+4.9%
NDCG@10	0.76	0.82	+7.9%
Avg latency	145ms	160ms	+10.3%

500 replayed user sessions · 50K product images · benchmark vs baseline ret_product_v1

The new model is meaningfully better on retrieval quality and 15ms slower — a tradeoff you can evaluate deliberately rather than discover after shipping.
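The deltas in the table are relative changes, (candidate - baseline) / baseline, not differences in points: precision rises 0.06 in absolute terms, which is 8.3% of 0.72. Writing the tradeoff down as an explicit cutover rule keeps the decision deliberate. The thresholds below (no quality metric may regress, latency may grow by at most 15%) are illustrative assumptions, not recommendations; the metric values are the ones from the table.

```python
def relative_change(baseline: float, candidate: float) -> float:
    """Relative change of candidate vs baseline, as a fraction (0.083 means +8.3%)."""
    return (candidate - baseline) / baseline

QUALITY_METRICS = ("precision@10", "mrr", "ndcg@10")  # higher is better

def cutover_ok(baseline: dict[str, float], candidate: dict[str, float],
               max_latency_increase: float = 0.15) -> bool:
    # Illustrative gate: no quality metric may regress...
    if any(candidate[m] < baseline[m] for m in QUALITY_METRICS):
        return False
    # ...and average latency may grow by at most the given fraction
    return relative_change(baseline["avg_latency_ms"], candidate["avg_latency_ms"]) <= max_latency_increase

v1 = {"precision@10": 0.72, "mrr": 0.81, "ndcg@10": 0.76, "avg_latency_ms": 145}
v2 = {"precision@10": 0.78, "mrr": 0.85, "ndcg@10": 0.82, "avg_latency_ms": 160}

for metric in v1:
    print(f"{metric}: {relative_change(v1[metric], v2[metric]):+.1%}")
print("cut over:", cutover_ok(v1, v2))
```

With these numbers the gate passes; tighten the latency budget to 5% and the same results would block the cutover.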

Cutover and rollback

Blue-green: create a new retriever pointing at col_product_images_v2 and update your application to use the new retriever ID. The old retriever stays alive. If something goes wrong in production that your offline eval didn't catch, switch back to the old retriever ID. Rollback is a config change, not a re-indexing job.

# New retriever pointing at the v2 collection
client.retrievers.clone(
    "ret_product_v1",
    retriever_name="ret_product_v2",
    collection_ids=["col_product_images_v2"],
)
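One way to keep rollback a config change is to have the application resolve a stable alias instead of hard-coding a retriever ID. The RetrieverAlias class below is a hypothetical, in-memory sketch; in practice the pointer would live wherever you keep runtime config (an environment variable, a config service, a feature flag).

```python
from dataclasses import dataclass, field

@dataclass
class RetrieverAlias:
    """Stable name the application queries; points at one retriever ID at a time."""
    name: str
    active: str
    history: list[str] = field(default_factory=list)

    def promote(self, retriever_id: str) -> None:
        # Blue-green cutover: remember the current target, then swap the pointer
        self.history.append(self.active)
        self.active = retriever_id

    def rollback(self) -> None:
        # Restore the previous target; nothing is re-indexed
        if not self.history:
            raise RuntimeError(f"{self.name}: nothing to roll back to")
        self.active = self.history.pop()

product_search = RetrieverAlias("product-search", active="ret_product_v1")
product_search.promote("ret_product_v2")  # cut over
product_search.rollback()                 # something looked wrong: back to v1
```

Both operations touch only the pointer, which is why they are instant: the retrievers and collections behind them never change.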

If you're less confident, point a single retriever at both feature URIs simultaneously using weighted fusion. Start with the old model weighted at 0.9 and the new one at 0.1, then shift weight toward the new model as confidence builds.

"searches": [
    {
        "feature_uri": "mixpeek://clip_vit_l_14@v1/image_embedding",
        "query": "{{INPUT.query}}",
        "top_k": 100,
        "weight": 0.9,
    },
    {
        "feature_uri": "mixpeek://siglip2-giant@v2/image_embedding",
        "query": "{{INPUT.query}}",
        "top_k": 100,
        "weight": 0.1,
    },
],
"fusion": "weighted",
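How the weighted fusion mode combines legs internally isn't described here. A common approach, assumed in this sketch, is to normalize each leg's scores to [0, 1] (raw cosine scores from two different models have different ranges, so summing them directly would let one model dominate) and then take a weighted sum per document. It also assumes both collections share document IDs, so the same image returned by either leg merges into one entry.

```python
from collections import defaultdict

def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale one leg's scores to [0, 1] so legs from different models are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def weighted_fusion(legs: list[tuple[dict[str, float], float]], top_k: int = 10) -> list[str]:
    fused: dict[str, float] = defaultdict(float)
    for scores, weight in legs:
        if not scores:
            continue  # an empty leg contributes nothing
        for doc, score in min_max(scores).items():
            fused[doc] += weight * score  # a doc missing from a leg gets 0 from it
    ranked = sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
    return [doc for doc, _ in ranked[:top_k]]

clip_leg = {"a": 0.90, "b": 0.80, "c": 0.30}    # old-model search leg
siglip_leg = {"b": 0.50, "c": 0.45, "d": 0.20}  # new-model search leg

print(weighted_fusion([(clip_leg, 0.9), (siglip_leg, 0.1)]))  # early in the rollout
print(weighted_fusion([(clip_leg, 0.1), (siglip_leg, 0.9)]))  # late in the rollout
```

Shifting the weights reorders results gradually: at 0.9/0.1 the old model's top hit stays first, and at 0.1/0.9 the new model's preferences take over.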

When you're satisfied, remove the old search leg. No re-indexing required — both collections were running the whole time, so the new one is already fully warm.

The reason model migrations feel expensive usually isn't the model. It's the index architecture: mutable, unversioned, with no staging layer and no way to compare before committing. Version your embeddings and the model becomes just a parameter. The teams that do this well don't run migrations. They run experiments.

Go deeper: the full migration playbook

We wrote up the complete decision framework (full re-embedding behind a dual index, learned vector-space translation with orthogonal Procrustes and adapters, query-side bridging, and how to validate recall parity before you cut over) in How to Switch Embedding Models Without Re-Embedding Everything. And if you'd rather have the dual-index pattern operated for you, Mixpeek's vector store runs old and new embedding generations side by side over your object storage, so a model swap becomes a cutover decision instead of a rebuild.
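As a minimal sketch of the orthogonal Procrustes idea mentioned above: assuming you have embedded a sample of the same items with both models, the orthogonal map that best carries old vectors into the new space falls out of a single SVD. The toy data below is constructed so that the new space is an exact rotation of the old one; real model pairs are not related that cleanly, so treat a fitted map as an approximation to validate for recall parity, not a drop-in replacement for re-embedding. SciPy ships the same computation as scipy.linalg.orthogonal_procrustes.

```python
import numpy as np

def fit_orthogonal_procrustes(old: np.ndarray, new: np.ndarray) -> np.ndarray:
    """Orthogonal W minimizing ||old @ W - new|| (Frobenius norm).

    old, new: (n_items, dim) embeddings of the same items from each model.
    """
    u, _, vh = np.linalg.svd(old.T @ new)
    return u @ vh

rng = np.random.default_rng(1)
n_items, dim = 500, 32
old = rng.standard_normal((n_items, dim))
# Toy assumption: the new space is an exact rotation of the old one
true_map, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
new = old @ true_map

w = fit_orthogonal_procrustes(old, new)
print(np.allclose(old @ w, new))  # the toy rotation is recovered
```

On real embeddings the residual ||old @ W - new|| will not be zero, and that residual, together with recall measured on your ground-truth queries, is what tells you whether translation is good enough or a full re-embed is needed.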