The Short Answer
Switching embedding models normally means re-embedding your entire corpus, because two models place the same content at unrelated coordinates -- their vector spaces are mutually unintelligible. There are three real migration strategies: (1) full re-embedding behind a dual index, the safe default; (2) learned vector-space translation, which fits a mapping (an orthogonal rotation or a small adapter network) from the old space to the new one so existing vectors can be reused at some quality cost; and (3) query-side bridging, where only queries are translated and the corpus stays in the old space. Which one is right depends on corpus size, quality tolerance, and whether the original content is still available to re-embed.
This guide explains why the naive swap fails, how each strategy works with real algorithms, and how to validate a migration before cutting over.
Why Can't I Just Swap Embedding Models?
An embedding model does not assign meaning to absolute coordinates. Two models -- even two versions of the same model -- learn different bases, different dimensionalities, and different geometric conventions, so a vector from model A is noise to an index built with model B. Nearest-neighbor search only works when query and corpus vectors come from the same space.
Three specific incompatibilities bite in practice:
The result: a model swap is an index migration, not a config change. The question is only how you pay for it.
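To make the incompatibility concrete, here is a minimal synthetic sketch. It does not use real embedding models: "model B" is simulated as "model A" under a random orthogonal change of basis, which is an assumption chosen to isolate the basis problem. All names (`corpus_a`, `corpus_b`, `nearest`) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for "model A": 200 random unit vectors in 32 dimensions.
corpus_a = rng.normal(size=(200, 32))
corpus_a /= np.linalg.norm(corpus_a, axis=1, keepdims=True)

# Stand-in for "model B": the same items under a random orthogonal change of basis.
# The Q factor of a QR decomposition is an orthogonal matrix.
Q, _ = np.linalg.qr(rng.normal(size=(32, 32)))
corpus_b = corpus_a @ Q

def nearest(query, corpus):
    # Index of the corpus row with the highest dot product (cosine, since rows are unit length).
    return int(np.argmax(corpus @ query))

# Queries: slightly perturbed copies of the corpus items, so the correct answer for query i is item i.
queries_a = corpus_a + 0.05 * rng.normal(size=corpus_a.shape)
queries_a /= np.linalg.norm(queries_a, axis=1, keepdims=True)
queries_b = queries_a @ Q

# Query and index in the same space versus query from space A against an index in space B.
same_space = np.mean([nearest(q, corpus_b) == i for i, q in enumerate(queries_b)])
cross_space = np.mean([nearest(q, corpus_b) == i for i, q in enumerate(queries_a)])
print(f"same-space accuracy:  {same_space:.2f}")
print(f"cross-space accuracy: {cross_space:.2f}")
```

In this toy setup the geometry inside each space is identical, yet searching across spaces should do no better than chance. Real models differ by more than a rotation, so the real situation is at least this bad.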
Option 1: Full Re-Embedding Behind a Dual Index
The safe default. Keep the old index serving traffic, build a new index with the new model, and cut over when the new one passes evaluation.
1. Provision a second index (or namespace) for the new model's vectors.
2. Re-embed the corpus in batches, oldest-content-last so the freshest data is available in the new space first.
3. Mirror new writes to both indexes during the migration window.
4. Shadow-test: run production queries against both, compare results offline.
5. Cut reads over, keep the old index warm for rollback, then retire it.
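The steps above can be sketched as a small state machine. This is a toy illustration under stated assumptions, not any particular vector database's API: `ToyIndex`, `DualIndexMigration`, and the two toy embedders are hypothetical names, and the "indexes" are plain dictionaries.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Tuple

Vector = List[float]
Embedder = Callable[[str], Vector]

@dataclass
class ToyIndex:
    """Hypothetical in-memory stand-in for a vector index or namespace."""
    vectors: Dict[str, Vector] = field(default_factory=dict)

    def upsert(self, doc_id: str, vec: Vector) -> None:
        self.vectors[doc_id] = vec

@dataclass
class DualIndexMigration:
    embed_old: Embedder
    embed_new: Embedder
    old_index: ToyIndex = field(default_factory=ToyIndex)
    new_index: ToyIndex = field(default_factory=ToyIndex)  # step 1: second index
    read_from_new: bool = False  # flipped only at cutover

    def write(self, doc_id: str, text: str) -> None:
        # Step 3: mirror every new write to both indexes.
        self.old_index.upsert(doc_id, self.embed_old(text))
        self.new_index.upsert(doc_id, self.embed_new(text))

    def backfill(self, docs: Iterable[Tuple[str, str, float]], batch_size: int = 2) -> int:
        # Step 2: re-embed in batches, newest timestamp first.
        ordered = sorted(docs, key=lambda d: d[2], reverse=True)
        done = 0
        for start in range(0, len(ordered), batch_size):
            for doc_id, text, _ts in ordered[start:start + batch_size]:
                if doc_id not in self.new_index.vectors:  # skip docs already dual-written
                    self.new_index.upsert(doc_id, self.embed_new(text))
                    done += 1
        return done

    def cut_over(self) -> None:
        self.read_from_new = True   # step 5: old index stays intact for rollback

    def roll_back(self) -> None:
        self.read_from_new = False

    def serving_index(self) -> ToyIndex:
        return self.new_index if self.read_from_new else self.old_index

# Toy embedders with different dimensionalities, for illustration only.
def toy_old(text: str) -> Vector:
    return [float(len(text)), 0.0]

def toy_new(text: str) -> Vector:
    return [0.0, float(len(text)), 1.0]

migration = DualIndexMigration(toy_old, toy_new)
corpus = [("a", "alpha", 1.0), ("b", "beta", 2.0), ("c", "gamma", 3.0)]
for doc_id, text, _ts in corpus:
    migration.old_index.upsert(doc_id, toy_old(text))  # pre-existing old-space vectors

migration.write("d", "delta")           # arrives mid-migration, goes to both indexes
backfilled = migration.backfill(corpus)
migration.cut_over()
```

Step 4 (shadow testing) is deliberately left out of the sketch; it belongs between `backfill` and `cut_over`, and the cutover should be gated on its outcome.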
The cost is one full embedding pass over the corpus plus temporarily doubled storage. For a corpus of N items that means N encoder calls, none of which can be skipped, which is why this strategy hurts at scale: a billion-vector corpus with a GPU encoder is a real bill. The operational half of this playbook -- versioning, dual-indexing, progressive rollout -- is covered in depth in Embedding Portability and Versioning.
When the original content is gone (expired licenses, deleted sources, compliance purges), full re-embedding is impossible -- you physically cannot regenerate vectors without the inputs. That is when translation stops being an optimization and becomes the only option.
Option 2: Learned Vector-Space Translation
If you have (or can produce) paired vectors -- the same items embedded by both models -- you can learn a function that maps old-space vectors into the new space. Existing vectors are then translated in place, at a fraction of re-embedding cost.
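The classic solution is orthogonal Procrustes: find the orthogonal matrix (a rotation, possibly combined with a reflection) that best aligns the paired anchor sets. An orthogonal map is square, so this form applies only when both models share a dimensionality; otherwise a non-square map such as a learned adapter is needed. The orthogonal case has a closed-form solution via SVD: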
import numpy as np

def fit_procrustes(A_old, B_new):
    # A_old: (n, d) anchors in the old space, B_new: same items in the new space.
    # Returns W, an orthogonal map minimizing ||A_old @ W - B_new||_F.
    M = A_old.T @ B_new
    U, _, Vt = np.linalg.svd(M)
    return U @ Vt

W = fit_procrustes(anchors_old, anchors_new)  # a few thousand pairs suffice
migrated = old_vectors @ W  # translate the whole corpus
migrated /= np.linalg.norm(migrated, axis=1, keepdims=True)
Two caveats govern everything here. First, translation quality is bounded: expect a measurable recall drop versus native new-model vectors, concentrated on out-of-distribution content -- treat translated vectors as a bridge, not a destination. Second, anchors must cover your actual data distribution; a map fitted on generic text will mis-translate domain-specific regions of the space. The same geometry that creates the modality gap in cross-modal spaces limits how cleanly any linear map can align two models.
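One way to check how much quality a fitted map loses is to hold out some anchor pairs and measure how closely translated vectors land on their native counterparts. The sketch below is a minimal illustration on synthetic data: the "new space" is simulated as a random rotation of the old one plus noise, which is an assumption and says nothing about how any real pair of models behaves. `mean_cosine` and the fit/held-out split sizes are illustrative choices.

```python
import numpy as np

def fit_procrustes(A_old, B_new):
    # Orthogonal W minimizing ||A_old @ W - B_new||_F (closed form via SVD).
    U, _, Vt = np.linalg.svd(A_old.T @ B_new)
    return U @ Vt

def mean_cosine(X, Y):
    # Average row-wise cosine similarity between two equally shaped matrices.
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    return float(np.mean(np.sum(Xn * Yn, axis=1)))

rng = np.random.default_rng(0)
d = 64

# Synthetic paired anchors: "new" = rotated "old" plus noise the map cannot explain.
old = rng.normal(size=(3000, d))
R, _ = np.linalg.qr(rng.normal(size=(d, d)))   # Q factor of QR is orthogonal
new = old @ R + 0.1 * rng.normal(size=(3000, d))

# Fit on one slice of the pairs, evaluate on a disjoint slice.
fit_idx, held_idx = np.arange(2000), np.arange(2000, 3000)
W = fit_procrustes(old[fit_idx], new[fit_idx])

held_out_cos = mean_cosine(old[held_idx] @ W, new[held_idx])
untranslated_cos = mean_cosine(old[held_idx], new[held_idx])
print(f"held-out cosine after translation: {held_out_cos:.3f}")
print(f"held-out cosine without translation: {untranslated_cos:.3f}")
```

On real data, computing the held-out score separately per content domain is a direct way to spot the domain-specific regions where the map fitted on your anchors translates poorly.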
Option 3: Query-Side Bridging
Sometimes the right move is to translate nothing in the corpus at all. In query-side bridging the corpus stays in the old space, and queries against it are embedded with the old model (or translated old-ward). Meanwhile a parallel new-space index is built up lazily: new content goes to the new index, and each query fans out to both indexes, embedded for each with the model that matches it. Results are merged by rank, because scores from incompatible spaces are not comparable without normalization or rank-based fusion.
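For the rank-based merge, one common choice is reciprocal rank fusion (RRF), which ignores raw scores entirely and combines only positions. The source does not name a specific fusion method, so treat this as one reasonable option rather than the prescribed one. The function name, the document ids, and the tie-breaking rule are illustrative; `k=60` is the constant conventionally used with RRF.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[str]], k: int = 60, limit: int = 10
) -> List[str]:
    """Merge ranked id lists from different indexes using only rank positions."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))[:limit]

# Hypothetical top-3 results for one query from each index.
old_space_hits = ["doc_a", "doc_b", "doc_c"]   # query embedded with the old model
new_space_hits = ["doc_x", "doc_a", "doc_y"]   # query embedded with the new model

fused = reciprocal_rank_fusion([old_space_hits, new_space_hits])
print(fused)  # ['doc_a', 'doc_x', 'doc_b', 'doc_c', 'doc_y']
```

An item that appears in both result lists accumulates two contributions and rises to the top, while cosine scores from the two spaces never need to be placed on a common scale.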
This is the lowest-risk, lowest-cost start of a migration: zero corpus writes, instant reversibility. Its weakness is permanence -- you are now operating two encoders and two indexes indefinitely, and cross-space result merging caps retrieval quality. Use it as a transition state that ends in either Option 1 or Option 2, not as an end state.
Which Migration Strategy Should I Use?
| Criterion | Full re-embed + dual index | Vector translation | Query-side bridging |
| Compute cost | One full encoder pass over corpus | Fit on ~1-10K anchor pairs, one matrix multiply per vector | Near zero upfront |
| Quality vs native | Identical (it IS native) | Small but real recall drop; worst off-distribution | Old-model quality; merging caps gains |
| Needs original content | Yes, all of it | Only for the anchor set | No |
| Storage during migration | 2x | 1x (translate in place) or 2x (keep both) | Grows toward 2x |
| Reversibility | Excellent (old index intact) | Depends on keeping originals | Trivial |
| Best when | Content available, budget exists | Corpus huge or content partly gone | You need to start today, decide later |
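The "Best when" and "Needs original content" rows can be read as a small decision rule. The sketch below is one possible encoding of the table, not a definitive policy: the function name, the boolean inputs, and the order in which conditions are checked are all assumptions made for illustration.

```python
def suggest_strategy(
    content_available: bool,
    budget_for_full_pass: bool,
    anchor_pairs_available: bool,
    must_start_today: bool = False,
) -> str:
    """Map the comparison table's rows to a starting strategy (illustrative only)."""
    if must_start_today:
        # Zero corpus writes; intended as a transition state, not an end state.
        return "query-side bridging"
    if content_available and budget_for_full_pass:
        # Native quality, old index kept intact for rollback.
        return "full re-embed + dual index"
    if anchor_pairs_available:
        # Corpus too large or content partly gone, but paired vectors exist.
        return "vector translation"
    # No content to re-embed and no anchors to fit a map on.
    return "query-side bridging"

print(suggest_strategy(content_available=True, budget_for_full_pass=True, anchor_pairs_available=True))
print(suggest_strategy(content_available=False, budget_for_full_pass=False, anchor_pairs_available=True))
```

Quality tolerance is not modeled here; a team that cannot accept any recall drop would rule out translation regardless of what this rule returns.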
How Do I Validate an Embedding Migration?
Never cut over on faith. Build a fixed evaluation set of real production queries with known-relevant results, then compare the candidate index against the incumbent on rank-based metrics:
def recall_at_k(index, queries, relevant, k=10):
    # relevant: one set of known-relevant ids per query, in the same order as queries.
    # A query counts as a hit when at least one relevant id appears in its top k.
    hits = 0
    for q, rel in zip(queries, relevant):
        got = {r.id for r in index.search(q, limit=k)}
        hits += bool(got & rel)
    return hits / len(queries)

# Gate the cutover: new index must be within tolerance of the old one
assert recall_at_k(new_index, eval_q, eval_rel) >= 0.98 * recall_at_k(old_index, eval_q, eval_rel)
Embedding Migration in Practice
Mixpeek, a multimodal indexing and retrieval platform over object storage, treats embedding migration as an infrastructure concern rather than a user problem: namespaces version their feature spaces, so a new embedding model lands as a new index built from the same source objects while the old one keeps serving -- the dual-index pattern above, operated for you. With MVS (Mixpeek Vector Store) you can also bring pre-computed vectors from an existing system and run both generations side by side during a cutover. To compare encoder options before migrating, see the best embedding models and best multimodal embedding models lists; for what a switch costs at the index layer, see embedding quantization, which often ships alongside a model upgrade to claw back the storage bill.