Environment Branching
Clone namespaces, collections, retrievers, and taxonomies into isolated environments to experiment safely without re-processing data
Why do anything?
AI pipelines are stateful: changing an embedding model, retriever pipeline, or taxonomy schema has downstream effects that are hard to undo. Teams either experiment directly in production (risky) or maintain separate environments that drift out of sync (expensive and unreliable).
Why now?
As AI pipelines move into regulated verticals like healthcare and adtech, the cost of a bad production change escalates: misclassified ad content, incorrect clinical labels, or degraded search quality can have immediate business impact. You need a safe, repeatable path for testing changes.
Why this feature?
Mixpeek's clone-based branching operates at every layer of the pipeline (namespace, collection, retriever, and taxonomy), so you can create a fully isolated staging environment in minutes, experiment with real production data, and promote changes only after quality gates pass.
How It Works
Every Mixpeek resource is immutable by design: you cannot change a collection's feature extractor or a retriever's pipeline stages via PATCH. This preserves execution history and audit trails. The clone primitive is the intentional escape hatch: it creates a new resource with new IDs, sharing underlying data where possible.
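The immutable-plus-clone idea can be sketched locally with a frozen dataclass. This is only an illustration of the pattern, not Mixpeek's implementation: the Retriever class, its fields, and clone_retriever are hypothetical names invented for the example.

```python
import uuid
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Retriever:
    """Hypothetical local model of an immutable resource."""
    retriever_id: str
    collection_id: str
    stages: tuple


def clone_retriever(source, **overrides):
    """Return a new resource with a fresh ID; the source is left untouched."""
    new_id = f"ret_{uuid.uuid4().hex[:8]}"
    return replace(source, retriever_id=new_id, **overrides)


live = Retriever("ret_live", "col_products", ("vector_search",))

# In-place mutation is rejected, mirroring the "no PATCH" rule.
try:
    live.stages = ("vector_search", "rerank")
    mutated = True
except FrozenInstanceError:
    mutated = False

# Cloning is the only way to get a variant; it still points at the same collection.
staging = clone_retriever(live, stages=("vector_search", "rerank"))
```

The original object keeps its ID and stages, so anything recorded against it stays valid while the clone is evaluated.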
Namespace Clone
Deep-copies the full environment: collections (with Qdrant vectors), retrievers, and bucket metadata. All internal IDs are remapped. Runs as an async Celery task; returns a task_id for polling and a DeepCloneResult with the complete ID mapping.
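Because the namespace clone is asynchronous, callers typically poll the returned task_id until the task finishes and then read the ID mapping. The sketch below shows that polling loop against a fake in-memory backend. The status strings, the dictionary shape of the result, and the FakeTaskBackend class are all assumptions made for illustration; check the actual task and DeepCloneResult schemas before relying on any of these names.

```python
import time


class FakeTaskBackend:
    """Hypothetical stand-in for the task API; completes after a few polls."""

    def __init__(self, polls_until_done, result):
        self._remaining = polls_until_done
        self._result = result

    def get(self, task_id):
        if self._remaining > 0:
            self._remaining -= 1
            return {"task_id": task_id, "status": "PENDING"}
        return {"task_id": task_id, "status": "COMPLETED", "result": self._result}


def wait_for_clone(tasks, task_id, interval=0.01, timeout=5.0):
    """Poll until the task completes, fails, or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        task = tasks.get(task_id)
        if task["status"] == "COMPLETED":
            return task["result"]
        if task["status"] == "FAILED":
            raise RuntimeError(f"clone task {task_id} failed")
        time.sleep(interval)
    raise TimeoutError(f"clone task {task_id} did not finish in {timeout}s")


# Assumed shape of the ID mapping: old ID -> new ID, grouped by resource type.
id_mapping = {
    "collections": {"col_prod": "col_stg"},
    "retrievers": {"ret_prod": "ret_stg"},
}
tasks = FakeTaskBackend(polls_until_done=2, result=id_mapping)

result = wait_for_clone(tasks, "task_123")
staging_retriever = result["retrievers"]["ret_prod"]
```

Looking up new IDs through the mapping, rather than hard-coding them, keeps staging scripts valid across repeated clones.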
Collection Clone
Clones a collection and optionally overrides the feature extractor. When the extractor is unchanged, vectors are reused. When the extractor changes (for example, an embedding model swap), reprocessing must be triggered explicitly, because vectors are model-specific.
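The reuse-versus-reprocess rule above reduces to a single comparison. The helper below is a hypothetical planning function, not part of any SDK, and the extractor names are made up; it only encodes the rule as stated.

```python
def plan_collection_clone(source_extractor, override_extractor=None):
    """Decide whether a clone can reuse vectors or needs reprocessing."""
    target = override_extractor or source_extractor
    changed = target != source_extractor
    return {
        "feature_extractor": target,
        "reuse_vectors": not changed,      # same model: existing vectors stay valid
        "needs_reprocessing": changed,     # new model: vectors must be regenerated
    }


same_model = plan_collection_clone("text-embed-v1")
new_model = plan_collection_clone("text-embed-v1", override_extractor="text-embed-v2")
```

Surfacing needs_reprocessing as an explicit flag matches the point that reprocessing is never triggered implicitly.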
Retriever Clone
Clones a retriever with optional stage overrides. The clone shares the same underlying collection (no extra vector storage). Use this to test a new ranking strategy, additional rerank stage, or adjusted fusion weights without touching the live pipeline.
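As a rough picture of what stage overrides do, the sketch below copies a retriever definition, adjusts fusion weights on one stage, and appends a rerank stage. The dictionary layout, stage names, and parameter names are assumptions chosen for the example, not the actual retriever schema.

```python
import copy


def clone_with_stage_overrides(retriever, new_id, stage_overrides=None, extra_stages=()):
    """Copy a retriever definition, patch stage params by name, append new stages."""
    clone = copy.deepcopy(retriever)       # the live definition is never modified
    clone["retriever_id"] = new_id
    for stage in clone["stages"]:
        stage["params"].update((stage_overrides or {}).get(stage["name"], {}))
    clone["stages"].extend(copy.deepcopy(list(extra_stages)))
    return clone


live = {
    "retriever_id": "ret_live",
    "collection_id": "col_products",
    "stages": [
        {"name": "hybrid_search", "params": {"dense_weight": 0.5, "sparse_weight": 0.5}},
    ],
}

candidate = clone_with_stage_overrides(
    live,
    "ret_candidate",
    stage_overrides={"hybrid_search": {"dense_weight": 0.7, "sparse_weight": 0.3}},
    extra_stages=[{"name": "rerank", "params": {"top_k": 20}}],
)
```

Both definitions keep the same collection_id, which is why the clone adds no vector storage.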
Taxonomy Clone
Clones a taxonomy and optionally swaps the backing retriever or hierarchy config. Both versions run in parallel, so you can compare classification quality before committing to a schema migration.
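One simple way to compare the two versions is to score both against the same labeled sample and promote only if the candidate clears a margin. The classifiers below are toy keyword functions standing in for calls to the current and cloned taxonomy; the sample data and the 0.05 margin are illustrative assumptions.

```python
def accuracy(classify, labeled_examples):
    """Fraction of examples where the predicted label matches the expected one."""
    correct = sum(1 for text, expected in labeled_examples if classify(text) == expected)
    return correct / len(labeled_examples)


def classify_current(text):
    # Toy stand-in for the existing taxonomy.
    return "sports" if "match" in text else "other"


def classify_candidate(text):
    # Toy stand-in for the cloned taxonomy with an extended hierarchy.
    return "sports" if ("match" in text or "league" in text) else "other"


labeled = [
    ("cup match tonight", "sports"),
    ("league table update", "sports"),
    ("quarterly earnings", "other"),
    ("new phone review", "other"),
]

current_acc = accuracy(classify_current, labeled)      # 3 of 4 correct
candidate_acc = accuracy(classify_candidate, labeled)  # 4 of 4 correct

# Quality gate: require a clear improvement before migrating.
promote = candidate_acc >= current_acc + 0.05
```

In practice the labeled sample should be large enough that a margin of this size is meaningful.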
Why This Approach
The immutable + clone model gives you git-style isolation without git's complexity. Every change is traceable, every experiment is reversible, and rollback is as simple as repointing a retriever ID. The shared-vector architecture means clones are cheap: you only pay for storage when vectors actually diverge (i.e., when the embedding model changes).
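Rollback by repointing can be pictured as a pointer that the application reads at query time. The RetrieverPointer class below is a hypothetical application-side helper, not a Mixpeek feature; it only shows that promotion and rollback change which ID is referenced, while the retrievers themselves stay untouched.

```python
class RetrieverPointer:
    """Tracks which retriever ID the application currently serves from."""

    def __init__(self, retriever_id):
        self.active = retriever_id
        self.history = []

    def promote(self, new_id):
        # Remember the previous ID so the change can be reversed.
        self.history.append(self.active)
        self.active = new_id

    def rollback(self):
        if not self.history:
            raise RuntimeError("nothing to roll back to")
        self.active = self.history.pop()


pointer = RetrieverPointer("ret_live")
pointer.promote("ret_candidate")
after_promote = pointer.active

pointer.rollback()
after_rollback = pointer.active
```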
Integration
client.namespaces.clone(namespace_identifier, body)
client.collections.clone(collection_identifier, body)
client.retrievers.clone(retriever_identifier, body)
client.taxonomies.clone(taxonomy_identifier, body)
