Environment Branching

Clone namespaces, collections, retrievers, and taxonomies into isolated environments to experiment safely without re-processing data


Why do anything?

AI pipelines are stateful: changing an embedding model, retriever pipeline, or taxonomy schema has downstream effects that are hard to undo. Teams either experiment directly in production (risky) or maintain separate environments that drift out of sync (expensive and unreliable).

Why now?

As AI pipelines move into regulated verticals like healthcare and adtech, the cost of a bad production change escalates: misclassified ad content, incorrect clinical labels, or degraded search quality can have immediate business impact. You need a safe, repeatable path for testing changes.

Why this feature?

Mixpeek's clone-based branching operates at every layer of the pipeline (namespace, collection, retriever, and taxonomy), so you can create a fully isolated staging environment in minutes, experiment with real production data, and promote changes only after quality gates pass.

How It Works

Every Mixpeek resource is immutable by design: you cannot change a collection's feature extractor or a retriever's pipeline stages via PATCH. This preserves execution history and audit trails. The clone primitive is the intentional escape hatch: it creates a new resource with new IDs, sharing underlying data where possible.

Namespace Clone

Deep-copies the full environment: collections (with Qdrant vectors), retrievers, and bucket metadata. All internal IDs are remapped. Runs as an async Celery task; returns a task_id for polling and a DeepCloneResult with the complete ID mapping.
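Because the namespace clone runs asynchronously and returns a task_id, callers need some form of polling loop. The sketch below shows one way to write it. It is not Mixpeek's SDK: `wait_for_clone`, the `fetch_task` callable, the status strings ("completed", "failed"), and the shape of the returned dictionary are all assumptions made for illustration.

```python
import time
from typing import Any, Callable, Dict


def wait_for_clone(
    fetch_task: Callable[[str], Dict[str, Any]],
    task_id: str,
    interval_s: float = 2.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll a task until it reaches a terminal state and return its result.

    fetch_task is a hypothetical callable that returns the task as a dict,
    e.g. {"status": "completed", "result": {...}}. The status names and
    payload layout are assumptions, not a documented contract.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        task = fetch_task(task_id)
        status = task.get("status")
        if status == "completed":
            # Assumed to carry the clone result, including the ID mapping.
            return task.get("result", {})
        if status == "failed":
            raise RuntimeError(f"clone task {task_id} failed: {task.get('error')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"clone task {task_id} still {status!r} after {timeout_s}s")
        sleep(interval_s)
```

Injecting `fetch_task` and `sleep` keeps the loop independent of any particular client and lets it be tested without network calls or real waiting.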

Collection Clone

Clones a collection and optionally overrides the feature extractor. When the extractor is unchanged, vectors are reused. When the extractor changes (for example, an embedding model swap), reprocessing must be triggered explicitly because vectors are model-specific.
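The reuse-versus-reprocess rule can be expressed as a small decision function. This is a sketch of the rule as described here, not Mixpeek's implementation; `ExtractorConfig`, `plan_collection_clone`, and the keys of the returned plan are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class ExtractorConfig:
    """Hypothetical stand-in for a collection's feature extractor settings."""
    name: str
    version: str


def plan_collection_clone(
    source: ExtractorConfig, override: Optional[ExtractorConfig] = None
) -> Dict[str, Any]:
    """Decide whether a clone can reuse vectors or needs reprocessing."""
    effective = override if override is not None else source
    # Vectors are model-specific, so any change to the extractor
    # invalidates the existing vectors for the clone.
    changed = effective != source
    return {
        "feature_extractor": effective,
        "reuse_vectors": not changed,
        "trigger_reprocessing": changed,
    }
```

Passing an override identical to the source extractor is treated as "unchanged", so the clone still reuses vectors in that case.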

Retriever Clone

Clones a retriever with optional stage overrides. The clone shares the same underlying collection (no extra vector storage). Use this to test a new ranking strategy, additional rerank stage, or adjusted fusion weights without touching the live pipeline.
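To make the idea of stage overrides concrete, the sketch below applies overrides to a copy of a pipeline and leaves the live definition untouched. The stage layout (dictionaries with "name" and "params" keys) and the helper `apply_stage_overrides` are assumptions for illustration; the actual override format is defined by the clone API.

```python
from copy import deepcopy
from typing import Any, Dict, List, Optional

Stage = Dict[str, Any]


def apply_stage_overrides(
    stages: List[Stage],
    overrides: Dict[str, Dict[str, Any]],
    extra_stages: Optional[List[Stage]] = None,
) -> List[Stage]:
    """Return a new stage list with overrides applied; the input is not mutated."""
    cloned = deepcopy(stages)
    unknown = set(overrides) - {stage["name"] for stage in cloned}
    if unknown:
        raise KeyError(f"no such stage(s): {sorted(unknown)}")
    for stage in cloned:
        patch = overrides.get(stage["name"])
        if patch:
            # Merge the patch into the existing params; patch values win.
            stage["params"] = {**stage.get("params", {}), **patch}
    # Append any additional stages (for example, a rerank step) at the end.
    cloned.extend(deepcopy(extra_stages or []))
    return cloned
```

A candidate pipeline with adjusted fusion weights and an added rerank stage could then be built from the live stage list while the live list stays byte-for-byte the same.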

Taxonomy Clone

Clones a taxonomy and optionally swaps the backing retriever or hierarchy config. Both versions run in parallel, so you can compare classification quality before committing to a schema migration.
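One simple way to compare two taxonomy versions is to score both against the same labeled sample and migrate only if the candidate improves by a chosen margin. The sketch below assumes each taxonomy can be wrapped in a plain `classify(item) -> label` callable; `accuracy`, `should_migrate`, and the `min_gain` threshold are hypothetical, and accuracy is only one of several metrics you might choose.

```python
from typing import Callable, Hashable, Sequence, Tuple

Classifier = Callable[[str], Hashable]
Labeled = Sequence[Tuple[str, Hashable]]


def accuracy(classify: Classifier, labeled: Labeled) -> float:
    """Fraction of labeled examples for which classify returns the expected label."""
    if not labeled:
        raise ValueError("labeled sample must not be empty")
    correct = sum(1 for item, expected in labeled if classify(item) == expected)
    return correct / len(labeled)


def should_migrate(
    current: Classifier,
    candidate: Classifier,
    labeled: Labeled,
    min_gain: float = 0.0,
) -> bool:
    """True if the candidate beats the current taxonomy by at least min_gain."""
    return accuracy(candidate, labeled) - accuracy(current, labeled) >= min_gain
```

Using the same labeled sample for both versions keeps the comparison fair; a sample drawn from real production traffic will be more representative than a synthetic one.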

Why This Approach

The immutable-plus-clone model gives you Git-style isolation without Git's complexity. Every change is traceable, every experiment is reversible, and rollback is as simple as repointing a retriever ID. The shared-vector architecture means clones are cheap: you pay for additional storage only when vectors actually diverge (that is, when the embedding model changes).
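The "rollback by repointing a retriever ID" idea can be pictured as an application-side pointer that records which retriever is live. The class below is a minimal sketch that lives in your own code; `RetrieverPointer` and its methods are hypothetical and not part of Mixpeek.

```python
from typing import List


class RetrieverPointer:
    """Tracks which retriever ID the application treats as live."""

    def __init__(self, live_id: str) -> None:
        self._history: List[str] = [live_id]

    @property
    def live_id(self) -> str:
        return self._history[-1]

    def promote(self, new_id: str) -> None:
        """Point traffic at a cloned retriever after it passes evaluation."""
        self._history.append(new_id)

    def rollback(self) -> str:
        """Repoint to the previous retriever; the old resource was never modified."""
        if len(self._history) < 2:
            raise RuntimeError("nothing to roll back to")
        self._history.pop()
        return self.live_id
```

Because resources are immutable, the previous retriever still exists exactly as it was, so rolling back is only a change of which ID the application queries.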

Where This Is Used

adtech

Embedding Model Swap for Property Listings

Integration

client.namespaces.clone(namespace_identifier, body)

client.collections.clone(collection_identifier, body)

client.retrievers.clone(retriever_identifier, body)

client.taxonomies.clone(taxonomy_identifier, body)


Related Capabilities

prerequisite

Namespaces

Namespaces are the top-level isolation boundary; namespace clone is the primary branching primitive

Collections

Collections are the most common clone target for embedding model experiments

often combined

Retriever Evaluation

Run evaluations on the cloned retriever before promoting to production