    Mixpeek for ML Engineers

    Ship multimodal models to production without building the serving stack from scratch

    ML engineers need to evaluate, deploy, and monitor embedding and classification models across text, image, video, and audio. Mixpeek provides the inference infrastructure, feature extraction pipeline, and retrieval layer so you can focus on model quality rather than MLOps plumbing.

    What's Broken Today

    1. Model serving complexity

    Deploying CLIP, E5, SigLIP, and custom models behind a consistent API with auto-scaling, batching, and health checks requires significant infrastructure work.

    2. Evaluation across modalities

    Measuring retrieval quality when queries span text, images, and video requires custom evaluation harnesses that most ML teams build ad hoc.

    3. Embedding drift detection

    Production embedding distributions shift over time as data changes, but most pipelines lack automated drift detection and alerting for vector quality.

    4. A/B testing retrieval strategies

    Comparing hybrid search versus pure vector search, or evaluating new reranking models, requires duplicating infrastructure rather than flipping a configuration.

    5. Feature store fragmentation

    Embeddings, extracted text, classification labels, and other features end up scattered across different storage systems with no unified access layer.

    How Mixpeek Helps

    Pre-built model serving via Ray Serve

    CLIP, E5, SigLIP, and vLLM endpoints run as Ray Serve deployments with auto-scaling, health checks, and a unified API out of the box.
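
    A minimal sketch of this serving pattern (the checkpoint name and config values are illustrative, not Mixpeek's actual deployment code): a CLIP-style text embedder behind a Ray Serve deployment with autoscaling and request batching.

    ```python
    from ray import serve
    from starlette.requests import Request

    @serve.deployment(
        autoscaling_config={"min_replicas": 1, "max_replicas": 4},  # scale on load
        ray_actor_options={"num_gpus": 1},
    )
    class ClipEmbedder:
        def __init__(self):
            from sentence_transformers import SentenceTransformer
            # Example checkpoint; swap in E5, SigLIP, or a custom model here.
            self.model = SentenceTransformer("clip-ViT-B-32", device="cuda")

        @serve.batch(max_batch_size=32, batch_wait_timeout_s=0.05)
        async def embed(self, texts: list[str]) -> list[list[float]]:
            # Ray collects concurrent requests into one batch for the GPU.
            return self.model.encode(texts).tolist()

        async def __call__(self, request: Request) -> dict:
            payload = await request.json()
            return {"embedding": await self.embed(payload["text"])}

    app = ClipEmbedder.bind()  # deploy with `serve.run(app)`
    ```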

    Configurable retriever stages

    Chain feature search, attribute filters, reranking, and aggregation stages declaratively. Test different retrieval strategies by modifying configuration, not code.
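
    The shape of such a config might look like the sketch below; the stage and field names are hypothetical, not Mixpeek's exact schema. Swapping hybrid search for pure vector search, or changing the reranker, becomes a config edit rather than a code change.

    ```python
    # Illustrative declarative retriever pipeline (hypothetical schema).
    retriever_config = {
        "stages": [
            {"type": "feature_search", "feature": "clip_image", "limit": 200},
            {"type": "attribute_filter", "where": {"language": "en"}},
            {"type": "rerank", "model": "cross-encoder-example", "limit": 25},
            {"type": "aggregate", "group_by": "document_id"},
        ]
    }
    ```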

    Semantic drift monitoring

    Track embedding distribution changes over time. Detect when production data diverges from training data and trigger model refresh workflows automatically.
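
    One simple way to frame this kind of check (a sketch of the general technique, not Mixpeek's internal method) is to compare a production window of embeddings against a reference window, for example via cosine distance between their centroids:

    ```python
    import numpy as np

    def centroid_drift(reference: np.ndarray, production: np.ndarray) -> float:
        """Cosine distance between the mean vectors of two embedding windows."""
        a, b = reference.mean(axis=0), production.mean(axis=0)
        cosine = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
        return 1.0 - float(cosine)

    # Simulated windows: production content skews half the dimensions.
    rng = np.random.default_rng(0)
    reference = rng.normal(loc=0.2, size=(1000, 512))
    shift = np.zeros(512)
    shift[:256] = 0.3
    production = rng.normal(loc=0.2, size=(1000, 512)) + shift

    DRIFT_THRESHOLD = 0.05  # illustrative; tune per model and modality
    if centroid_drift(reference, production) > DRIFT_THRESHOLD:
        print("drift detected: trigger model refresh workflow")
    ```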

    Unified feature layer

    All extracted features, from embeddings to transcripts to taxonomy labels, are stored as Qdrant payload fields alongside vectors, providing a single source of truth.
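
    For example, a query against that layer with the official qdrant-client might look like the following; the collection name, payload fields, and vector size are illustrative, not a fixed Mixpeek schema.

    ```python
    from qdrant_client import QdrantClient, models

    client = QdrantClient(url="http://localhost:6333")
    query_embedding = [0.0] * 512  # placeholder; normally from your embedding endpoint

    hits = client.search(
        collection_name="videos",
        query_vector=query_embedding,
        query_filter=models.Filter(must=[
            models.FieldCondition(
                key="taxonomy_label",
                match=models.MatchValue(value="product_demo"),
            )
        ]),
        with_payload=["transcript", "taxonomy_label"],  # features ride along with each hit
        limit=10,
    )
    for hit in hits:
        print(hit.score, hit.payload["taxonomy_label"])
    ```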

    How It Works for ML Engineers

    1. Select or register feature extractors

    Choose from built-in extractors (CLIP, E5, SigLIP) or register custom model endpoints. Each extractor defines the embedding dimensions and modality it handles.
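
    A hypothetical registration call might look like this; the endpoint path and request fields are assumptions for illustration, not Mixpeek's documented API.

    ```python
    import requests

    resp = requests.post(
        "https://api.mixpeek.com/v1/extractors",  # illustrative endpoint
        headers={"Authorization": "Bearer <API_KEY>"},
        json={
            "name": "my-custom-embedder",
            "modality": "image",
            "dimensions": 768,                            # must match the model's output
            "endpoint": "https://models.internal/embed",  # your model server
        },
        timeout=30,
    )
    resp.raise_for_status()
    ```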

    2. Configure collection processing

    Assign extractors to a collection, specifying which modalities to process and what features to extract. The collection defines your model's production feature pipeline.
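
    An illustrative shape for such a collection config (the field and extractor names are hypothetical): each extractor is assigned the modalities it should process, and together they define the feature pipeline.

    ```python
    collection = {
        "name": "product-videos",
        "extractors": [
            {"extractor": "clip-vit-b32", "modalities": ["image", "video"]},
            {"extractor": "e5-large", "modalities": ["text"]},
            {"extractor": "whisper-base", "modalities": ["audio"]},
        ],
    }
    ```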

    3. Build and test retriever pipelines

    Define multi-stage retrievers that chain search, filter, and rerank operations. Compare retrieval quality across different configurations using the same test queries.
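
    A small harness for that comparison can be as simple as mean recall@k over a fixed query set; `run_config` in the usage comment is a hypothetical stand-in for whatever executes a pipeline configuration and returns ranked document ids.

    ```python
    from typing import Callable

    def recall_at_k(ranked_ids: list[str], relevant: set[str], k: int = 10) -> float:
        """Fraction of relevant documents that appear in the top k results."""
        found = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant)
        return found / max(len(relevant), 1)

    def evaluate(retrieve: Callable[[str], list[str]],
                 test_set: list[tuple[str, set[str]]], k: int = 10) -> float:
        """Mean recall@k for one retriever configuration over a fixed test set."""
        scores = [recall_at_k(retrieve(q), rel, k) for q, rel in test_set]
        return sum(scores) / len(scores)

    # Same test_set, two configurations -> directly comparable numbers:
    # baseline = evaluate(lambda q: run_config(vector_only, q), test_set)
    # candidate = evaluate(lambda q: run_config(hybrid, q), test_set)
    ```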

    4. Monitor production model performance

    Track retrieval latency, embedding drift, and feature extraction success rates. Set up alerts when model quality degrades below acceptable thresholds.
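
    A minimal sketch of one such alert (the budget and the sample values are illustrative): flag when windowed p95 retrieval latency exceeds a budget.

    ```python
    import numpy as np

    def p95_over_budget(samples_ms: list[float], budget_ms: float = 250.0) -> bool:
        """True if p95 latency over this window breaches the budget."""
        return float(np.percentile(samples_ms, 95)) > budget_ms

    window = [42.0, 55.1, 61.3, 48.7, 390.2, 44.9, 51.0, 47.2]  # one sampling window
    if p95_over_budget(window):
        print("p95 retrieval latency over budget: alert on-call")
    ```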

    5. Iterate and deploy model updates

    Swap extractors or update models by modifying collection configuration. Re-trigger batch processing to backfill new embeddings while keeping old ones accessible.
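
    A sketch of that workflow over a REST API; the endpoint paths and field names are assumptions for illustration, not Mixpeek's documented API.

    ```python
    import requests

    BASE = "https://api.mixpeek.com/v1"  # illustrative base URL
    HEADERS = {"Authorization": "Bearer <API_KEY>"}

    # Point the collection at a new extractor version (config change, no redeploy).
    requests.patch(
        f"{BASE}/collections/product-videos",
        headers=HEADERS,
        json={"extractors": [{"extractor": "clip-vit-l14"}]},
        timeout=30,
    )

    # Re-trigger batch processing so new embeddings are backfilled while the
    # old vectors stay queryable until you cut traffic over.
    requests.post(f"{BASE}/collections/product-videos/reprocess",
                  headers=HEADERS, timeout=30)
    ```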

    Relevant Features

    • Ray Serve inference
    • Retriever pipelines
    • Drift detection
    • Taxonomy classification
    • Feature extractors

    Integrations

    • Ray
    • vLLM
    • Hugging Face
    • Qdrant
    • Weights & Biases

    "Mixpeek cut our time-to-production for new embedding models from three weeks to two days. We configure a new extractor, run a backfill, and compare retrieval metrics side by side without touching infrastructure."

    Priya Narayanan

    ML Engineer, Vectrix Labs

    Get Started as an ML Engineer

    See how Mixpeek can help ML engineers build multimodal AI capabilities without the infrastructure overhead.