Mixpeek for AI/ML Engineers
Build multimodal AI applications on production-ready embedding and retrieval infrastructure
AI and ML engineers building multimodal applications need reliable embedding generation, vector indexing, and retrieval infrastructure that scales beyond notebook experiments. Mixpeek provides the serving layer for your models -- handling ingestion, feature extraction, embedding storage, and composable retrieval -- so you can focus on model architecture and evaluation rather than infrastructure plumbing.
What's Broken Today
1. Prototype-to-production gap
Models that work in notebooks fail in production due to missing infrastructure for batching, error handling, scaling, and monitoring. Bridging this gap requires significant engineering effort unrelated to model quality.
2. Multi-model orchestration complexity
Production multimodal systems often chain multiple models -- embedding, classification, detection, transcription -- requiring careful orchestration of dependencies, versioning, and fallback behavior.
3. Embedding infrastructure overhead
Running, scaling, and maintaining embedding model endpoints with GPU provisioning, batching optimization, and health monitoring consumes engineering time that should be spent on model research.
4. Evaluation and iteration friction
Comparing retrieval quality across model versions, embedding dimensions, and indexing strategies requires reproducible evaluation pipelines that most teams build ad-hoc.
How Mixpeek Helps
Managed model serving at scale
Deploy embedding and classification models through Mixpeek's distributed Ray-based inference infrastructure with automatic scaling, batching, and health monitoring built in.
Plugin system for custom models
Register custom feature extractors that call your own model endpoints. Plug proprietary or fine-tuned models into the pipeline while leveraging Mixpeek's orchestration, retry logic, and monitoring.
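As a rough illustration, a custom extractor can be little more than a thin wrapper around your own inference endpoint, with Mixpeek's orchestration layer owning retries, batching, and health checks around it. The class and field names below are assumptions for this sketch, not Mixpeek's documented plugin interface:

```python
from dataclasses import dataclass

import requests


@dataclass
class ExtractorOutput:
    # Output contract the extractor declares to the pipeline.
    embedding: list[float]
    model_version: str


class FineTunedEmbedder:
    """Hypothetical extractor that proxies to a self-hosted model server."""

    def __init__(self, endpoint_url: str, model_version: str = "v2"):
        self.endpoint_url = endpoint_url  # your own inference endpoint
        self.model_version = model_version

    def extract(self, text: str) -> ExtractorOutput:
        # Assumed request/response shape for your model server.
        resp = requests.post(
            self.endpoint_url,
            json={"input": text, "version": self.model_version},
            timeout=30,
        )
        resp.raise_for_status()
        return ExtractorOutput(
            embedding=resp.json()["embedding"],
            model_version=self.model_version,
        )
```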
Composable retrieval for evaluation
Build retrieval pipelines with filter, search, and rerank stages. Compare different configurations side by side to evaluate retrieval quality across model versions and embedding strategies.
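As a sketch, a composable pipeline definition might look like the following, with each stage narrowing or re-scoring the candidate set. The field names are illustrative, not the documented retriever schema:

```python
# Illustrative three-stage retriever configuration (filter -> search -> rerank).
# All keys below are assumptions made for this sketch.
retriever_config = {
    "name": "eval-candidate-a",
    "stages": [
        # 1. Narrow the candidate set with metadata filters.
        {"type": "filter", "where": {"language": "en"}},
        # 2. Vector search over a chosen embedding index.
        {"type": "search", "index": "clip-vit-l14", "top_k": 100},
        # 3. Rerank the shortlist with a cross-encoder model.
        {"type": "rerank", "model": "cross-encoder-v1", "top_k": 10},
    ],
}
```

Defining a second configuration that swaps the search index or reranking model gives you the side-by-side comparison described above.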
End-to-end pipeline observability
Monitor embedding throughput, extraction latency, and indexing status through the API. Track model performance metrics across the entire pipeline from ingestion to retrieval.
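A metrics check might look like the sketch below; the base URL, endpoint path, and metric names are assumed for illustration rather than taken from the documented API:

```python
import requests

API = "https://api.mixpeek.com/v1"  # assumed base URL
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

# Pull pipeline metrics for one collection and flag slow extraction.
# Endpoint path and metric field names are hypothetical.
resp = requests.get(f"{API}/collections/eval-clip-l14/metrics", headers=HEADERS)
resp.raise_for_status()
metrics = resp.json()

print(f"embedding throughput: {metrics['docs_per_second']} docs/s")
print(f"p95 extraction latency: {metrics['extraction_p95_ms']} ms")
if metrics["extraction_p95_ms"] > 500:
    print("warning: extraction latency above budget")
```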
How It Works for AI/ML Engineers
Register custom models as feature extractors
Package your embedding, classification, or detection models as Mixpeek plugins. Define input/output schemas and configure GPU requirements, batching parameters, and health checks.
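A registration payload could bundle the I/O schemas and runtime hints together, roughly as below; every field name is an assumption for this sketch, not the documented plugin schema:

```python
import requests

API = "https://api.mixpeek.com/v1"  # assumed base URL
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

# Hypothetical extractor registration: I/O schemas plus runtime hints.
extractor_spec = {
    "name": "my-finetuned-embedder",
    "input_schema": {"text": "string"},
    "output_schema": {"embedding": {"type": "vector", "dims": 768}},
    "resources": {"gpu": 1, "gpu_memory_gb": 16},      # GPU provisioning
    "batching": {"max_batch_size": 64, "max_wait_ms": 50},
    "health_check": {"path": "/healthz", "interval_seconds": 30},
}

resp = requests.post(f"{API}/extractors", headers=HEADERS, json=extractor_spec)
resp.raise_for_status()
print("registered extractor:", resp.json()["id"])
```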
Configure extraction pipelines per experiment
Create collections with different extractor configurations to test model variants. Each collection defines which models run, in what order, and how outputs are stored.
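For instance, two collections that differ only in the embedding extractor let you index the same evaluation set both ways; the endpoint and fields are again assumptions:

```python
import requests

API = "https://api.mixpeek.com/v1"  # assumed base URL
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

# One collection per model variant; everything else held constant.
for name, embedder in [
    ("eval-clip-b32", "clip-vit-b32"),
    ("eval-clip-l14", "clip-vit-l14"),
]:
    resp = requests.post(
        f"{API}/collections",
        headers=HEADERS,
        json={
            "name": name,
            # Extractors run in the declared order; outputs stored per field.
            "extractors": [
                {"model": embedder, "order": 1},
                {"model": "language-detector", "order": 2},
            ],
        },
    )
    resp.raise_for_status()
```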
Ingest evaluation datasets
Upload evaluation datasets to S3 buckets and trigger batch processing. Mixpeek handles distributed extraction across GPU workers with progress tracking and error reporting.
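One way the flow could look: point a collection at an S3 prefix, then poll the resulting task for progress and errors. The paths and response fields below are assumptions:

```python
import time

import requests

API = "https://api.mixpeek.com/v1"  # assumed base URL
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

# Kick off batch ingestion from an S3 prefix (hypothetical endpoint).
resp = requests.post(
    f"{API}/collections/eval-clip-l14/ingest",
    headers=HEADERS,
    json={"source": {"s3_uri": "s3://my-eval-bucket/images/"}},
)
resp.raise_for_status()
task_id = resp.json()["task_id"]

# Extraction runs on distributed GPU workers; poll until it settles.
while True:
    task = requests.get(f"{API}/tasks/{task_id}", headers=HEADERS).json()
    print(f"{task['status']}: {task.get('processed', 0)} objects processed")
    if task["status"] in ("completed", "failed"):
        break
    time.sleep(15)
```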
Build retrieval pipelines for evaluation
Define retriever configurations with different search strategies, reranking models, and scoring weights. Run evaluation queries against each configuration to compare retrieval metrics.
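As a sketch, an evaluation harness can compute recall@10 per retriever over a small labeled query set; the query endpoint and response shape are assumed:

```python
import requests

API = "https://api.mixpeek.com/v1"  # assumed base URL
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

# (query, id of the document that should come back) -- toy labels.
eval_set = [
    ("red running shoes", "doc-118"),
    ("leather backpack", "doc-042"),
]


def recall_at_10(retriever_name: str) -> float:
    # Fraction of queries whose labeled document appears in the top 10.
    hits = 0
    for query, expected_id in eval_set:
        resp = requests.post(
            f"{API}/retrievers/{retriever_name}/query",  # hypothetical path
            headers=HEADERS,
            json={"query": query, "top_k": 10},
        )
        resp.raise_for_status()
        returned_ids = [r["id"] for r in resp.json()["results"]]
        hits += expected_id in returned_ids
    return hits / len(eval_set)


for name in ("eval-candidate-a", "eval-candidate-b"):
    print(name, recall_at_10(name))
```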
Iterate on model and pipeline configuration
Swap models, adjust embedding dimensions, change chunking strategies, and re-run evaluations. Mixpeek handles reprocessing and re-indexing while you focus on architecture decisions.
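That loop can be driven as a small parameter sweep: update the collection's extractor configuration, let Mixpeek reprocess, then re-run the evaluation queries. The PATCH endpoint and fields below are assumptions:

```python
import itertools

import requests

API = "https://api.mixpeek.com/v1"  # assumed base URL
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

chunking_strategies = ["sentence", "fixed-512"]
embedding_models = ["clip-vit-b32", "clip-vit-l14"]

for chunking, model in itertools.product(chunking_strategies, embedding_models):
    # Hypothetical config update; reprocessing and re-indexing run server-side.
    resp = requests.patch(
        f"{API}/collections/eval-suite",
        headers=HEADERS,
        json={"extractors": [{"model": model, "chunking": chunking}]},
    )
    resp.raise_for_status()
    print(f"reprocessing with model={model}, chunking={chunking}")
    # ...wait for the reprocessing task, then re-run the recall@10 harness above.
```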
Promote winning configuration to production
Once evaluation confirms the best model and retrieval configuration, promote it to production namespaces. Monitor throughput and quality metrics through the observability API.
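Promotion might then be a single call followed by a production metrics check; as throughout, the paths and fields are illustrative assumptions:

```python
import requests

API = "https://api.mixpeek.com/v1"  # assumed base URL
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

# Hypothetical promotion of the winning retriever into production.
resp = requests.post(
    f"{API}/namespaces/production/retrievers",
    headers=HEADERS,
    json={"promote_from": "eval-candidate-a"},
)
resp.raise_for_status()

# Watch throughput and quality after cutover (metric names assumed).
metrics = requests.get(
    f"{API}/namespaces/production/metrics", headers=HEADERS
).json()
print("production throughput:", metrics.get("docs_per_second"))
```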
Relevant Features
- Custom plugins
- Feature extractors
- Retriever pipelines
- Batch processing
- Namespace management
- Model versioning
Integrations
- Ray
- Qdrant
- S3
- HuggingFace
- PyTorch
Get Started as an AI/ML Engineer
See how Mixpeek can help AI/ML engineers build multimodal AI capabilities without the infrastructure overhead.
