Mixpeek’s single-tenant deployment gives enterprise customers a fully isolated data plane: dedicated database, compute cluster, cache, object storage, and job queues. The shared control plane (API gateway, auth, billing) routes requests to your data plane transparently, so your API keys and SDKs work the same way.
Architecture
What’s isolated
| Resource | Isolation Level | Details |
|---|---|---|
| Database | Dedicated database | Separate MongoDB instance or database per tenant |
| Compute | Dedicated Ray cluster | Head node + autoscaling worker pools (CPU, batch, GPU) |
| Cache | Dedicated Redis | Separate instance with independent memory and connection pools |
| Storage | Dedicated bucket | Separate GCS/S3 bucket per tenant |
| Job queues | Dedicated queues | Celery queues prefixed per tenant — jobs never compete with other customers |
| Vector store | Shared cluster, isolated data | Namespace-level isolation within the vector index |
What’s shared
The control plane is stateless: it routes requests but holds no customer data.
- API gateway: resolves your API key to your data plane endpoints
- Studio UI: connects to the API and holds no data
- Auth and API key management
- Billing and usage metering
- Container image registry: same code, separate compute
Tenant Routing
Every API request goes through tenant resolution:
- Your API key authenticates against the shared auth layer
- The API resolves your organization to a tenant configuration
- The tenant config specifies your data plane endpoints (database, cache, compute, storage)
- The request executes entirely within your isolated infrastructure
Tenant routing is transparent. Your API keys, SDKs, and integrations work identically to the shared platform — no code changes required.
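The resolution steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Mixpeek's implementation: the in-memory API_KEYS and TENANTS tables, the TenantConfig field names, every endpoint URI, and the fallback to a shared configuration for organizations without a dedicated data plane are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantConfig:
    """Data plane endpoints for one tenant (field names are hypothetical)."""
    database_uri: str
    cache_uri: str
    ray_address: str
    bucket: str


# Hypothetical stand-ins for the shared auth layer and the tenant registry.
API_KEYS = {"sk_acme_123": "org_acme", "sk_beta_456": "org_beta"}
SHARED = TenantConfig(
    database_uri="mongodb://shared-db.internal:27017/mixpeek",
    cache_uri="redis://shared-cache.internal:6379/0",
    ray_address="ray://shared-head.internal:10001",
    bucket="gs://mixpeek-shared",
)
TENANTS = {
    "org_acme": TenantConfig(
        database_uri="mongodb://acme-db.internal:27017/mixpeek",
        cache_uri="redis://acme-cache.internal:6379/0",
        ray_address="ray://acme-head.internal:10001",
        bucket="gs://mixpeek-tenant-acme",
    ),
}


class Unauthorized(Exception):
    pass


def resolve_data_plane(api_key: str) -> TenantConfig:
    # Step 1: authenticate the key against the shared auth layer.
    org_id = API_KEYS.get(api_key)
    if org_id is None:
        raise Unauthorized("unknown API key")
    # Steps 2-3: map the organization to its tenant config. Organizations
    # without a dedicated data plane fall back to the shared one (assumption).
    return TENANTS.get(org_id, SHARED)
```

In this sketch a request handler would call resolve_data_plane once per request and open its database, cache, Ray, and storage clients against the returned endpoints, which is what keeps the request inside the tenant's own infrastructure while the client-facing API key stays unchanged.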
Compute Cluster
Your Ray cluster runs in a dedicated Kubernetes namespace with independent scaling.
Worker groups
| Group | Default Range | Use Case |
|---|---|---|
| CPU workers | 1–2 nodes | Text embeddings, reranking, classification, image embeddings |
| Batch workers | 0–3 nodes | Large ingestion jobs (scale from zero on demand) |
| GPU workers | 0–2 nodes (NVIDIA L4) | Video processing, large model inference |
Extractor scaling
Individual extractors (embedding models, classifiers, etc.) scale independently within your cluster:
- min_replicas: minimum always-running instances (0 = scale to zero when idle)
- max_replicas: maximum instances under load
- target_ongoing_requests: target number of in-flight requests per replica; sustained load above this triggers a scale-up
- downscale_delay_s: cooldown before scaling down (prevents flapping)
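To make the interaction between these four settings concrete, here is a simplified scaling decision in Python. It is a sketch, not Ray Serve's actual autoscaling policy (which averages metrics over a look-back window and also applies an upscale delay); the desired_replicas function and its seconds_wanting_fewer input are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractorScaling:
    min_replicas: int
    max_replicas: int
    target_ongoing_requests: float
    downscale_delay_s: float


def desired_replicas(
    cfg: ExtractorScaling,
    total_ongoing_requests: int,
    current_replicas: int,
    seconds_wanting_fewer: float,
) -> int:
    """Return the replica count a simplified autoscaler would choose."""
    # Enough replicas that each one carries about target_ongoing_requests.
    wanted = math.ceil(total_ongoing_requests / cfg.target_ongoing_requests)
    # Clamp to the configured bounds; min_replicas=0 allows scale to zero.
    wanted = max(cfg.min_replicas, min(cfg.max_replicas, wanted))
    # Scale down only after the lower count has been wanted for the whole
    # cooldown, so brief dips in traffic do not cause flapping.
    if wanted < current_replicas and seconds_wanting_fewer < cfg.downscale_delay_s:
        return current_replicas
    return wanted
```

With min_replicas=0 an idle extractor eventually drops to zero replicas, which leads to the cold-start behavior described under Troubleshooting; min_replicas=1 keeps one replica warm at the cost of an always-on footprint.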
Disabling extractors
If you don’t use certain capabilities (e.g., audio embeddings, face recognition, web scraping), disable the corresponding extractors. This frees compute resources for the extractors you do use and reduces your always-on footprint.
Self-Service Configuration
Enterprise tenants manage their cluster configuration via a YAML overrides file. On each platform deploy, Mixpeek merges your overrides with the latest extractor registry: new extractors appear automatically, and disabled extractors stay disabled.
What you can configure
| Section | Controls |
|---|---|
| auto_deploy | When true, platform updates merged to main are deployed to your cluster automatically |
| disabled | List of extractors to exclude from your cluster |
| overrides | Per-extractor scaling (min/max replicas, resources, concurrency) |
| cluster | Worker group sizing (replicas, min/max nodes) |
| head | Head node resources (CPU, memory) |
| env | Environment variable overrides |
| spec | Health check thresholds |
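The merge rule described above (new extractors appear, disabled ones stay disabled, your values win) can be sketched as a small Python function. This is an illustration only: the registry contents, the extractor names, and the shallow per-extractor merge are assumptions, and only the disabled and overrides sections from the table are modeled.

```python
import copy


def merge_overrides(registry: dict, overrides: dict) -> dict:
    """Combine the platform extractor registry with a tenant's overrides.

    registry maps extractor name -> default settings shipped with the deploy;
    overrides mirrors the 'disabled' and 'overrides' sections of the file.
    """
    disabled = set(overrides.get("disabled", []))
    per_extractor = overrides.get("overrides", {})
    merged = {}
    for name, defaults in registry.items():
        if name in disabled:
            continue  # disabled extractors stay disabled on every deploy
        settings = copy.deepcopy(defaults)  # never mutate the shared registry
        settings.update(per_extractor.get(name, {}))  # tenant values win (shallow)
        merged[name] = settings
    # Extractors new to the registry have no override entry, so they pass
    # through with their defaults and "appear automatically".
    return merged
```

Because the function iterates over the registry rather than over the overrides file, an extractor added in a new platform release shows up without any change on your side, while the disabled list is re-applied on every deploy.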
Example overrides
Kubernetes Access
Enterprise tenants get operator-level access to their namespace:
| Action | Access |
|---|---|
| View pods, logs, events | Yes |
| Scale worker groups | Yes |
| Restart stuck pods | Yes |
| Port-forward to Ray dashboard | Yes |
| View Ray cluster status | Yes |
| Access secrets or RBAC | No |
| Modify other namespaces | No |
Quick scaling
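For immediate scaling (e.g., before a large batch job), scale the worker group directly with kubectl; the batch-worker scale command under Troubleshooting shows the form.
Manual scaling is temporary: the next deploy resets to the values in your overrides file.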
Monitoring
Ray Dashboard
Port-forward to your Ray head node to access the Ray dashboard locally (the dashboard listens on port 8265 by default).
Grafana
Each tenant gets scoped Grafana dashboards:
- Queue Health: depth and age of your job queues
- Batch Status: ingestion progress, success/failure rates
- API Latency: p50/p95/p99 for your requests
- Cost & Usage: compute hours, resource utilization
- Error Rate: 5xx errors scoped to your tenant
kubectl
Billing
Single-tenant billing has two components:
- Platform fee: fixed monthly fee for access to the Mixpeek platform, API, Studio, and support
- Compute passthrough: your actual cloud infrastructure cost (GKE nodes, storage, networking), passed through with a management markup on top
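As a worked example of how the two components combine, the Python snippet below computes a monthly invoice. Every figure in it, including the 15% markup rate and the cost categories, is hypothetical and chosen only to illustrate the arithmetic. Decimal is used so currency amounts are exact.

```python
from decimal import Decimal


def monthly_invoice(platform_fee: Decimal, infra_costs: dict, markup_rate: Decimal) -> dict:
    """Split a single-tenant bill into its two components (illustrative only)."""
    # Compute passthrough: the actual cloud bill for the tenant's infrastructure.
    passthrough = sum(infra_costs.values(), Decimal("0"))
    # Management markup applied on top of the passthrough cost.
    markup = passthrough * markup_rate
    return {
        "platform_fee": platform_fee,
        "compute_passthrough": passthrough,
        "management_markup": markup,
        "total": platform_fee + passthrough + markup,
    }


invoice = monthly_invoice(
    platform_fee=Decimal("5000"),
    infra_costs={
        "gke_nodes": Decimal("2400"),
        "storage": Decimal("400"),
        "networking": Decimal("200"),
    },
    markup_rate=Decimal("0.15"),
)
print(invoice["total"])  # 8450.00
```

Because the passthrough tracks actual usage, disabling unused extractors and letting the batch and GPU worker groups scale to zero lowers the second component directly, while the platform fee stays fixed.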
Troubleshooting
Pods stuck in Pending
Check events for the pending pod (for example, with `kubectl -n <your-namespace> describe pod <pod-name>`). Common causes:
- Insufficient resources: the cluster autoscaler is provisioning a new node (2-3 minutes)
- GPU unavailable: L4 GPUs may be temporarily exhausted in the region
- Resource limits: reduce maxReplicas on other worker groups to free capacity
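The event check can be partly automated. The sketch below assumes the official kubernetes Python client (installed separately) and a kubeconfig with access to your namespace. The classify_event heuristic, and its mapping from Kubernetes scheduler and cluster-autoscaler event reasons to the three causes listed above, is an assumption; treat its output as a hint rather than a diagnosis.

```python
def classify_event(reason: str, message: str) -> str:
    """Heuristically map a pod event to one of the Pending causes above."""
    message = message or ""
    if reason == "TriggeredScaleUp":
        # Cluster autoscaler is adding a node: usually resolves in 2-3 minutes.
        return "insufficient-resources"
    if "nvidia.com/gpu" in message:
        # No schedulable GPU capacity; L4s may be exhausted in the region.
        return "gpu-unavailable"
    if reason == "NotTriggerScaleUp":
        # Autoscaler declined to add a node (e.g., the node pool is at its maximum).
        return "resource-limits"
    return "unknown"


def diagnose_pending(namespace: str) -> dict:
    """Return {pod name: [causes]} for every Pending pod in the namespace."""
    # Imported lazily so classify_event works without the client installed.
    from kubernetes import client, config

    config.load_kube_config()
    v1 = client.CoreV1Api()
    pending = v1.list_namespaced_pod(namespace, field_selector="status.phase=Pending")
    report = {}
    for pod in pending.items:
        name = pod.metadata.name
        events = v1.list_namespaced_event(
            namespace, field_selector=f"involvedObject.name={name}"
        )
        report[name] = [classify_event(e.reason, e.message) for e in events.items]
    return report
```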
OOMKilled pods
A pod exceeded its memory limit. Increase memory for the affected worker group in your overrides file.
Extractor returning 503
The extractor has no running replicas (it scaled to zero) or all of its replicas are saturated:
- The first request after an idle period takes 5-10 seconds while a replica cold-starts
- Set min_replicas: 1 for latency-sensitive extractors
- Increase max_replicas if you see sustained 503s under load
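For latency-tolerant callers, a client-side retry with exponential backoff can absorb the 5-10 second cold start instead of surfacing the 503. The Python sketch below is generic: send is a hypothetical zero-argument callable that performs your API request and returns its HTTP status code, and the attempt count and delays are assumptions chosen to cover roughly 15 seconds of waiting.

```python
import time
from typing import Callable


def call_with_cold_start_retry(
    send: Callable[[], int],
    max_attempts: int = 5,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Retry on 503 with exponential backoff; return the last status code."""
    status = send()
    for attempt in range(1, max_attempts):
        if status != 503:
            break
        # Waits 1, 2, 4, 8 s by default: about 15 s total, longer than a cold start.
        sleep(base_delay_s * 2 ** (attempt - 1))
        status = send()
    return status
```

The sleep function is injectable so the backoff schedule can be tested without waiting. Retries only help with cold starts; if 503s persist under sustained load, raise max_replicas rather than retrying harder.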
Batch jobs stuck
- Check whether batch workers are running: `kubectl -n <your-namespace> get pods -l ray.io/group=batch-workers`
- If there are no batch workers, scale up manually: `kubectl -n <your-namespace> scale raycluster <your-cluster> --replicas=1 --resource-name=batch-workers`
- Check queue depth in the Queue Health Grafana dashboard; a backlog is normal for large batches
Custom Code (Fork Deploys)
Single-tenant customers can fork the Mixpeek codebase and deploy custom code to their tenant:
- Custom extractors: add domain-specific feature extraction logic
- Modified inference: tune model parameters, swap models, add pre/post-processing
- Engine changes: adjust batch processing, add custom endpoints

A fork deploy builds a tenant-specific image (tagged `tenant-<name>-<sha>`) and deploys only to your namespace. The shared platform is unaffected.
Workflow
- Fork the Mixpeek repo
- Make your changes (extractors, inference, engine code)
- Trigger a tenant deploy via GitHub Actions — builds from your fork, deploys to your namespace
- Rebase on upstream periodically to pick up platform updates
Config-only changes (disabling extractors, adjusting scaling) don’t require a fork or image build — edit your overrides file and trigger a config deploy.
Getting Started
To provision a single-tenant data plane:
- Contact your Mixpeek account manager or email sales@mixpeek.com
- We provision your isolated infrastructure (database, compute, cache, storage)
- You receive kubectl access to your namespace and Grafana dashboards
- Your existing API keys are routed to your dedicated data plane, with no code changes
Migration from the shared platform to single-tenant is seamless. Your data is copied to the isolated database, the tenant config is updated, and routing switches instantly. Rollback is equally fast.

