Single Tenant

Mixpeek’s single-tenant deployment gives enterprise customers an isolated data plane: a dedicated database, compute cluster, cache, object storage, and job queues. The shared control plane (API gateway, auth, billing) routes requests to your data plane transparently, so your API keys and SDKs work the same way as on the shared platform.

Architecture

What’s isolated

Resource	Isolation Level	Details
Database	Dedicated database	Separate MongoDB instance or database per tenant
Compute	Dedicated Ray cluster	Head node + autoscaling worker pools (CPU, batch, GPU)
Cache	Dedicated Redis	Separate instance with independent memory and connection pools
Storage	Dedicated bucket	Separate GCS/S3 bucket per tenant
Job queues	Dedicated queues	Celery queues prefixed per tenant — jobs never compete with other customers
Vector store	Shared cluster, isolated data	Namespace-level isolation within the vector index

What’s shared

The control plane is stateless — it routes requests but holds no customer data:

API gateway — resolves your API key to your data plane endpoints
Studio UI — connects to the API, holds no data
Auth and API key management
Billing and usage metering
Container image registry — same code, separate compute

Tenant Routing

Every API request goes through tenant resolution:

Your API key authenticates against the shared auth layer
The API resolves your organization to a tenant configuration
The tenant config specifies your data plane endpoints (database, cache, compute, storage)
The request executes entirely within your isolated infrastructure

Tenant routing is transparent. Your API keys, SDKs, and integrations work identically to the shared platform — no code changes required.
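
The resolution steps above can be sketched as two lookups followed by a dispatch. This is a minimal illustration, not Mixpeek's implementation: the `TenantConfig` fields, the lookup tables, the key format, and every endpoint value are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantConfig:
    """Data plane endpoints for one tenant (all field names are hypothetical)."""
    database_uri: str
    cache_url: str
    compute_address: str
    storage_bucket: str


# Hypothetical tables standing in for the shared auth layer and tenant registry.
API_KEY_TO_ORG = {"sk_example_123": "org_acme"}
ORG_TO_TENANT = {
    "org_acme": TenantConfig(
        database_uri="mongodb://acme-db.internal:27017/acme",
        cache_url="redis://acme-cache.internal:6379/0",
        compute_address="http://acme-ray-head.internal:8000",
        storage_bucket="acme-data",
    )
}


def resolve_tenant(api_key: str) -> TenantConfig:
    """Map an API key to the tenant config a request should run against."""
    org_id = API_KEY_TO_ORG.get(api_key)      # step 1: authenticate the key
    if org_id is None:
        raise PermissionError("unknown API key")
    tenant = ORG_TO_TENANT.get(org_id)        # steps 2-3: organization -> tenant config
    if tenant is None:
        raise LookupError(f"no tenant configured for {org_id}")
    return tenant                             # step 4: the caller uses these endpoints


config = resolve_tenant("sk_example_123")
print(config.storage_bucket)
```

Because the lookup happens server-side, the client sends the same key to the same API host whether it is on the shared platform or a dedicated data plane.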

Compute Cluster

Your Ray cluster runs in a dedicated Kubernetes namespace with independent scaling.

Worker groups

Group	Default Range	Use Case
CPU workers	1–2 nodes	Text embeddings, reranking, classification, image embeddings
Batch workers	0–3 nodes	Large ingestion jobs (scale from zero on demand)
GPU workers	0–2 nodes (NVIDIA L4)	Video processing, large model inference

Each group autoscales independently. Batch and GPU workers default to zero replicas and scale up when jobs arrive — you only pay for compute when it’s active.

Extractor scaling

Individual extractors (embedding models, classifiers, etc.) scale independently within your cluster:

min_replicas — minimum always-running instances (0 = scale to zero when idle)
max_replicas — maximum instances under load
target_ongoing_requests — requests per replica before scaling up
downscale_delay_s — cooldown before scaling down (prevents flapping)

Set min_replicas: 1 for latency-sensitive extractors (e.g., your primary embedding model for search). Use min_replicas: 0 for batch-only extractors to save cost.
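
To see how these settings interact, the sketch below uses a simplified model: the replica count is the load divided by `target_ongoing_requests`, rounded up and clamped to the configured bounds. This is an assumption for illustration only; the real autoscaler also applies smoothing and delays such as `downscale_delay_s`, which are omitted here. The `AutoscalingConfig` class and `desired_replicas` function are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class AutoscalingConfig:
    min_replicas: int
    max_replicas: int
    target_ongoing_requests: float


def desired_replicas(cfg: AutoscalingConfig, total_ongoing_requests: int) -> int:
    """Replicas needed so each handles about target_ongoing_requests, clamped to bounds."""
    needed = math.ceil(total_ongoing_requests / cfg.target_ongoing_requests)
    return max(cfg.min_replicas, min(cfg.max_replicas, needed))


# A latency-sensitive extractor: always one warm replica, bursts up to six.
search = AutoscalingConfig(min_replicas=1, max_replicas=6, target_ongoing_requests=4)
for load in (0, 3, 10, 100):
    print(load, desired_replicas(search, load))
```

With `min_replicas=1` the extractor never drops below one replica even at zero load, and at 100 concurrent requests it is capped at `max_replicas`, so excess requests queue or are rejected rather than adding replicas.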

Disabling extractors

If you don’t use certain capabilities (e.g., audio embeddings, face recognition, web scraping), disable the corresponding extractors. This frees compute resources for the extractors you do use and reduces your always-on footprint.

Self-Service Configuration

Enterprise tenants manage their cluster configuration via a YAML overrides file. On each platform deploy, Mixpeek merges your overrides with the latest extractor registry — new extractors appear automatically, disabled extractors stay disabled.

What you can configure

Section	Controls
`auto_deploy`	When `true`, platform updates merged to the main branch deploy to your cluster automatically
`disabled`	List of extractors to exclude from your cluster
`overrides`	Per-extractor scaling (min/max replicas, resources, concurrency)
`cluster`	Worker group sizing (replicas, min/max nodes)
`head`	Head node resources (CPU, memory)
`env`	Environment variable overrides
`spec`	Health check thresholds

Example overrides

# Auto-deploy platform updates (set false if running a fork)
auto_deploy: true

# Disable extractors you don't need
disabled:
  - mixpeek__playwright          # no web scraping
  - laion__clap_htsat_tiny       # no audio embeddings
  - insightface__arcface         # no face recognition

# Scale extractors for your workload
overrides:
  intfloat__multilingual_e5_large_instruct:
    autoscaling_config:
      min_replicas: 2            # always warm for search
      max_replicas: 6            # burst for batch ingestion

# Size your worker groups
cluster:
  cpu-workers:
    minReplicas: 1
    maxReplicas: 4
  batch-workers:
    minReplicas: 0
    maxReplicas: 5               # more batch capacity
  gpu-workers:
    minReplicas: 0
    maxReplicas: 3

Changes take effect on the next deploy or manual apply. Your Mixpeek point of contact can walk you through the initial configuration.
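
The merge behavior (new extractors appear automatically, disabled extractors stay disabled, overrides win over defaults) can be sketched as follows. This is an illustrative model only: the registry contents, the default values, and the `merge_overrides` function are assumptions, not the platform's actual merge code.

```python
import copy


def _deep_update(base: dict, patch: dict) -> None:
    """Recursively overlay patch onto base, in place."""
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            _deep_update(base[key], value)
        else:
            base[key] = value


def merge_overrides(registry: dict, tenant: dict) -> dict:
    """Return the effective extractor config: registry defaults plus tenant overrides."""
    disabled = set(tenant.get("disabled", []))
    overrides = tenant.get("overrides", {})
    effective = {}
    for name, defaults in registry.items():
        if name in disabled:
            continue  # disabled extractors stay disabled across deploys
        merged = copy.deepcopy(defaults)  # never mutate the shared registry
        _deep_update(merged, overrides.get(name, {}))
        effective[name] = merged
    return effective


# Hypothetical registry; the default values are invented for illustration.
registry = {
    "intfloat__multilingual_e5_large_instruct": {
        "autoscaling_config": {"min_replicas": 0, "max_replicas": 2}
    },
    "laion__clap_htsat_tiny": {"autoscaling_config": {"min_replicas": 0, "max_replicas": 1}},
    "new_extractor_added_upstream": {"autoscaling_config": {"min_replicas": 0, "max_replicas": 1}},
}
tenant = {
    "disabled": ["laion__clap_htsat_tiny"],
    "overrides": {
        "intfloat__multilingual_e5_large_instruct": {
            "autoscaling_config": {"min_replicas": 2, "max_replicas": 6}
        }
    },
}

effective = merge_overrides(registry, tenant)
print(sorted(effective))
```

In this model an extractor added upstream shows up with registry defaults without any change to the overrides file, which matches the behavior described for platform deploys.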

Kubernetes Access

Enterprise tenants get operator-level access to their namespace:

Action	Access
View pods, logs, events	Yes
Scale worker groups	Yes
Restart stuck pods	Yes
Port-forward to Ray dashboard	Yes
View Ray cluster status	Yes
Access secrets or RBAC	No
Modify other namespaces	No

Quick scaling

For immediate scaling (e.g., before a large batch job):

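# List worker groups in order; a group's zero-based position is its <group-index>
kubectl -n <your-namespace> get raycluster <your-cluster> \
  -o jsonpath='{.spec.workerGroupSpecs[*].groupName}'

# Scale batch workers to 3
kubectl -n <your-namespace> patch raycluster <your-cluster> --type=json \
  -p '[{"op":"replace","path":"/spec/workerGroupSpecs/<group-index>/replicas","value":3}]'

# Scale GPU workers to 1 for video processing
kubectl -n <your-namespace> patch raycluster <your-cluster> --type=json \
  -p '[{"op":"replace","path":"/spec/workerGroupSpecs/<group-index>/replicas","value":1}]'

# Scale back down after the job
kubectl -n <your-namespace> patch raycluster <your-cluster> --type=json \
  -p '[{"op":"replace","path":"/spec/workerGroupSpecs/<group-index>/replicas","value":0}]'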

Manual scaling is temporary. The next deploy resets to the values in your overrides file.

Monitoring

Ray Dashboard

Port-forward to access your Ray dashboard locally:

kubectl -n <your-namespace> port-forward svc/<your-head-svc> 8265:8265
# Open http://localhost:8265

The dashboard shows active deployments, replica counts, request queues, worker resource usage, and cluster utilization.

Grafana

Each tenant gets scoped Grafana dashboards:

Queue Health — depth and age of your job queues
Batch Status — ingestion progress, success/failure rates
API Latency — p50/p95/p99 for your requests
Cost & Usage — compute hours, resource utilization
Error Rate — 5xx errors scoped to your tenant

kubectl

# Pod resource usage
kubectl -n <your-namespace> top pods

# Recent events (scheduling failures, OOM, probe failures)
kubectl -n <your-namespace> get events --sort-by=.lastTimestamp | tail -20

# Ray Serve status (which extractors are running)
kubectl -n <your-namespace> exec <your-head-pod> -- serve status

Billing

Single-tenant billing has two components:

Platform fee — fixed monthly fee for access to the Mixpeek platform, API, Studio, and support
Compute passthrough — actual cloud infrastructure cost (GKE nodes, storage, networking) passed through at cost plus a management markup

There are no per-operation credit charges on the single-tenant plan. You pay for the underlying cloud resources your cluster consumes, and Mixpeek handles provisioning, monitoring, upgrades, and support.
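
As a worked example of the two components, the sketch below assumes the management markup is a percentage applied to the cloud cost. That interpretation, the function name, and every figure are hypothetical; your contract defines the actual fee and markup.

```python
def monthly_invoice(platform_fee: float, compute_cost: float, markup_rate: float) -> float:
    """Platform fee plus cloud cost passed through with a percentage markup."""
    return round(platform_fee + compute_cost * (1 + markup_rate), 2)


# All figures are hypothetical and chosen only to show the arithmetic.
total = monthly_invoice(platform_fee=5000.00, compute_cost=3200.00, markup_rate=0.15)
print(total)
```

Here the compute passthrough is 3200 × 1.15 = 3680, so the invoice is 5000 + 3680 = 8680. Only the compute term changes month to month, which is why scale-to-zero worker groups directly reduce the bill.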

Keep batch and GPU workers at minReplicas: 0 — they scale from zero on demand. You only pay for compute when it’s active. Disable unused extractors to reduce your always-on footprint.

Troubleshooting

Pods stuck in Pending

Check events for the pending pod:

kubectl -n <your-namespace> describe pod <pod-name>

Common causes:

Insufficient resources — cluster autoscaler is provisioning a new node (2-3 minutes)
GPU unavailable — L4 GPUs may be temporarily exhausted in the region
Resource limits — reduce maxReplicas on other worker groups to free capacity

OOMKilled pods

A pod exceeded its memory limit. Increase memory for the affected worker group in your overrides file:

cluster:
  cpu-workers:
    resources:
      limits:
        memory: "48Gi"

Extractor returning 503

The extractor has no running replicas (scaled to zero) or all replicas are saturated:

First request after idle takes 5-10 seconds for cold start
Set min_replicas: 1 for latency-sensitive extractors
Increase max_replicas if you’re seeing sustained 503s under load
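
If you keep an extractor at `min_replicas: 0`, clients can absorb the cold start by retrying on 503 with backoff. The sketch below is a generic pattern, not part of the Mixpeek SDK: `call_with_retry`, its parameters, and the simulated responses are all hypothetical, and `send` stands in for whatever function issues your request and returns the HTTP status code.

```python
import time
from typing import Callable


def call_with_retry(
    send: Callable[[], int],
    max_attempts: int = 4,
    base_delay_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it returns a non-503 status or attempts run out."""
    status = send()
    for attempt in range(1, max_attempts):
        if status != 503:
            break
        sleep(base_delay_s * 2 ** (attempt - 1))  # wait 2s, 4s, 8s, ...
        status = send()
    return status


# Simulated extractor: two 503s while a replica starts, then success.
responses = iter([503, 503, 200])
delays = []  # record the waits instead of actually sleeping
status = call_with_retry(lambda: next(responses), sleep=delays.append)
print(status, delays)
```

The default schedule waits up to 14 seconds in total, which covers a 5-10 second cold start. Sustained 503s under load are a capacity problem rather than a cold start, so retries alone will not resolve them.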

Batch jobs stuck

Check if batch workers are running: kubectl -n <your-namespace> get pods -l ray.io/group=batch-workers
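If no batch workers are running, scale up manually: kubectl -n <your-namespace> patch raycluster <your-cluster> --type=json -p '[{"op":"replace","path":"/spec/workerGroupSpecs/<group-index>/replicas","value":1}]' (where <group-index> is the zero-based position of batch-workers in spec.workerGroupSpecs)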
Check queue depth in Grafana — a backlog is normal for large batches

Custom Code (Fork Deploys)

Single-tenant customers can fork the Mixpeek codebase and deploy custom code to their tenant:

Custom extractors — add domain-specific feature extraction logic
Modified inference — tune model parameters, swap models, add pre/post-processing
Engine changes — adjust batch processing, add custom endpoints

Your fork builds into a tenant-specific container image (tenant-<name>-<sha>) and deploys only to your namespace. The shared platform is unaffected.

Workflow

Fork the Mixpeek repo
Make your changes (extractors, inference, engine code)
Trigger a tenant deploy via GitHub Actions — builds from your fork, deploys to your namespace
Rebase on upstream periodically to pick up platform updates

Config-only changes (disabling extractors, adjusting scaling) don’t require a fork or image build — edit your overrides file and trigger a config deploy.

Getting Started

To provision a single-tenant data plane:

Contact your Mixpeek account manager or email sales@mixpeek.com
We provision your isolated infrastructure (database, compute, cache, storage)
You receive kubectl access to your namespace and Grafana dashboards
Your existing API keys are routed to your dedicated data plane — no code changes

Migration from the shared platform to single-tenant is seamless. Your data is copied to the isolated database, the tenant config is updated, and routing switches instantly. Rollback is equally fast.
