Mixpeek’s single-tenant deployment gives enterprise customers a fully isolated data plane: dedicated database, compute cluster, cache, object storage, and job queues. The shared control plane (API gateway, auth, billing) routes requests to your data plane transparently — your API keys and SDKs work the same way.

Architecture

[Figure: Single-tenant architecture. A shared control plane routes to an isolated enterprise data plane and to the shared standard data plane.]

What’s isolated

Resource | Isolation Level | Details
Database | Dedicated database | Separate MongoDB instance or database per tenant
Compute | Dedicated Ray cluster | Head node + autoscaling worker pools (CPU, batch, GPU)
Cache | Dedicated Redis | Separate instance with independent memory and connection pools
Storage | Dedicated bucket | Separate GCS/S3 bucket per tenant
Job queues | Dedicated queues | Celery queues prefixed per tenant; jobs never compete with other customers
Vector store | Shared cluster, isolated data | Namespace-level isolation within the vector index

What’s shared

The control plane is stateless — it routes requests but holds no customer data:
  • API gateway — resolves your API key to your data plane endpoints
  • Studio UI — connects to the API, holds no data
  • Auth and API key management
  • Billing and usage metering
  • Container image registry — same code, separate compute

Tenant Routing

Every API request goes through tenant resolution:
  1. Your API key authenticates against the shared auth layer
  2. The API resolves your organization to a tenant configuration
  3. The tenant config specifies your data plane endpoints (database, cache, compute, storage)
  4. The request executes entirely within your isolated infrastructure
Tenant routing is transparent. Your API keys, SDKs, and integrations work identically to the shared platform — no code changes required.
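
The resolution steps above can be sketched as a lookup chain. This is a minimal illustration, not Mixpeek's implementation: the `TenantConfig` fields, the in-memory `API_KEYS` and `TENANTS` tables, the endpoint URLs, and the fallback to a shared data plane for organizations without a dedicated config are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantConfig:
    """Data plane endpoints for one tenant (field names are hypothetical)."""
    database_uri: str
    cache_url: str
    compute_address: str
    storage_bucket: str


# Hypothetical stand-ins for the shared auth layer and the tenant registry.
API_KEYS = {"sk_live_acme": "org_acme", "sk_live_beta": "org_beta"}
TENANTS = {
    "org_acme": TenantConfig(
        database_uri="mongodb://acme-db.internal:27017/acme",
        cache_url="redis://acme-redis.internal:6379/0",
        compute_address="http://acme-ray-head.internal:8000",
        storage_bucket="gs://acme-tenant-bucket",
    ),
}
SHARED_PLANE = TenantConfig(
    database_uri="mongodb://shared-db.internal:27017/shared",
    cache_url="redis://shared-redis.internal:6379/0",
    compute_address="http://shared-ray-head.internal:8000",
    storage_bucket="gs://shared-bucket",
)


def resolve_tenant(api_key: str) -> TenantConfig:
    """Authenticate the key, map it to an organization, and look up its endpoints."""
    org_id = API_KEYS.get(api_key)
    if org_id is None:
        raise PermissionError("unknown API key")
    # Assumption: organizations without a dedicated config use the shared data plane.
    return TENANTS.get(org_id, SHARED_PLANE)
```

Because the caller only ever presents an API key, moving an organization between data planes changes the registry entry and nothing on the client side.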

Compute Cluster

Your Ray cluster runs in a dedicated Kubernetes namespace with independent scaling.

Worker groups

Group | Default Range | Use Case
CPU workers | 1–2 nodes | Text embeddings, reranking, classification, image embeddings
Batch workers | 0–3 nodes | Large ingestion jobs (scale from zero on demand)
GPU workers | 0–2 nodes (NVIDIA L4) | Video processing, large model inference
Each group autoscales independently. Batch and GPU workers default to zero replicas and scale up when jobs arrive — you only pay for compute when it’s active.

Extractor scaling

Individual extractors (embedding models, classifiers, etc.) scale independently within your cluster:
  • min_replicas — minimum always-running instances (0 = scale to zero when idle)
  • max_replicas — maximum instances under load
  • target_ongoing_requests — requests per replica before scaling up
  • downscale_delay_s — cooldown before scaling down (prevents flapping)
Set min_replicas: 1 for latency-sensitive extractors (e.g., your primary embedding model for search). Use min_replicas: 0 for batch-only extractors to save cost.
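
To see how these parameters interact, the following sketch computes a target replica count from the current load. It is a simplified model for intuition only: the function name `desired_replicas` is hypothetical, and the real autoscaler also applies delays such as `downscale_delay_s`, which this example does not model.

```python
import math


def desired_replicas(ongoing_requests: int, target_ongoing_requests: int,
                     min_replicas: int, max_replicas: int) -> int:
    """Enough replicas to keep each at or below the per-replica target, within bounds."""
    if target_ongoing_requests <= 0:
        raise ValueError("target_ongoing_requests must be positive")
    wanted = math.ceil(ongoing_requests / target_ongoing_requests)
    # Clamp to the configured floor and ceiling.
    return max(min_replicas, min(max_replicas, wanted))


idle_batch = desired_replicas(0, 2, min_replicas=0, max_replicas=6)     # scales to zero
idle_search = desired_replicas(0, 2, min_replicas=1, max_replicas=6)    # stays warm
moderate = desired_replicas(9, 2, min_replicas=1, max_replicas=6)       # ceil(9 / 2)
saturated = desired_replicas(100, 2, min_replicas=1, max_replicas=6)    # capped at max
```

Under this model, requests beyond `max_replicas * target_ongoing_requests` queue on the existing replicas, which is why a sustained backlog usually calls for raising `max_replicas`.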

Disabling extractors

If you don’t use certain capabilities (e.g., audio embeddings, face recognition, web scraping), disable the corresponding extractors. This frees compute resources for the extractors you do use and reduces your always-on footprint.

Self-Service Configuration

Enterprise tenants manage their cluster configuration via a YAML overrides file. On each platform deploy, Mixpeek merges your overrides with the latest extractor registry — new extractors appear automatically, disabled extractors stay disabled.

What you can configure

Section | Controls
auto_deploy | When true, platform updates merged to main deploy automatically to your cluster
disabled | List of extractors to exclude from your cluster
overrides | Per-extractor scaling (min/max replicas, resources, concurrency)
cluster | Worker group sizing (replicas, min/max nodes)
head | Head node resources (CPU, memory)
env | Environment variable overrides
spec | Health check thresholds

Example overrides

# Auto-deploy platform updates (set false if running a fork)
auto_deploy: true

# Disable extractors you don't need
disabled:
  - mixpeek__playwright          # no web scraping
  - laion__clap_htsat_tiny       # no audio embeddings
  - insightface__arcface         # no face recognition

# Scale extractors for your workload
overrides:
  intfloat__multilingual_e5_large_instruct:
    autoscaling_config:
      min_replicas: 2            # always warm for search
      max_replicas: 6            # burst for batch ingestion

# Size your worker groups
cluster:
  cpu-workers:
    minReplicas: 1
    maxReplicas: 4
  batch-workers:
    minReplicas: 0
    maxReplicas: 5               # more batch capacity
  gpu-workers:
    minReplicas: 0
    maxReplicas: 3
Changes take effect on the next deploy or manual apply. Your Mixpeek point of contact can walk you through the initial configuration.
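
The merge behavior can be illustrated with a small sketch. The function names, the registry contents, and the default values below are hypothetical; only the `disabled` and `overrides` keys and the extractor names come from the example file. The sketch assumes a recursive merge in which override values win and untouched defaults are preserved.

```python
from copy import deepcopy


def deep_merge(base: dict, patch: dict) -> dict:
    """Recursively overlay patch onto a copy of base."""
    merged = deepcopy(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def effective_extractors(registry: dict, tenant: dict) -> dict:
    """Drop disabled extractors, then apply per-extractor overrides to the rest."""
    disabled = set(tenant.get("disabled", []))
    overrides = tenant.get("overrides", {})
    return {
        name: deep_merge(defaults, overrides.get(name, {}))
        for name, defaults in registry.items()
        if name not in disabled
    }


# Hypothetical registry defaults; "new__extractor" stands for one added by a platform update.
registry = {
    "intfloat__multilingual_e5_large_instruct": {
        "autoscaling_config": {"min_replicas": 1, "max_replicas": 2,
                               "target_ongoing_requests": 2}
    },
    "mixpeek__playwright": {"autoscaling_config": {"min_replicas": 0, "max_replicas": 1}},
    "new__extractor": {"autoscaling_config": {"min_replicas": 0, "max_replicas": 1}},
}
tenant = {
    "disabled": ["mixpeek__playwright"],
    "overrides": {
        "intfloat__multilingual_e5_large_instruct": {
            "autoscaling_config": {"min_replicas": 2, "max_replicas": 6}
        }
    },
}
result = effective_extractors(registry, tenant)
```

In this model the newly added extractor appears with its defaults, the disabled one stays out, and the overridden extractor keeps any default it did not explicitly change.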

Kubernetes Access

Enterprise tenants get operator-level access to their namespace:

Action | Access
View pods, logs, events | Yes
Scale worker groups | Yes
Restart stuck pods | Yes
Port-forward to Ray dashboard | Yes
View Ray cluster status | Yes
Access secrets or RBAC | No
Modify other namespaces | No

Quick scaling

For immediate scaling (e.g., before a large batch job):
# Scale batch workers to 3
kubectl -n <your-namespace> scale raycluster <your-cluster> \
  --replicas=3 --resource-name=batch-workers

# Scale GPU workers to 1 for video processing
kubectl -n <your-namespace> scale raycluster <your-cluster> \
  --replicas=1 --resource-name=gpu-workers

# Scale back down after the job
kubectl -n <your-namespace> scale raycluster <your-cluster> \
  --replicas=0 --resource-name=batch-workers
Manual scaling is temporary. The next deploy resets to the values in your overrides file.

Monitoring

Ray Dashboard

Port-forward to access your Ray dashboard locally:
kubectl -n <your-namespace> port-forward svc/<your-head-svc> 8265:8265
# Open http://localhost:8265
The dashboard shows active deployments, replica counts, request queues, worker resource usage, and cluster utilization.

Grafana

Each tenant gets scoped Grafana dashboards:
  • Queue Health — depth and age of your job queues
  • Batch Status — ingestion progress, success/failure rates
  • API Latency — p50/p95/p99 for your requests
  • Cost & Usage — compute hours, resource utilization
  • Error Rate — 5xx errors scoped to your tenant
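
For readers less familiar with the p50/p95/p99 figures on the API Latency dashboard, here is a minimal sketch of how such percentiles can be computed from raw samples. It uses the nearest-rank method, which is an assumption; the dashboards may use a different estimator, and the sample latencies are made up.

```python
import math


def percentile(samples: list, p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[rank - 1]


# Made-up request latencies in milliseconds, with two slow outliers.
latencies_ms = [12, 15, 11, 240, 14, 13, 18, 16, 900, 17]
summary = {f"p{p}": percentile(latencies_ms, p) for p in (50, 95, 99)}
```

The median stays low while the tail percentiles are dominated by the outliers, which is why p95 and p99 are the figures to watch for cold starts and saturation.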

kubectl

# Pod resource usage
kubectl -n <your-namespace> top pods

# Recent events (scheduling failures, OOM, probe failures)
kubectl -n <your-namespace> get events --sort-by=.lastTimestamp | tail -20

# Ray Serve status (which extractors are running)
kubectl -n <your-namespace> exec -it deploy/<your-head> -- serve status

Billing

Single-tenant billing has two components:
  1. Platform fee — fixed monthly fee for access to the Mixpeek platform, API, Studio, and support
  2. Compute passthrough — actual cloud infrastructure cost (GKE nodes, storage, networking) passed through at cost plus a management markup
There are no per-operation credit charges on the single-tenant plan. You pay for the underlying cloud resources your cluster consumes, and Mixpeek handles provisioning, monitoring, upgrades, and support.
Keep batch and GPU workers at minReplicas: 0 — they scale from zero on demand. You only pay for compute when it’s active. Disable unused extractors to reduce your always-on footprint.
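
As a worked example of the two billing components, the sketch below adds a fixed platform fee to a compute cost with a markup applied. The dollar amounts are invented, and modeling the management markup as a percentage of cloud cost is an assumption; your contract defines the actual terms.

```python
def monthly_invoice(platform_fee: float, compute_cost: float, markup_rate: float) -> float:
    """Platform fee plus cloud cost passed through with a management markup."""
    if min(platform_fee, compute_cost, markup_rate) < 0:
        raise ValueError("inputs must be non-negative")
    return round(platform_fee + compute_cost * (1 + markup_rate), 2)


# Hypothetical figures: 5,000 platform fee, 3,200 of cloud cost, 15% markup.
total = monthly_invoice(platform_fee=5000.00, compute_cost=3200.00, markup_rate=0.15)

# With batch and GPU workers idle at zero replicas, only always-on compute is billed.
quiet_month = monthly_invoice(platform_fee=5000.00, compute_cost=1100.00, markup_rate=0.15)
```

Only the compute term varies from month to month, so scaling idle worker groups to zero and disabling unused extractors reduce the bill directly.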

Troubleshooting

Pods stuck in Pending

Check events for the pending pod:
kubectl -n <your-namespace> describe pod <pod-name>
Common causes:
  • Insufficient resources — cluster autoscaler is provisioning a new node (2-3 minutes)
  • GPU unavailable — L4 GPUs may be temporarily exhausted in the region
  • Resource limits — reduce maxReplicas on other worker groups to free capacity
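
When several pods are pending at once, it can help to summarize the scheduler's reasons in one pass. The sketch below reads the structure produced by `kubectl get pods -o json` and reports the `PodScheduled` condition message for each Pending pod. The function name and the sample data are made up for illustration.

```python
import json


def unschedulable_pods(pod_list: dict) -> dict:
    """Map each Pending pod name to the scheduler's message, if one is recorded."""
    found = {}
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        if status.get("phase") != "Pending":
            continue
        message = ""
        for cond in status.get("conditions", []):
            if cond.get("type") == "PodScheduled" and cond.get("status") == "False":
                message = cond.get("message", cond.get("reason", ""))
        found[pod["metadata"]["name"]] = message
    return found


# Trimmed, made-up sample in the shape of `kubectl get pods -o json` output.
sample = json.loads("""
{"items": [
  {"metadata": {"name": "gpu-worker-abc"},
   "status": {"phase": "Pending", "conditions": [
     {"type": "PodScheduled", "status": "False", "reason": "Unschedulable",
      "message": "0/3 nodes are available: 3 Insufficient nvidia.com/gpu."}]}},
  {"metadata": {"name": "cpu-worker-xyz"}, "status": {"phase": "Running"}}
]}
""")
pending = unschedulable_pods(sample)
```

A message mentioning insufficient `nvidia.com/gpu` points at GPU availability, while insufficient CPU or memory usually means the cluster autoscaler is still adding a node.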

OOMKilled pods

A pod exceeded its memory limit. Increase memory for the affected worker group in your overrides file:
cluster:
  cpu-workers:
    resources:
      limits:
        memory: "48Gi"
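
To find which worker group needs the higher limit, you can look for containers whose current or previous termination reason is `OOMKilled`. This sketch inspects the same `kubectl get pods -o json` structure; the function name, pod names, and container name are hypothetical.

```python
def oom_killed_containers(pod_list: dict) -> list:
    """Return (pod, container) pairs terminated with reason OOMKilled."""
    hits = []
    for pod in pod_list.get("items", []):
        for cs in pod.get("status", {}).get("containerStatuses", []):
            # Check both the current state and the state before the last restart.
            for key in ("state", "lastState"):
                terminated = (cs.get(key) or {}).get("terminated") or {}
                if terminated.get("reason") == "OOMKilled":
                    hits.append((pod["metadata"]["name"], cs["name"]))
                    break
    return hits


# Made-up sample: one worker restarted after an OOM kill, one is healthy.
sample = {"items": [
    {"metadata": {"name": "cpu-worker-1"},
     "status": {"containerStatuses": [
         {"name": "ray-worker", "state": {"running": {}},
          "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}}}]}},
    {"metadata": {"name": "cpu-worker-2"},
     "status": {"containerStatuses": [
         {"name": "ray-worker", "state": {"running": {}}, "lastState": {}}]}},
]}
killed = oom_killed_containers(sample)
```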

Extractor returning 503

The extractor has no running replicas (scaled to zero) or all replicas are saturated:
  • First request after idle takes 5-10 seconds for cold start
  • Set min_replicas: 1 for latency-sensitive extractors
  • Increase max_replicas if you’re seeing sustained 503s under load
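
On the client side, a cold-start 503 can be absorbed with a short retry loop. The sketch below retries with exponential backoff; `call_with_retry` and its parameters are hypothetical helpers, not part of the Mixpeek SDK, and `send` stands in for whatever function issues your request and returns a status code and body.

```python
import time
from typing import Callable, Tuple


def call_with_retry(send: Callable[[], Tuple[int, str]], attempts: int = 5,
                    base_delay_s: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> Tuple[int, str]:
    """Call send() until it returns a non-503 status, backing off exponentially."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        status, body = send()
        if status != 503:
            return status, body
        if attempt < attempts - 1:
            sleep(base_delay_s * 2 ** attempt)  # waits 1s, 2s, 4s, ...
    return status, body


# Simulated extractor that needs two cold-start attempts before answering.
responses = iter([(503, "cold start"), (503, "cold start"), (200, "ok")])
waits = []
result = call_with_retry(lambda: next(responses), sleep=waits.append)
```

With the default settings the waits total 15 seconds across five attempts, which covers the 5-10 second cold start described above. Persistent 503s after that point suggest saturation rather than a cold start.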

Batch jobs stuck

  1. Check if batch workers are running: kubectl -n <your-namespace> get pods -l ray.io/group=batch-workers
  2. If no batch workers, manually scale up: kubectl -n <your-namespace> scale raycluster <your-cluster> --replicas=1 --resource-name=batch-workers
  3. Check queue depth in Grafana — a backlog is normal for large batches

Custom Code (Fork Deploys)

Single-tenant customers can fork the Mixpeek codebase and deploy custom code to their tenant:
  • Custom extractors — add domain-specific feature extraction logic
  • Modified inference — tune model parameters, swap models, add pre/post-processing
  • Engine changes — adjust batch processing, add custom endpoints
Your fork builds into a tenant-specific container image (tenant-<name>-<sha>) and deploys only to your namespace. The shared platform is unaffected.

Workflow

  1. Fork the Mixpeek repo
  2. Make your changes (extractors, inference, engine code)
  3. Trigger a tenant deploy via GitHub Actions — builds from your fork, deploys to your namespace
  4. Rebase on upstream periodically to pick up platform updates
Config-only changes (disabling extractors, adjusting scaling) don’t require a fork or image build — edit your overrides file and trigger a config deploy.

Getting Started

To provision a single-tenant data plane:
  1. Contact your Mixpeek account manager or email sales@mixpeek.com
  2. We provision your isolated infrastructure (database, compute, cache, storage)
  3. You receive kubectl access to your namespace and Grafana dashboards
  4. Your existing API keys are routed to your dedicated data plane — no code changes
Migration from the shared platform to single-tenant is seamless. Your data is copied to the isolated database, the tenant config is updated, and routing switches instantly. Rollback is equally fast.