Mixpeek’s single-tenant deployment gives enterprise customers a fully isolated data plane: dedicated database, compute cluster, cache, object storage, and job queues. The shared control plane (API gateway, auth, billing) routes requests to your data plane transparently — your API keys and SDKs work the same way.

Architecture

Figure: Single-tenant architecture, with a shared control plane routing to an isolated enterprise data plane and a shared standard data plane.

What’s isolated

| Resource | Isolation Level | Details |
| --- | --- | --- |
| Database | Dedicated database | Separate MongoDB instance or database per tenant |
| Compute | Dedicated Ray cluster | Head node + autoscaling worker pools (CPU, batch, GPU) |
| Cache | Dedicated Redis | Separate instance with independent memory and connection pools |
| Storage | Dedicated bucket | Separate GCS/S3 bucket per tenant |
| Job queues | Dedicated queues | Celery queues prefixed per tenant, so jobs never compete with other customers |
| Vector store | Dedicated shard | Isolated MVS shard with tenant-specific GCS-backed snapshots |

What’s shared

The control plane routes requests but holds no customer data:
  • API gateway — resolves your API key to your data plane endpoints
  • Studio UI — connects to the API, holds no data
  • Auth and API key management
  • Billing and usage metering
  • Container image registry — same code, separate compute

Cloud & Region Deployment

Mixpeek’s single-tenant architecture supports deployment across cloud providers and regions. Each tenant’s data plane is self-contained — all customer data stays in the region you choose.

In-region co-location

Your data plane runs within your cloud provider and region. Traffic between your application and your data plane stays in-region, with no cross-region or cross-cloud networking overhead. The only out-of-region hop is the initial API request through the control plane for auth and routing (~1 RTT, no customer data persisted).

Supported clouds and regions

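| Cloud | Region | Location | Status |
| --- | --- | --- | --- |
| GCP | us-east1 | South Carolina | Available |
| GCP | eu-west1 | Belgium | On-demand |
| AWS | us-east-1 | N. Virginia | On-demand |
| AWS | eu-west-1 | Ireland | On-demand |
| AWS | eu-west-3 | Paris | On-demand |
| AWS | ap-south-1 | Mumbai | On-demand |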
“On-demand” regions are provisioned when a customer commits. Lead time is approximately one week for the first tenant in a new region. Additional tenants in the same region deploy in hours. Need a region not listed? Contact us — we can deploy to any GCP or AWS region.

How it works

The control plane runs centrally and routes requests to your data plane via URL-based tenant configuration. Your engine_url, mongo_uri, redis_url, and storage_bucket all point to infrastructure in your chosen cloud and region.
┌──────────────────────────────┐
│   Control Plane (shared)     │
│   api.mixpeek.com            │
│   Auth · Routing · Billing   │
└──────┬──────────┬────────────┘
       │          │
  ┌────▼───┐  ┌───▼─────┐
  │ GCP    │  │ AWS     │
  │ GKE    │  │ EKS     │
  │ GCS    │  │ S3      │
  └────────┘  └─────────┘
When you onboard, you select a cloud provider and region. Mixpeek provisions your isolated data plane there — dedicated compute, storage, database, and cache. The control plane reaches your data plane over private networking (VPC peering or internal load balancers), never over the public internet.
Switching between cloud providers or regions after initial deployment requires a data migration. Choose your target cloud and region during onboarding.

Node & resource selection

Your tenant’s workloads can run on dedicated node pools with tenant-specific taints and labels. This gives you:
  • Hardware selection — choose machine types per worker group (CPU-optimized, memory-optimized, GPU)
  • Spot/preemptible nodes — reduce cost for batch-tolerant workloads
  • GPU acceleration — dedicated GPU nodes (NVIDIA L4, A100) for video processing and large model inference
  • Isolation guarantees — tenant taints ensure no other workloads land on your nodes
Node pool configuration is defined in your tenant overrides file:
node_pools:
  cpu-workers:
    machine_type: n2-highmem-8    # or r6i.2xlarge on AWS
    min_nodes: 1
    max_nodes: 5
    spot: true
  gpu-workers:
    machine_type: g2-standard-8   # or g5.2xlarge on AWS
    min_nodes: 0
    max_nodes: 4
    accelerator:
      type: nvidia-l4
      count: 1
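
Before committing a node pool change, it can help to sanity-check the values. The sketch below is illustrative only: it assumes the node_pools section has already been parsed into a Python dict with the same keys as the example above, and the NodePool class and validate function are hypothetical names, not part of any Mixpeek tooling.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NodePool:
    name: str
    machine_type: str
    min_nodes: int
    max_nodes: int
    spot: bool = False
    accelerator_type: Optional[str] = None
    accelerator_count: int = 0


def parse_node_pools(raw: dict) -> List[NodePool]:
    """Turn a parsed `node_pools` mapping into NodePool objects."""
    pools = []
    for name, cfg in raw.items():
        acc = cfg.get("accelerator") or {}
        pools.append(NodePool(
            name=name,
            machine_type=cfg["machine_type"],
            min_nodes=cfg["min_nodes"],
            max_nodes=cfg["max_nodes"],
            spot=cfg.get("spot", False),
            accelerator_type=acc.get("type"),
            accelerator_count=acc.get("count", 0),
        ))
    return pools


def validate(pool: NodePool) -> List[str]:
    """Return a list of human-readable problems (empty if the pool looks sane)."""
    errors = []
    if pool.min_nodes < 0:
        errors.append(f"{pool.name}: min_nodes must be >= 0")
    if pool.max_nodes < pool.min_nodes:
        errors.append(
            f"{pool.name}: max_nodes ({pool.max_nodes}) is below min_nodes ({pool.min_nodes})"
        )
    if pool.accelerator_type and pool.accelerator_count < 1:
        errors.append(f"{pool.name}: accelerator count must be >= 1")
    return errors


# Same values as the YAML example above, written as a dict.
EXAMPLE = {
    "cpu-workers": {"machine_type": "n2-highmem-8", "min_nodes": 1, "max_nodes": 5, "spot": True},
    "gpu-workers": {
        "machine_type": "g2-standard-8", "min_nodes": 0, "max_nodes": 4,
        "accelerator": {"type": "nvidia-l4", "count": 1},
    },
}

for pool in parse_node_pools(EXAMPLE):
    print(pool.name, validate(pool) or "ok")
```

The checks are deliberately minimal. Whether a machine type actually exists in your region, or supports the requested accelerator, is something only the cloud provider can confirm.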

Tenant Routing

Every API request goes through tenant resolution:
  1. Your API key authenticates against the shared auth layer
  2. The API resolves your organization to a tenant configuration
  3. The tenant config specifies your data plane endpoints (database, cache, compute, storage)
  4. The request executes entirely within your isolated infrastructure
Tenant routing is transparent. Your API keys, SDKs, and integrations work identically to the shared platform — no code changes required.
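
The four resolution steps above can be pictured as two lookups. This is a conceptual sketch rather than Mixpeek's implementation: the lookup tables, key values, hostnames, and the fallback to a shared data plane for organizations without a dedicated tenant config are all assumptions made for illustration. Only the field names engine_url, mongo_uri, redis_url, and storage_bucket come from this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantConfig:
    """Data plane endpoints for one tenant."""
    engine_url: str
    mongo_uri: str
    redis_url: str
    storage_bucket: str


class UnknownApiKey(Exception):
    """Raised when an API key fails authentication."""


# Hypothetical data; a real control plane would back these with its auth store.
API_KEY_TO_ORG = {"sk_demo_acme": "org_acme", "sk_demo_small": "org_small"}

SHARED_PLANE = TenantConfig(
    engine_url="http://engine.shared.example.internal",
    mongo_uri="mongodb://mongo.shared.example.internal/shared",
    redis_url="redis://redis.shared.example.internal:6379",
    storage_bucket="shared-data",
)

TENANT_CONFIGS = {
    "org_acme": TenantConfig(
        engine_url="http://engine.acme.example.internal",
        mongo_uri="mongodb://mongo.acme.example.internal/acme",
        redis_url="redis://redis.acme.example.internal:6379",
        storage_bucket="acme-data",
    ),
}


def resolve_tenant(api_key: str) -> TenantConfig:
    # Step 1: authenticate the key against the shared auth layer.
    org = API_KEY_TO_ORG.get(api_key)
    if org is None:
        raise UnknownApiKey("API key not recognized")
    # Steps 2-3: map the organization to its data plane endpoints.
    # Assumption: organizations without a dedicated config use the shared plane.
    return TENANT_CONFIGS.get(org, SHARED_PLANE)


# Step 4: the caller would execute the request against these endpoints.
print(resolve_tenant("sk_demo_acme").storage_bucket)
```

Because the lookup is keyed on the API key alone, client code never needs to know which data plane it is talking to.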

Compute Cluster

Your Ray cluster runs in a dedicated Kubernetes namespace with independent scaling.

Worker groups

| Group | Default Range | Use Case |
| --- | --- | --- |
| CPU workers | 1–4 nodes | Text embeddings, reranking, classification, image embeddings |
| Batch workers | 0–30 nodes | Large ingestion jobs (scale from zero on demand) |
| GPU workers | 0–8 nodes (NVIDIA L4) | Video processing, large model inference |
Each group autoscales independently. Batch and GPU workers can default to zero replicas and scale up when jobs arrive — you only pay for compute when it’s active.

Extractor scaling

Individual extractors (embedding models, classifiers, etc.) scale independently within your cluster:
  • min_replicas — minimum always-running instances (0 = scale to zero when idle)
  • max_replicas — maximum instances under load
  • target_ongoing_requests — requests per replica before scaling up
  • downscale_delay_s — cooldown before scaling down (prevents flapping)
Set min_replicas: 1 for latency-sensitive extractors (e.g., your primary embedding model for search). Use min_replicas: 0 for batch-only extractors to save cost.
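
To see how the four settings interact, here is a simplified model of the scaling decision. It is a sketch under stated assumptions, not the autoscaler's actual algorithm: it assumes the target replica count is the in-flight request count divided by target_ongoing_requests, clamped to the configured range, and that scale-up is immediate while scale-down waits for downscale_delay_s. Real autoscalers typically add smoothing and an upscale delay. The function names are hypothetical.

```python
import math


def desired_replicas(ongoing_requests: int, target_ongoing_requests: int,
                     min_replicas: int, max_replicas: int) -> int:
    """Replicas needed so each one handles about target_ongoing_requests."""
    if target_ongoing_requests <= 0:
        raise ValueError("target_ongoing_requests must be positive")
    wanted = math.ceil(ongoing_requests / target_ongoing_requests)
    # Clamp to the configured range.
    return max(min_replicas, min(max_replicas, wanted))


def next_replicas(current: int, desired: int,
                  seconds_below_current: float, downscale_delay_s: float) -> int:
    """Scale up right away; scale down only after the cooldown has elapsed."""
    if desired >= current:
        return desired
    if seconds_below_current >= downscale_delay_s:
        return desired
    return current  # still inside the cooldown window, hold steady


# 9 in-flight requests at 2 per replica need 5 replicas.
print(desired_replicas(9, 2, min_replicas=1, max_replicas=6))
# An idle extractor with min_replicas=0 scales to zero.
print(desired_replicas(0, 2, min_replicas=0, max_replicas=6))
# Load dropped 30 s ago with a 120 s cooldown: keep the current 5 replicas.
print(next_replicas(5, 2, seconds_below_current=30, downscale_delay_s=120))
```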

Disabling extractors

If you don’t use certain capabilities (e.g., audio embeddings, face recognition, web scraping), disable the corresponding extractors. This frees compute resources for the extractors you do use and reduces your always-on footprint.

Self-Service Configuration

Enterprise tenants manage their cluster configuration via a YAML overrides file. On each platform deploy, Mixpeek merges your overrides with the latest extractor registry — new extractors appear automatically, disabled extractors stay disabled.
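
The merge behavior described above can be sketched as a filter plus a deep merge. This is an illustration under assumptions, not the deploy pipeline's code: the registry contents, the default values, and the extractor name example__new_extractor are invented for the example, while the disabled and overrides keys follow the overrides file format documented on this page.

```python
import copy


def deep_merge(base: dict, override: dict) -> dict:
    """Return base with override applied recursively; inputs are not mutated."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def effective_extractors(registry: dict, tenant: dict) -> dict:
    """Registry minus disabled extractors, with per-extractor overrides applied."""
    disabled = set(tenant.get("disabled", []))
    overrides = tenant.get("overrides", {})
    return {
        name: deep_merge(defaults, overrides.get(name, {}))
        for name, defaults in registry.items()
        if name not in disabled
    }


# Hypothetical registry defaults shipped with a platform release.
REGISTRY = {
    "intfloat__multilingual_e5_large_instruct": {
        "autoscaling_config": {"min_replicas": 1, "max_replicas": 4, "target_ongoing_requests": 2},
    },
    "mixpeek__playwright": {"autoscaling_config": {"min_replicas": 0, "max_replicas": 2}},
    "example__new_extractor": {"autoscaling_config": {"min_replicas": 0, "max_replicas": 1}},
}

TENANT = {
    "disabled": ["mixpeek__playwright"],
    "overrides": {
        "intfloat__multilingual_e5_large_instruct": {
            "autoscaling_config": {"min_replicas": 2, "max_replicas": 6},
        },
    },
}

for name, cfg in effective_extractors(REGISTRY, TENANT).items():
    print(name, cfg["autoscaling_config"])
```

In this model the new extractor shows up with registry defaults, the disabled one is dropped, and the overridden one keeps any default field the tenant did not set.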

What you can configure

| Section | Controls |
| --- | --- |
| auto_deploy | When true, platform updates on main auto-deploy to your cluster |
| disabled | List of extractors to exclude from your cluster |
| overrides | Per-extractor scaling (min/max replicas, resources, concurrency) |
| cluster | Worker group sizing (replicas, min/max nodes) |
| head | Head node resources (CPU, memory) |
| celery | Batch and general worker pool sizing, concurrency, queue bindings |
| redis | Cache memory limits, persistence policy |
| mvs | Vector store shard config (WAL, snapshots, index parameters) |
| node_pools | Dedicated node pool machine types, autoscaling ranges, GPU config |
| env | Environment variable overrides |
| spec | Health check thresholds |

Example overrides

# Auto-deploy platform updates (set false if running a fork)
auto_deploy: true

# Disable extractors you don't need
disabled:
  - mixpeek__playwright          # no web scraping
  - laion__clap_htsat_tiny       # no audio embeddings
  - insightface__arcface         # no face recognition

# Scale extractors for your workload
overrides:
  intfloat__multilingual_e5_large_instruct:
    autoscaling_config:
      min_replicas: 2            # always warm for search
      max_replicas: 6            # burst for batch ingestion

# Size your worker groups
cluster:
  cpu-workers:
    minReplicas: 1
    maxReplicas: 4
  batch-workers:
    minReplicas: 0
    maxReplicas: 5               # upper bound on batch capacity
  gpu-workers:
    minReplicas: 0
    maxReplicas: 3

# Celery worker pools
celery:
  batch:
    replicas: 3
    concurrency: 2
    autoscaling:
      minReplicas: 3
      maxReplicas: 6
  general:
    replicas: 1
    concurrency: 4
Changes take effect on the next deploy. When auto_deploy: true, every push to main automatically rebuilds and deploys to your cluster.

Kubernetes Access

Enterprise tenants get operator-level access to their namespace:
| Action | Access |
| --- | --- |
| View pods, logs, events | Yes |
| Scale worker groups | Yes |
| Restart stuck pods | Yes |
| Port-forward to Ray dashboard | Yes |
| View Ray cluster status | Yes |
| Access secrets or RBAC | No |
| Modify other namespaces | No |

Quick scaling

For immediate scaling (e.g., before a large batch job):
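# Worker group replica counts live under spec.workerGroupSpecs in the RayCluster resource.
# <group-index> is the zero-based position of the group in that list; check it with:
#   kubectl -n <your-namespace> get raycluster <your-cluster> -o yaml

# Scale batch workers to 3
kubectl -n <your-namespace> patch raycluster <your-cluster> --type=json \
  -p '[{"op":"replace","path":"/spec/workerGroupSpecs/<group-index>/replicas","value":3}]'

# Scale GPU workers to 1 for video processing
kubectl -n <your-namespace> patch raycluster <your-cluster> --type=json \
  -p '[{"op":"replace","path":"/spec/workerGroupSpecs/<group-index>/replicas","value":1}]'

# Scale batch workers back down after the job
kubectl -n <your-namespace> patch raycluster <your-cluster> --type=json \
  -p '[{"op":"replace","path":"/spec/workerGroupSpecs/<group-index>/replicas","value":0}]'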
Manual scaling is temporary. The next deploy resets to the values in your overrides file.
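
If you script scaling, one option is to generate a JSON patch from the live RayCluster resource, so the worker group is addressed by name rather than by a hard-coded list position. A minimal sketch, assuming a standard KubeRay RayCluster whose worker groups appear under spec.workerGroupSpecs with a groupName field, and assuming the group names match those in your overrides file. The helper name replicas_patch is hypothetical.

```python
import json


def replicas_patch(raycluster: dict, group_name: str, replicas: int) -> str:
    """Build a JSON Patch that sets `replicas` on the named worker group."""
    if replicas < 0:
        raise ValueError("replicas must be >= 0")
    groups = raycluster["spec"]["workerGroupSpecs"]
    for index, group in enumerate(groups):
        if group.get("groupName") == group_name:
            patch = [{
                "op": "replace",
                "path": f"/spec/workerGroupSpecs/{index}/replicas",
                "value": replicas,
            }]
            return json.dumps(patch)
    raise KeyError(f"no worker group named {group_name!r}")


# Trimmed-down stand-in for the JSON form of a RayCluster resource.
SAMPLE = {
    "spec": {
        "workerGroupSpecs": [
            {"groupName": "cpu-workers", "replicas": 1},
            {"groupName": "batch-workers", "replicas": 0},
            {"groupName": "gpu-workers", "replicas": 0},
        ]
    }
}

print(replicas_patch(SAMPLE, "batch-workers", 3))
```

In practice the input would be the JSON output of kubectl get raycluster for your cluster, and the returned string would be passed to kubectl patch with --type=json. As with any manual change, the next deploy resets the value.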

Monitoring

Ray Dashboard

Port-forward to access your Ray dashboard locally:
kubectl -n <your-namespace> port-forward svc/<your-head-svc> 8265:8265
# Open http://localhost:8265
The dashboard shows active deployments, replica counts, request queues, worker resource usage, and cluster utilization.

Grafana

Each tenant gets scoped Grafana dashboards:
  • Queue Health — depth and age of your job queues
  • Batch Status — ingestion progress, success/failure rates
  • API Latency — p50/p95/p99 for your requests
  • Cost & Usage — compute hours, resource utilization
  • Error Rate — 5xx errors scoped to your tenant

kubectl

# Pod resource usage
kubectl -n <your-namespace> top pods

# Recent events (scheduling failures, OOM, probe failures)
kubectl -n <your-namespace> get events --sort-by=.lastTimestamp | tail -20

# Ray Serve status (which extractors are running)
kubectl -n <your-namespace> exec -it deploy/<your-head> -- serve status

Billing

Single-tenant billing has two components:
  1. Platform fee — fixed monthly fee for access to the Mixpeek platform, API, Studio, and support
  2. Compute passthrough — actual cloud infrastructure cost (nodes, storage, networking) passed through at cost plus a management markup
There are no per-operation credit charges on the single-tenant plan. You pay for the underlying cloud resources your cluster consumes, and Mixpeek handles provisioning, monitoring, upgrades, and support.
Keep batch and GPU workers at minReplicas: 0 — they scale from zero on demand. You only pay for compute when it’s active. Disable unused extractors to reduce your always-on footprint.
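
A worked example makes the two billing components concrete. Every number below is hypothetical (the platform fee, the hourly rates, and the 10% markup are placeholders, not Mixpeek pricing), and the sketch assumes the markup is applied as a percentage of the cloud cost.

```python
from decimal import Decimal
from typing import Mapping


def compute_cost(node_hours: Mapping[str, int],
                 hourly_rate: Mapping[str, Decimal]) -> Decimal:
    """Raw cloud cost: node-hours per worker group times that group's rate."""
    total = Decimal("0")
    for group, hours in node_hours.items():
        total += Decimal(hours) * hourly_rate[group]
    return total


def monthly_bill(platform_fee: Decimal, cloud_cost: Decimal, markup: Decimal) -> Decimal:
    """Fixed platform fee plus cloud cost with the management markup applied."""
    return platform_fee + cloud_cost * (Decimal("1") + markup)


# Placeholder figures for illustration only.
PLATFORM_FEE = Decimal("5000")
MARKUP = Decimal("0.10")
RATES = {
    "cpu-workers": Decimal("0.50"),
    "batch-workers": Decimal("0.40"),
    "gpu-workers": Decimal("1.00"),
}

# One always-on CPU node (720 h) plus some batch and GPU bursts.
busy_month = {"cpu-workers": 720, "batch-workers": 100, "gpu-workers": 40}
# Same month with batch and GPU workers never scaling up from zero.
idle_month = {"cpu-workers": 720, "batch-workers": 0, "gpu-workers": 0}

for label, usage in (("busy", busy_month), ("idle", idle_month)):
    bill = monthly_bill(PLATFORM_FEE, compute_cost(usage, RATES), MARKUP)
    print(label, bill.quantize(Decimal("0.01")))
```

The difference between the two months comes entirely from the worker groups that scale from zero, which is why keeping their minimum at zero matters for cost.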

Troubleshooting

Pods stuck in Pending

Check events for the pending pod:
kubectl -n <your-namespace> describe pod <pod-name>
Common causes:
  • Insufficient resources — cluster autoscaler is provisioning a new node (2-3 minutes)
  • GPU unavailable — GPUs may be temporarily exhausted in the region
  • Resource limits — reduce maxReplicas on other worker groups to free capacity

OOMKilled pods

A pod exceeded its memory limit. Increase memory for the affected worker group in your overrides file:
cluster:
  cpu-workers:
    resources:
      limits:
        memory: "48Gi"

Extractor returning 503

The extractor has no running replicas (scaled to zero) or all replicas are saturated:
  • First request after idle takes 5-10 seconds for cold start
  • Set min_replicas: 1 for latency-sensitive extractors
  • Increase max_replicas if you’re seeing sustained 503s under load
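
On the client side, a short retry with backoff can absorb a cold start without surfacing the 503 to your users. The sketch below is generic and makes no assumptions about the Mixpeek SDK: send stands for any zero-argument callable that performs the request and returns a (status, body) pair, and the function name and default delays are illustrative.

```python
import time
from typing import Callable, Tuple


def call_with_cold_start_retry(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 4,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Retry on HTTP 503 with exponential backoff; return the last response."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != 503:
            return status, body
        if attempt < max_attempts - 1:
            # Waits of 1 s, 2 s, 4 s by default: 7 s in total, which falls
            # inside the 5-10 second cold-start window mentioned above.
            sleep(base_delay_s * 2 ** attempt)
    return status, body


# Simulated extractor that needs two retries before a replica is ready.
responses = iter([(503, ""), (503, ""), (200, "ok")])
print(call_with_cold_start_retry(lambda: next(responses), sleep=lambda s: None))
```

Retries help with cold starts only. If 503s persist under steady load, the replicas are saturated and raising max_replicas is the appropriate fix.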

Batch jobs stuck

  1. Check if batch workers are running: kubectl -n <your-namespace> get pods -l ray.io/group=batch-workers
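  2. If no batch workers are running, manually scale up: kubectl -n <your-namespace> patch raycluster <your-cluster> --type=json -p '[{"op":"replace","path":"/spec/workerGroupSpecs/<group-index>/replicas","value":1}]' (where <group-index> is the zero-based position of batch-workers in spec.workerGroupSpecs)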
  3. Check queue depth in Grafana — a backlog is normal for large batches

Custom Code (Fork Deploys)

Single-tenant customers can fork the Mixpeek codebase and deploy custom code to their tenant:
  • Custom extractors — add domain-specific feature extraction logic
  • Modified inference — tune model parameters, swap models, add pre/post-processing
  • Engine changes — adjust batch processing, add custom endpoints
Your fork builds into a tenant-specific container image and deploys only to your namespace. The shared platform is unaffected.

Workflow

  1. Fork the Mixpeek repo
  2. Make your changes (extractors, inference, engine code)
  3. Trigger a tenant deploy via GitHub Actions — builds from your fork, deploys to your namespace
  4. Rebase on upstream periodically to pick up platform updates
Config-only changes (disabling extractors, adjusting scaling) don’t require a fork or image build — edit your overrides file and trigger a deploy.

Getting Started

To provision a single-tenant data plane:
  1. Contact your Mixpeek account manager or email sales@mixpeek.com
  2. Choose your cloud provider (GCP or AWS) and target region
  3. We provision your isolated infrastructure (database, compute, cache, storage)
  4. You receive kubectl access to your namespace and Grafana dashboards
  5. Your existing API keys are routed to your dedicated data plane — no code changes
Migration from the shared platform to single-tenant requires no code changes. Your data is copied to the isolated database, the tenant config is updated, and routing switches instantly. Rollback is equally fast.