# Single Tenant

> Dedicated infrastructure with isolated compute, storage, and data: deploy on GCP or AWS, in any region

Mixpeek's single-tenant deployment gives enterprise customers a fully isolated data plane: dedicated database, compute cluster, cache, object storage, and job queues. The shared control plane (API gateway, auth, billing) routes requests to your data plane transparently, so your API keys and SDKs work the same way as on the shared platform.

## Architecture

<Frame>
  <img src="https://mintcdn.com/mixpeek/CEJla4pymd_l6V28/assets/mixpeek-single-tenant.svg?fit=max&auto=format&n=CEJla4pymd_l6V28&q=85&s=3afcd79606a676228d44607fd19b761e" alt="Single-tenant architecture: shared control plane routing to isolated enterprise data plane and shared standard data plane" width="900" height="520" data-path="assets/mixpeek-single-tenant.svg" />
</Frame>

### What's isolated

| Resource         | Isolation Level       | Details                                                                     |
| ---------------- | --------------------- | --------------------------------------------------------------------------- |
| **Database**     | Dedicated database    | Separate MongoDB instance or database per tenant                            |
| **Compute**      | Dedicated Ray cluster | Head node + autoscaling worker pools (CPU, batch, GPU)                      |
| **Cache**        | Dedicated Redis       | Separate instance with independent memory and connection pools              |
| **Storage**      | Dedicated bucket      | Separate GCS/S3 bucket per tenant                                           |
| **Job queues**   | Dedicated queues      | Celery queues prefixed per tenant — jobs never compete with other customers |
| **Vector store** | Dedicated shard       | Isolated MVS shard with tenant-specific GCS-backed snapshots                |

### What's shared

The control plane is stateless — it routes requests but holds no customer data:

* **API gateway** — resolves your API key to your data plane endpoints
* **Studio UI** — connects to the API, holds no data
* **Auth and API key management**
* **Billing and usage metering**
* **Container image registry** — same code, separate compute

## Cloud & Region Deployment

Mixpeek's single-tenant architecture supports deployment across cloud providers and regions. Each tenant's data plane is self-contained — all customer data stays in the region you choose.

### In-region co-location

Your data plane runs **within your cloud provider and region**. Data-plane traffic between your application and Mixpeek stays in-region, with no cross-region or cross-cloud networking overhead. The only out-of-region hop is the initial API request through the control plane for auth and routing (\~1 RTT, no customer data persisted).

### Supported clouds and regions

| Cloud   | Region         | Location       | Status    |
| ------- | -------------- | -------------- | --------- |
| **GCP** | `us-east1`     | South Carolina | Available |
| **GCP** | `europe-west1` | Belgium        | On-demand |
| **AWS** | `us-east-1`    | N. Virginia    | On-demand |
| **AWS** | `eu-west-1`    | Ireland        | On-demand |
| **AWS** | `eu-west-3`    | Paris          | On-demand |
| **AWS** | `ap-south-1`   | Mumbai         | On-demand |

"On-demand" regions are provisioned when a customer commits. Lead time is approximately one week for the first tenant in a new region. Additional tenants in the same region deploy in hours.

Need a region not listed? Contact us — we can deploy to any GCP or AWS region.

### How it works

The control plane runs centrally and routes requests to your data plane via URL-based tenant configuration. Your `engine_url`, `mongo_uri`, `redis_url`, and `storage_bucket` all point to infrastructure in your chosen cloud and region.

```
┌─────────────────────────────┐
│   Control Plane (shared)     │
│   api.mixpeek.com            │
│   Auth · Routing · Billing   │
└──────┬──────────┬────────────┘
       │          │
  ┌────▼───┐  ┌──▼──────┐
  │ GCP    │  │ AWS     │
  │ GKE    │  │ EKS     │
  │ GCS    │  │ S3      │
  └────────┘  └─────────┘
```

When you onboard, you select a cloud provider and region. Mixpeek provisions your isolated data plane there — dedicated compute, storage, database, and cache. The control plane reaches your data plane over private networking (VPC peering or internal load balancers), never over the public internet.

<Note>
  Switching between cloud providers or regions after initial deployment requires a data migration. Choose your target cloud and region during onboarding.
</Note>

### Node & resource selection

Your tenant's workloads can run on dedicated node pools with tenant-specific taints and labels. This gives you:

* **Hardware selection** — choose machine types per worker group (CPU-optimized, memory-optimized, GPU)
* **Spot/preemptible nodes** — reduce cost for batch-tolerant workloads
* **GPU acceleration** — dedicated GPU nodes (NVIDIA L4, A100) for video processing and large model inference
* **Isolation guarantees** — tenant taints ensure no other workloads land on your nodes

Node pool configuration is defined in your tenant overrides file:

```yaml
node_pools:
  cpu-workers:
    machine_type: n2-highmem-8    # or r6i.2xlarge on AWS
    min_nodes: 1
    max_nodes: 5
    spot: true
  gpu-workers:
    machine_type: g2-standard-8   # or g5.2xlarge on AWS
    min_nodes: 0
    max_nodes: 4
    accelerator:
      type: nvidia-l4
      count: 1
```

## Tenant Routing

Every API request goes through tenant resolution:

1. Your API key authenticates against the shared auth layer
2. The API resolves your organization to a tenant configuration
3. The tenant config specifies your data plane endpoints (database, cache, compute, storage)
4. The request executes entirely within your isolated infrastructure

<Note>
  Tenant routing is transparent. Your API keys, SDKs, and integrations work identically to the shared platform — no code changes required.
</Note>
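
As a rough illustration of what tenant resolution yields, a tenant configuration might look like the sketch below. Only the four endpoint field names (`engine_url`, `mongo_uri`, `redis_url`, `storage_bucket`) come from this page. The `tenant`, `cloud`, and `region` keys, the YAML layout, the mapping of `engine_url` to compute, and every value are hypothetical placeholders, not Mixpeek's actual schema.

```yaml
# Hypothetical tenant record: layout and values are illustrative assumptions
tenant: acme-corp                                          # assumed key and value
cloud: gcp                                                 # assumed key
region: us-east1                                           # assumed key
engine_url: http://engine.acme-corp.example.internal:8000  # compute endpoint (assumed mapping)
mongo_uri: mongodb://mongo.acme-corp.example.internal:27017/acme
redis_url: redis://redis.acme-corp.example.internal:6379/0
storage_bucket: gs://acme-corp-example-bucket
```

Whatever the real format is, every endpoint in the record resolves to infrastructure inside the cloud and region you selected.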

## Compute Cluster

Your Ray cluster runs in a dedicated Kubernetes namespace with independent scaling.

### Worker groups

| Group             | Default Range         | Use Case                                                     |
| ----------------- | --------------------- | ------------------------------------------------------------ |
| **CPU workers**   | 1–4 nodes             | Text embeddings, reranking, classification, image embeddings |
| **Batch workers** | 0–30 nodes            | Large ingestion jobs (scale from zero on demand)             |
| **GPU workers**   | 0–8 nodes (NVIDIA L4) | Video processing, large model inference                      |

Each group autoscales independently. Batch and GPU workers can default to zero replicas and scale up when jobs arrive — you only pay for compute when it's active.

### Extractor scaling

Individual extractors (embedding models, classifiers, etc.) scale independently within your cluster:

* **`min_replicas`** — minimum always-running instances (0 = scale to zero when idle)
* **`max_replicas`** — maximum instances under load
* **`target_ongoing_requests`** — requests per replica before scaling up
* **`downscale_delay_s`** — cooldown before scaling down (prevents flapping)

<Tip>
  Set `min_replicas: 1` for latency-sensitive extractors (e.g., your primary embedding model for search). Use `min_replicas: 0` for batch-only extractors to save cost.
</Tip>
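
The four settings above could be combined in the `overrides` section of your overrides file as follows. This is a sketch: the extractor name and the `autoscaling_config` key are taken from the example overrides on this page, but placing `target_ongoing_requests` and `downscale_delay_s` under that same key, and the specific values, are assumptions to confirm against your own configuration.

```yaml
overrides:
  intfloat__multilingual_e5_large_instruct:
    autoscaling_config:
      min_replicas: 1              # keep one replica warm for search traffic
      max_replicas: 4              # ceiling under load
      target_ongoing_requests: 8   # assumed value: requests per replica before scaling up
      downscale_delay_s: 300       # assumed value: wait 5 minutes before scaling down
```

A longer `downscale_delay_s` trades some idle compute cost for fewer cold starts when traffic arrives in bursts.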

### Disabling extractors

If you don't use certain capabilities (e.g., audio embeddings, face recognition, web scraping), disable the corresponding extractors. This frees compute resources for the extractors you do use and reduces your always-on footprint.

## Self-Service Configuration

Enterprise tenants manage their cluster configuration via a YAML overrides file. On each platform deploy, Mixpeek merges your overrides with the latest extractor registry — new extractors appear automatically, disabled extractors stay disabled.

### What you can configure

| Section           | Controls                                                          |
| ----------------- | ----------------------------------------------------------------- |
| **`auto_deploy`** | When `true`, platform updates on main auto-deploy to your cluster |
| **`disabled`**    | List of extractors to exclude from your cluster                   |
| **`overrides`**   | Per-extractor scaling (min/max replicas, resources, concurrency)  |
| **`cluster`**     | Worker group sizing (replicas, min/max nodes)                     |
| **`head`**        | Head node resources (CPU, memory)                                 |
| **`celery`**      | Batch and general worker pool sizing, concurrency, queue bindings |
| **`redis`**       | Cache memory limits, persistence policy                           |
| **`mvs`**         | Vector store shard config (WAL, snapshots, index parameters)      |
| **`node_pools`**  | Dedicated node pool machine types, autoscaling ranges, GPU config |
| **`env`**         | Environment variable overrides                                    |
| **`spec`**        | Health check thresholds                                           |

### Example overrides

```yaml
# Auto-deploy platform updates (set false if running a fork)
auto_deploy: true

# Disable extractors you don't need
disabled:
  - mixpeek__playwright          # no web scraping
  - laion__clap_htsat_tiny       # no audio embeddings
  - insightface__arcface         # no face recognition

# Scale extractors for your workload
overrides:
  intfloat__multilingual_e5_large_instruct:
    autoscaling_config:
      min_replicas: 2            # always warm for search
      max_replicas: 6            # burst for batch ingestion

# Size your worker groups
cluster:
  cpu-workers:
    minReplicas: 1
    maxReplicas: 4
  batch-workers:
    minReplicas: 0
    maxReplicas: 5               # upper bound on batch capacity
  gpu-workers:
    minReplicas: 0
    maxReplicas: 3

# Celery worker pools
celery:
  batch:
    replicas: 3
    concurrency: 2
    autoscaling:
      minReplicas: 3
      maxReplicas: 6
  general:
    replicas: 1
    concurrency: 4
```

Changes take effect on the next deploy. When `auto_deploy` is `true`, every push to the platform's main branch automatically rebuilds and deploys to your cluster.

## Kubernetes Access

Enterprise tenants get operator-level access to their namespace:

| Action                        | Access |
| ----------------------------- | ------ |
| View pods, logs, events       | Yes    |
| Scale worker groups           | Yes    |
| Restart stuck pods            | Yes    |
| Port-forward to Ray dashboard | Yes    |
| View Ray cluster status       | Yes    |
| Access secrets or RBAC        | No     |
| Modify other namespaces       | No     |

### Quick scaling

For immediate scaling (e.g., before a large batch job):

```bash
# Scale batch workers to 3
kubectl -n <your-namespace> scale raycluster <your-cluster> \
  --replicas=3 --resource-name=batch-workers

# Scale GPU workers to 1 for video processing
kubectl -n <your-namespace> scale raycluster <your-cluster> \
  --replicas=1 --resource-name=gpu-workers

# Scale back down after the job
kubectl -n <your-namespace> scale raycluster <your-cluster> \
  --replicas=0 --resource-name=batch-workers
```

<Note>
  Manual scaling is temporary. The next deploy resets to the values in your overrides file.
</Note>
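
To make a change like this persist across deploys, the same target can be recorded in the `cluster` section of your overrides file. A minimal sketch, assuming that raising `minReplicas` is how a floor is expressed for a worker group (the values are illustrative):

```yaml
cluster:
  batch-workers:
    minReplicas: 3    # keep three batch workers running between jobs
    maxReplicas: 5    # illustrative ceiling
```

Because these workers no longer scale to zero, they accrue compute cost while idle.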

## Monitoring

### Ray Dashboard

Port-forward to access your Ray dashboard locally:

```bash
kubectl -n <your-namespace> port-forward svc/<your-head-svc> 8265:8265
# Open http://localhost:8265
```

The dashboard shows active deployments, replica counts, request queues, worker resource usage, and cluster utilization.

### Grafana

Each tenant gets scoped Grafana dashboards:

* **Queue Health** — depth and age of your job queues
* **Batch Status** — ingestion progress, success/failure rates
* **API Latency** — p50/p95/p99 for your requests
* **Cost & Usage** — compute hours, resource utilization
* **Error Rate** — 5xx errors scoped to your tenant

### kubectl

```bash
# Pod resource usage
kubectl -n <your-namespace> top pods

# Recent events (scheduling failures, OOM, probe failures)
kubectl -n <your-namespace> get events --sort-by=.lastTimestamp | tail -20

# Ray Serve status (which extractors are running)
kubectl -n <your-namespace> exec -it deploy/<your-head> -- serve status
```

## Billing

Single-tenant billing has two components:

1. **Platform fee** — fixed monthly fee for access to the Mixpeek platform, API, Studio, and support
2. **Compute passthrough** — actual cloud infrastructure cost (nodes, storage, networking) passed through at cost plus a management markup

There are no per-operation credit charges on the single-tenant plan. You pay for the underlying cloud resources your cluster consumes, and Mixpeek handles provisioning, monitoring, upgrades, and support.

<Tip>
  Keep batch and GPU workers at `minReplicas: 0` — they scale from zero on demand. You only pay for compute when it's active. Disable unused extractors to reduce your always-on footprint.
</Tip>

## Troubleshooting

### Pods stuck in Pending

Check events for the pending pod:

```bash
kubectl -n <your-namespace> describe pod <pod-name>
```

Common causes:

* **Insufficient resources** — cluster autoscaler is provisioning a new node (2-3 minutes)
* **GPU unavailable** — GPUs may be temporarily exhausted in the region
* **Resource limits** — reduce `maxReplicas` on other worker groups to free capacity

### OOMKilled pods

A pod exceeded its memory limit. Increase memory for the affected worker group in your overrides file:

```yaml
cluster:
  cpu-workers:
    resources:
      limits:
        memory: "48Gi"
```

### Extractor returning 503

The extractor has no running replicas (scaled to zero) or all replicas are saturated:

* The first request after an idle period takes 5–10 seconds while a replica cold-starts
* Set `min_replicas: 1` for latency-sensitive extractors
* Increase `max_replicas` if you're seeing sustained 503s under load
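
The two scaling remedies above might be expressed in your overrides file like this. The extractor name is borrowed from the example overrides on this page, and the replica counts are illustrative assumptions rather than recommended values:

```yaml
overrides:
  intfloat__multilingual_e5_large_instruct:
    autoscaling_config:
      min_replicas: 1    # avoids cold starts after idle periods
      max_replicas: 8    # more headroom if 503s persist under load
```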

### Batch jobs stuck

1. Check if batch workers are running: `kubectl -n <your-namespace> get pods -l ray.io/group=batch-workers`
2. If no batch workers, manually scale up: `kubectl -n <your-namespace> scale raycluster <your-cluster> --replicas=1 --resource-name=batch-workers`
3. Check queue depth in Grafana — a backlog is normal for large batches

## Custom Code (Fork Deploys)

Single-tenant customers can fork the Mixpeek codebase and deploy custom code to their tenant:

* **Custom extractors** — add domain-specific feature extraction logic
* **Modified inference** — tune model parameters, swap models, add pre/post-processing
* **Engine changes** — adjust batch processing, add custom endpoints

Your fork builds into a tenant-specific container image and deploys only to your namespace. The shared platform is unaffected.

### Workflow

1. Fork the Mixpeek repo
2. Make your changes (extractors, inference, engine code)
3. Trigger a tenant deploy via GitHub Actions — builds from your fork, deploys to your namespace
4. Rebase on upstream periodically to pick up platform updates

<Note>
  Config-only changes (disabling extractors, adjusting scaling) don't require a fork or image build — edit your overrides file and trigger a deploy.
</Note>

## Getting Started

To provision a single-tenant data plane:

1. Contact your Mixpeek account manager or email [sales@mixpeek.com](mailto:sales@mixpeek.com)
2. Choose your cloud provider (GCP or AWS) and target region
3. We provision your isolated infrastructure (database, compute, cache, storage)
4. You receive kubectl access to your namespace and Grafana dashboards
5. Your existing API keys are routed to your dedicated data plane — no code changes

<Info>
  Migration from the shared platform to single-tenant is seamless. Your data is copied to the isolated database, the tenant config is updated, and routing switches instantly. Rollback is equally fast.
</Info>