# Model Registry

> Load HuggingFace models, built-in models, or your own fine-tuned weights inside custom extractors

Models run inside [custom extractors](/processing/custom-extractors). The model registry handles downloading, caching, and serving: you declare which model to use, and the infrastructure shares it across all workers through Ray's object store.

<Info>
  **How models are consumed.** Built-in and HuggingFace models work on any plan: reference them by feature URI or load them in a pipeline. **Custom (uploaded) models** are consumed from inside a custom extractor (`model_source="namespace"`), so wiring one into ingest or retrieval also requires a [dedicated deployment](/processing/custom-extractors#availability) (or an extractor merged via [Submissions](/processing/extractor-marketplace)). Uploading and deploying a custom model is supported wherever your org has Enterprise infrastructure.
</Info>

## Three Ways to Load Models

| Approach                       | When to Use                                                                                          |
| ------------------------------ | ---------------------------------------------------------------------------------------------------- |
| **Built-in Models**            | Common tasks — embeddings, transcription, reranking. No code needed, just reference the feature URI. |
| **HuggingFace Models**         | Any public HF model. Cached cluster-wide on first download.                                          |
| **Custom Models** (Enterprise) | Your own fine-tuned weights uploaded as `.tar.gz`. Stored in S3, deployed to Ray.                    |

## HuggingFace Models (Recommended)

Use `LazyModelMixin` in your extractor's pipeline. Models load on first batch, not at actor creation, and are shared zero-copy across all workers.

```python theme={null}
import torch

from engine.models.lazy import LazyModelMixin
from engine.inference.services import BaseBatchInferenceService

class MyEmbeddingProcessor(LazyModelMixin, BaseBatchInferenceService):
    model_id = "intfloat/multilingual-e5-large-instruct"
    model_class = "AutoModel"
    tokenizer_class = "AutoTokenizer"
    torch_dtype = "float16"
    model_source = "huggingface"

    def _process_batch(self, batch):
        model, tokenizer = self.get_model()

        inputs = tokenizer(
            batch["text"].tolist(),
            padding=True,
            truncation=True,
            return_tensors="pt",
        )

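        # Inference only, so skip gradient tracking.
        with torch.no_grad():
            outputs = model(**inputs)

        # Mean-pool over real tokens only; the attention mask zeroes out padding.
        hidden = outputs.last_hidden_state
        mask = inputs["attention_mask"].unsqueeze(-1).to(hidden.dtype)
        pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)
        batch["embedding"] = pooled.tolist()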
        return batch
```

### LazyModelMixin Attributes

| Attribute         | Type        | Default           | Description                                |
| ----------------- | ----------- | ----------------- | ------------------------------------------ |
| `model_id`        | str         | `""`              | HuggingFace model ID or namespace model ID |
| `model_class`     | str         | `"AutoModel"`     | Transformers model class name              |
| `tokenizer_class` | str \| None | `"AutoTokenizer"` | Tokenizer class, or `None` to skip         |
| `torch_dtype`     | str         | `"float32"`       | `"float16"`, `"float32"`, or `"bfloat16"`  |
| `model_source`    | str         | `"huggingface"`   | `"huggingface"` or `"namespace"`           |

Call `self.get_model()` to get a `(model, tokenizer)` tuple. Override `_instantiate_model(cached_data)` for non-standard architectures.
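
The load-on-first-use behaviour described above can be illustrated with a small, framework-free sketch. This is not the `LazyModelMixin` implementation: `LazyLoader`, `fake_loader`, and `load_count` are hypothetical names used only to show the pattern, in which nothing is loaded at construction time and the first `get_model()` call caches the result for every later call.

```python
from typing import Any, Callable, Optional, Tuple


class LazyLoader:
    """Illustrative stand-in for lazy model loading (not Mixpeek's mixin)."""

    def __init__(self, loader: Callable[[], Tuple[Any, Any]]) -> None:
        self._loader = loader  # invoked at most once
        self._cached: Optional[Tuple[Any, Any]] = None
        self.load_count = 0  # exposed only so the behaviour can be observed

    def get_model(self) -> Tuple[Any, Any]:
        # Construction is cheap; the first call pays the loading cost.
        if self._cached is None:
            self._cached = self._loader()
            self.load_count += 1
        return self._cached


def fake_loader() -> Tuple[str, str]:
    # Hypothetical loader standing in for a real model download.
    return ("model", "tokenizer")


service = LazyLoader(fake_loader)
loads_before_first_batch = service.load_count  # still 0: nothing loaded yet
first = service.get_model()   # triggers the one and only load
second = service.get_model()  # served from the cache
```

In a real extractor the cached object would additionally be shared across workers, which this single-process sketch does not attempt to model.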

## Custom Models (Enterprise)

<Note>
  Custom models require an **Enterprise** subscription. [Contact sales](https://mixpeek.com/contact) to enable.
</Note>

Upload fine-tuned weights and use them in extractors. Three steps:

<Steps>
  <Step title="Upload">
    ```bash theme={null}
    tar -czvf my_model.tar.gz ./model_weights/

    curl -X POST "$MIXPEEK_API_URL/v1/namespaces/$MIXPEEK_NAMESPACE/models" \
      -H "Authorization: Bearer $MIXPEEK_API_KEY" \
      -F "file=@my_model.tar.gz" \
      -F "name=my-embedding-model" \
      -F "version=1.0.0" \
      -F "model_format=pytorch" \
      -F "task_type=embedding" \
      -F "num_gpus=0" \
      -F "memory_gb=4.0"
    ```

    Supported formats: `pytorch` (`.pt`, `.pth`), `safetensors`, `onnx`, `huggingface` (directory).
  </Step>

  <Step title="Deploy to Ray">
    ```bash theme={null}
    curl -X POST "$MIXPEEK_API_URL/v1/namespaces/$MIXPEEK_NAMESPACE/models/my-embedding-model_1_0_0/deploy" \
      -H "Authorization: Bearer $MIXPEEK_API_KEY"
    ```
  </Step>

  <Step title="Use in your extractor">
    Set `model_source = "namespace"` and override `_instantiate_model()`:

    ```python theme={null}
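    from engine.models.lazy import LazyModelMixin
    from engine.inference.services import BaseBatchInferenceService

    class MyCustomProcessor(LazyModelMixin, BaseBatchInferenceService):
        model_id = "my-embedding-model_1_0_0"
        model_source = "namespace"

        def _instantiate_model(self, weights):
            import torch
            # Rebuild the architecture, then load the uploaded state dict into it.
            model = torch.nn.Linear(768, 256)
            model.load_state_dict(weights)
            model.to(self._detect_device())
            model.eval()
            return model, None  # this model has no tokenizer

        def _process_batch(self, batch):
            model, _ = self.get_model()
            # Run inference with `model` and attach the results to `batch` here.
            return batch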
    ```
  </Step>
</Steps>
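
For scripts that prefer Python over cURL, the deploy call in the second step could be assembled with the standard library alone. The sketch below only builds the request and does not send it; `build_deploy_request` is a hypothetical helper, the URL path is copied from the cURL example, and the fallback values are placeholders rather than real endpoints or keys.

```python
import os
import urllib.parse
import urllib.request


def build_deploy_request(
    api_url: str, namespace: str, model_id: str, api_key: str
) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the deploy step."""
    # Quote path segments so unexpected characters cannot alter the URL shape.
    ns = urllib.parse.quote(namespace, safe="")
    mid = urllib.parse.quote(model_id, safe="")
    url = f"{api_url.rstrip('/')}/v1/namespaces/{ns}/models/{mid}/deploy"
    return urllib.request.Request(
        url,
        method="POST",
        headers={"Authorization": f"Bearer {api_key}"},
    )


# Same environment variables as the cURL example; the defaults are placeholders.
request = build_deploy_request(
    api_url=os.environ.get("MIXPEEK_API_URL", "https://api.example.invalid"),
    namespace=os.environ.get("MIXPEEK_NAMESPACE", "ns_abc123"),
    model_id="my-embedding-model_1_0_0",
    api_key=os.environ.get("MIXPEEK_API_KEY", "mxp_sk_placeholder"),
)
# To actually deploy, pass `request` to urllib.request.urlopen(...).
```

Keeping construction separate from sending makes the URL and headers easy to inspect or unit-test before any network call is made.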

## Model Versioning

Models are versioned independently. Deploy a new version alongside the existing one, test in staging, then shift traffic:

```
my-embedding-model_1_0_0  (production)
my-embedding-model_2_0_0  (staging — validate before promoting)
```

The feature URI updates with the extractor version (`mixpeek://my_extractor@2.0.0/my_embedding`), so both versions can coexist.
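
The model IDs on this page look like the upload `name` joined to the `version` with dots replaced by underscores (`my-embedding-model` + `1.0.0` gives `my-embedding-model_1_0_0`). That rule is inferred from the examples here, not a documented guarantee, so the hypothetical `model_id_for` helper below is best treated as a convenience for logs and dashboards; when an upload response is available, its `model_id` field is the safer source.

```python
def model_id_for(name: str, version: str) -> str:
    """Guess a model ID from name and version (pattern inferred from the examples)."""
    return f"{name}_{version.replace('.', '_')}"


# Two versions of the same model map to distinct IDs, so they can coexist.
production_id = model_id_for("my-embedding-model", "1.0.0")
staging_id = model_id_for("my-embedding-model", "2.0.0")
```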

## Python SDK

```python theme={null}
from mixpeek import Mixpeek
from mixpeek.api.custom_models_api import CustomModelsApi

client = Mixpeek(api_key="mxp_sk_...")
models = CustomModelsApi(client.api_client)

result = models.upload_model_namespaces(
    namespace_id="ns_abc123",
    file="my_model.tar.gz",   # path to the .tar.gz archive
    name="my-reranker",
    version="1.0.0",
    model_format="pytorch",
    task_type="reranking",
)

models.deploy_model_namespaces(
    namespace_id="ns_abc123",
    model_id=result.model_id,
)
```

<Note>
  The generated SDK exposes these under `CustomModelsApi` (`upload_model_namespaces`, `deploy_model_namespaces`, `list_models_namespaces`, …). See the [Models API reference](/api-reference/custom-models/upload-a-custom-model) for exact parameters. The cURL form above is verified end-to-end.
</Note>

## Limits

| Limit                    | Value                                   |
| ------------------------ | --------------------------------------- |
| Max models per namespace | 50                                      |
| Max archive size         | 10 GB                                   |
| Supported formats        | pytorch, safetensors, onnx, huggingface |
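
A local preflight check can catch limit violations before a long upload starts. The sketch below is an illustration, not part of the Mixpeek SDK: `preflight` is a hypothetical helper, and it assumes "10 GB" means 10 × 1024³ bytes, which the table does not specify.

```python
import os
import tarfile
import tempfile
from typing import List

# Assumption: the 10 GB limit is interpreted as binary gigabytes.
MAX_ARCHIVE_BYTES = 10 * 1024**3
SUPPORTED_FORMATS = {"pytorch", "safetensors", "onnx", "huggingface"}


def preflight(archive_path: str, model_format: str) -> List[str]:
    """Return a list of problems; an empty list means the archive looks uploadable."""
    problems: List[str] = []
    if model_format not in SUPPORTED_FORMATS:
        problems.append(f"unsupported model_format: {model_format}")
    if not os.path.isfile(archive_path):
        problems.append(f"archive not found: {archive_path}")
    elif not tarfile.is_tarfile(archive_path):
        problems.append("archive is not a readable tar file")
    elif os.path.getsize(archive_path) > MAX_ARCHIVE_BYTES:
        problems.append("archive exceeds the 10 GB limit")
    return problems


# Demonstration with a tiny throwaway archive.
with tempfile.TemporaryDirectory() as tmp:
    weights = os.path.join(tmp, "weights.pt")
    with open(weights, "wb") as fh:
        fh.write(b"\x00" * 16)  # placeholder bytes, not real weights
    archive = os.path.join(tmp, "my_model.tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(weights, arcname="model_weights/weights.pt")

    ok_result = preflight(archive, "pytorch")
    bad_format_result = preflight(archive, "tensorflow")

missing_result = preflight(os.path.join("no", "such", "model.tar.gz"), "onnx")
```

The per-namespace model count is not checked here, since that requires a call to the API.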

## Related

<CardGroup cols={2}>
  <Card title="Custom Extractors" icon="plug" href="/processing/custom-extractors">
    Package and deploy extractors that use these models.
  </Card>

  <Card title="Extractor Quickstart" icon="rocket" href="/tutorials/custom-extractor-quickstart">
    Build a working extractor with model loading end-to-end.
  </Card>

  <Card title="Model API Reference" icon="cloud-arrow-up" href="/api-reference/custom-models/upload-a-custom-model">
    Upload, deploy, list, and delete model archives.
  </Card>

  <Card title="Self-Improving CV Pipeline" icon="eye" href="/tutorials/annotations-improve-features">
    Full tutorial: deploy YOLO, annotate, fine-tune, redeploy.
  </Card>
</CardGroup>
