> ## Documentation Index
> Fetch the complete documentation index at: https://docs.mixpeek.com/docs/llms.txt
> Use this file to discover all available pages before exploring further.

# Self-Improving CV Pipeline

> Deploy a YOLO model, annotate detections, fine-tune from corrections, and compound accuracy over time using annotations, taxonomies, and clusters

<Tip>This tutorial builds a computer vision pipeline that gets smarter with use. You'll deploy YOLO as a custom extractor, review detections with annotations, export corrections as training data, and close the loop by uploading improved weights — all through Mixpeek primitives.</Tip>

<Frame>
  <img src="https://mintcdn.com/mixpeek/HQEhVpg-9tm3E9k-/assets/tutorials/self-improving-cv-pipeline.svg?fit=max&auto=format&n=HQEhVpg-9tm3E9k-&q=85&s=dbf12b0a569e26ee99a30b70b0f6dd6b" alt="Self-improving CV pipeline: YOLO extractor → human review → fine-tune → redeploy, with taxonomies and clusters feeding back into the loop" width="1000" height="420" data-path="assets/tutorials/self-improving-cv-pipeline.svg" />
</Frame>

## What You'll Build

A closed-loop object detection system that compounds accuracy over time:

<Steps>
  <Step title="Detect">
    Deploy YOLO as a custom extractor. Every image ingested produces bounding boxes, class labels, and detection embeddings.
  </Step>

  <Step title="Review">
    Surface low-confidence detections for human review. Annotate each detection as confirmed, corrected, false positive, or missed.
  </Step>

  <Step title="Fine-Tune">
    Export annotations as YOLO-format training data. Fine-tune externally and upload improved weights as a new extractor version.
  </Step>

  <Step title="Compound">
    Taxonomies auto-classify future detections against your curated ground truth. Clusters discover new categories. Retroactive reapplication improves old data.
  </Step>
</Steps>

**Prerequisites:** A Mixpeek namespace with an API key. Familiarity with [custom extractors](/processing/custom-extractors) and the [model registry](/processing/model-registry) helps but isn't required.

***

## 1. Deploy YOLO as a Custom Extractor

Package a YOLO-based detector as a custom extractor. The extractor reads images, runs inference, and outputs detection features — bounding boxes, class labels, and confidence scores.

<AccordionGroup>
  <Accordion title="manifest.py — extractor metadata and feature definitions">
    ```python theme={null}
    feature_extractor_name = "yolo_detector"
    version = "1.0.0"
    description = "YOLOv8 object detection with bounding boxes and class embeddings"

    dependencies = ["ultralytics==8.2.0", "torch>=2.0"]

    features = [
        {
            "feature_type": "json",
            "feature_name": "detections",
        },
        {
            "feature_type": "embedding",
            "feature_name": "detection_embedding",
            "embedding_dim": 512,
            "distance_metric": "cosine",
        },
    ]

    output_schema = {
        "detections": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "class": {"type": "string"},
                    "confidence": {"type": "number"},
                    "bbox": {
                        "type": "object",
                        "properties": {
                            "x": {"type": "number"},
                            "y": {"type": "number"},
                            "w": {"type": "number"},
                            "h": {"type": "number"},
                        },
                    },
                },
            },
        },
        "detection_embedding": {
            "type": "array",
            "items": {"type": "number"},
            "description": "512-dim CLIP embedding of the highest-confidence crop",
        },
    }

    input_mappings = {"image": "image"}
    tier = 1
    tier_label = "OBJECT_DETECTION"
    compute_profile = {"resource_type": "gpu"}
    ```

    <Warning>
      Use the exact key names: `feature_type`, `feature_name`, `embedding_dim`, `distance_metric`. Using `name`/`type`/`dimensions`/`distance` will silently create zero vector indexes.
    </Warning>
  </Accordion>

  <Accordion title="pipeline.py — YOLO inference with LazyModelMixin">
    ```python theme={null}
    import numpy as np
    import pandas as pd
    from engine.models.lazy import LazyModelMixin
    from engine.inference.services import BaseBatchInferenceService
    from engine.io import parallel_io


    class YOLODetector(LazyModelMixin, BaseBatchInferenceService):
        model_id = "ultralytics/yolov8m"
        model_source = "huggingface"

        def _instantiate_model(self, cached_data):
            from ultralytics import YOLO
            model = YOLO("yolov8m.pt")
            model.to(self._detect_device())
            return model, None

        def _process_batch(self, batch):
            model, _ = self.get_model()

            images = parallel_io(batch["data"].tolist())

            results = model(images, conf=0.25)

            all_detections = []
            all_embeddings = []

            for result in results:
                detections = []
                for box in result.boxes:
                    detections.append({
                        "class": result.names[int(box.cls)],
                        "confidence": float(box.conf),
                        "bbox": {
                            "x": float(box.xywh[0][0]),
                            "y": float(box.xywh[0][1]),
                            "w": float(box.xywh[0][2]),
                            "h": float(box.xywh[0][3]),
                        },
                    })
                all_detections.append(detections)

                if detections:
                    best = max(detections, key=lambda d: d["confidence"])
                    crop = result.orig_img[
                        int(best["bbox"]["y"] - best["bbox"]["h"]/2):int(best["bbox"]["y"] + best["bbox"]["h"]/2),
                        int(best["bbox"]["x"] - best["bbox"]["w"]/2):int(best["bbox"]["x"] + best["bbox"]["w"]/2),
                    ]
                    embedding = self._embed_crop(crop)
                else:
                    embedding = np.zeros(512).tolist()
                all_embeddings.append(embedding)

            batch["detections"] = all_detections
            batch["detection_embedding"] = all_embeddings
            return batch

        def _embed_crop(self, crop):
            # Replace with CLIP or similar for production
            return np.random.randn(512).astype(np.float32).tolist()


    def build_steps(extractor_request=None, base_steps=None, **kwargs):
        steps = list(base_steps or [])
        steps.append(YOLODetector())
        return {"steps": steps, "prepare": lambda ds: ds}


    def extract(extractor_request=None, base_steps=None, **kwargs):
        result = build_steps(
            extractor_request=extractor_request,
            base_steps=base_steps, **kwargs
        )
        class PipelineResult:
            def __init__(self, steps, prepare):
                self.steps = steps
                self.prepare = prepare
        return PipelineResult(result["steps"], result["prepare"])
    ```
  </Accordion>
</AccordionGroup>

### Upload and Deploy

```bash theme={null}
# Package
zip -r yolo_detector.zip yolo_detector/

# Upload
UPLOAD=$(curl -s -X POST "$MP_API_URL/v1/namespaces/$NS_ID/plugins/uploads" \
  -H "Authorization: Bearer $MP_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"name": "yolo_detector", "version": "1.0.0", "file_size_bytes": 50000}')

UPLOAD_ID=$(echo $UPLOAD | jq -r '.upload_id')
PRESIGNED_URL=$(echo $UPLOAD | jq -r '.presigned_url')

curl -s -X PUT "$PRESIGNED_URL" \
  -H "Content-Type: application/zip" \
  --data-binary @yolo_detector.zip

curl -s -X POST "$MP_API_URL/v1/namespaces/$NS_ID/plugins/uploads/$UPLOAD_ID/confirm" \
  -H "Authorization: Bearer $MP_API_KEY"

# Deploy
curl -s -X POST "$MP_API_URL/v1/namespaces/$NS_ID/plugins/yolo_detector_1_0_0/deploy?deployment_type=batch_only" \
  -H "Authorization: Bearer $MP_API_KEY"
```

<Info>
  Your extractor is now available at feature URI `mixpeek://yolo_detector@1.0.0/detection_embedding`. This URI is the stable contract — retrievers, taxonomies, and clusters all reference it, so you can swap model versions without breaking downstream consumers.
</Info>

***

## 2. Create a Collection and Ingest Images

Bind the YOLO extractor to a bucket so every uploaded image gets processed automatically.

<CodeGroup>
  ```bash cURL theme={null}
  # Create bucket
  curl -s -X POST "$MP_API_URL/v1/namespaces/$NS_ID/buckets" \
    -H "Authorization: Bearer $MP_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "bucket_name": "security_footage",
      "bucket_schema": {
        "properties": {
          "image": {"type": "image", "required": true},
          "camera_id": {"type": "text"},
          "timestamp": {"type": "text"}
        }
      }
    }'

  # Create collection with YOLO extractor
  curl -s -X POST "$MP_API_URL/v1/namespaces/$NS_ID/collections" \
    -H "Authorization: Bearer $MP_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "collection_name": "detected_objects",
      "feature_extractor": {
        "feature_extractor_name": "yolo_detector",
        "version": "1.0.0"
      },
      "source": {
        "type": "bucket",
        "bucket_ids": ["bkt_security_footage"]
      }
    }'

  # Upload images
  curl -s -X POST "$MP_API_URL/v1/namespaces/$NS_ID/buckets/$BUCKET_ID/objects" \
    -H "Authorization: Bearer $MP_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "metadata": {"camera_id": "cam_lobby_01", "timestamp": "2026-05-03T14:30:00Z"},
      "blobs": [{
        "property": "image",
        "type": "image",
        "data": {"url": "s3://my-bucket/footage/frame_001.jpg"}
      }]
    }'
  ```

  ```python Python SDK theme={null}
  from mixpeek import Mixpeek

  mp = Mixpeek(api_key="API_KEY")

  bucket = mp.buckets.create(
      name="security_footage",
      bucket_schema={
          "properties": {
              "image": {"type": "image", "required": True},
              "camera_id": {"type": "text"},
              "timestamp": {"type": "text"},
          }
      },
  )

  collection = mp.collections.create(
      name="detected_objects",
      feature_extractor={
          "feature_extractor_name": "yolo_detector",
          "version": "1.0.0",
      },
      source={"type": "bucket", "bucket_ids": [bucket.bucket_id]},
  )

  mp.buckets.objects.create(
      bucket_id=bucket.bucket_id,
      metadata={"camera_id": "cam_lobby_01", "timestamp": "2026-05-03T14:30:00Z"},
      blobs=[{
          "property": "image",
          "type": "image",
          "data": {"url": "s3://my-bucket/footage/frame_001.jpg"},
      }],
  )
  ```
</CodeGroup>

Trigger batch processing to run YOLO across all uploaded images:

```bash theme={null}
# Buckets/batches are top-level, keyed by the X-Namespace header.
# A batch is created (objects + collections) then submitted.
H=(-H "Authorization: Bearer $MP_API_KEY" -H "X-Namespace: $NS_ID" -H "Content-Type: application/json")

OBJIDS=$(curl -s "$MP_API_URL/v1/buckets/$BUCKET_ID/objects" "${H[@]}" | jq -c '[.results[].object_id]')
BATCH_ID=$(curl -s -X POST "$MP_API_URL/v1/buckets/$BUCKET_ID/batches" "${H[@]}" \
  -d "{\"batch_name\":\"yolo-run\",\"object_ids\":$OBJIDS,\"collection_ids\":[\"$COLLECTION_ID\"]}" | jq -r '.batch_id')
curl -s -X POST "$MP_API_URL/v1/buckets/$BUCKET_ID/batches/$BATCH_ID/submit" "${H[@]}" \
  -d "{\"collection_ids\":[\"$COLLECTION_ID\"]}"
```

***

## 3. Build a Retriever for Detection Review

Create a retriever that surfaces detections for human review. Filter by confidence to focus reviewers on borderline cases where the model is least sure.

<CodeGroup>
  ```bash cURL theme={null}
  curl -s -X POST "$MP_API_URL/v1/namespaces/$NS_ID/retrievers" \
    -H "Authorization: Bearer $MP_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "name": "detection_review",
      "stages": [{
        "stage_name": "low_confidence",
        "stage_type": "filter",
        "config": {
          "stage_id": "feature_search",
          "parameters": {
            "searches": [{
              "feature_uri": "mixpeek://yolo_detector@1.0.0/detection_embedding",
              "query": "{{INPUT.query}}"
            }],
            "filters": {
              "AND": [{
                "field": "detections.0.confidence",
                "operator": "lt",
                "value": 0.7
              }]
            }
          }
        }
      }]
    }'
  ```

  ```python Python SDK theme={null}
  retriever = mp.retrievers.create(
      name="detection_review",
      stages=[{
          "stage_name": "low_confidence",
          "stage_type": "filter",
          "config": {
              "stage_id": "feature_search",
              "parameters": {
                  "searches": [{
                      "feature_uri": "mixpeek://yolo_detector@1.0.0/detection_embedding",
                      "query": "{{INPUT.query}}",
                  }],
                  "filters": {
                      "AND": [{
                          "field": "detections.0.confidence",
                          "operator": "lt",
                          "value": 0.7,
                      }]
                  },
              },
          },
      }],
  )
  ```
</CodeGroup>

<Tip>
  **Focus reviewers on uncertainty.** Annotating high-confidence correct detections adds little value. Filtering for confidence \< 0.7 routes reviewers to the cases where YOLO is least sure — exactly the training signal you need for the next fine-tune.
</Tip>

***

## 4. Annotate Detections

Reviewers examine each detection and record their decision. The `payload` carries the corrected bounding boxes and class labels — this is what becomes training data.

### Label Vocabulary

Before annotating, establish a consistent label vocabulary. The stats endpoint groups by exact string match, so consistency matters.

<CardGroup cols={2}>
  <Card title="confirmed" icon="check">
    Detection is correct as-is. Becomes a **positive training sample** that reinforces the model.
  </Card>

  <Card title="corrected" icon="pen">
    Bounding box or class was adjusted. **Highest-value sample** — teaches the model its mistakes.
  </Card>

  <Card title="false_positive" icon="xmark">
    No real object at this location. Becomes a **hard negative** that reduces false alarms.
  </Card>

  <Card title="missed" icon="plus">
    Object exists but wasn't detected. Added as **new ground truth** for the next training run.
  </Card>
</CardGroup>

### Recording Decisions

<Tabs>
  <Tab title="Individual Annotations">
    <CodeGroup>
      ```bash cURL theme={null}
      # Correct detection — class was wrong
      curl -X POST "$MP_API_URL/v1/annotations" \
        -H "Authorization: Bearer $MP_API_KEY" \
        -H "X-Namespace: $MP_NAMESPACE" \
        -H "Content-Type: application/json" \
        -d '{
          "document_id": "doc_frame_001_det_3",
          "collection_id": "col_detected_objects",
          "label": "corrected",
          "confidence": 1.0,
          "reasoning": "Model predicted car, actual object is delivery van.",
          "payload": {
            "predicted_class": "car",
            "true_class": "delivery_van",
            "bbox": {"x": 340, "y": 220, "w": 180, "h": 120},
            "image_width": 1920,
            "image_height": 1080
          },
          "retriever_id": "ret_detection_review",
          "execution_id": "exec_review_batch_01"
        }'

      # Confirm correct detection
      curl -X POST "$MP_API_URL/v1/annotations" \
        -H "Authorization: Bearer $MP_API_KEY" \
        -H "X-Namespace: $MP_NAMESPACE" \
        -H "Content-Type: application/json" \
        -d '{
          "document_id": "doc_frame_001_det_1",
          "collection_id": "col_detected_objects",
          "label": "confirmed",
          "confidence": 1.0,
          "payload": {
            "predicted_class": "person",
            "true_class": "person",
            "bbox": {"x": 640, "y": 300, "w": 90, "h": 200}
          },
          "retriever_id": "ret_detection_review",
          "execution_id": "exec_review_batch_01"
        }'

      # Reject false positive
      curl -X POST "$MP_API_URL/v1/annotations" \
        -H "Authorization: Bearer $MP_API_KEY" \
        -H "X-Namespace: $MP_NAMESPACE" \
        -H "Content-Type: application/json" \
        -d '{
          "document_id": "doc_frame_001_det_5",
          "collection_id": "col_detected_objects",
          "label": "false_positive",
          "reasoning": "Shadow on wall, not an actual object.",
          "retriever_id": "ret_detection_review",
          "execution_id": "exec_review_batch_01"
        }'
      ```

      ```python Python SDK theme={null}
      # Correct detection — class was wrong
      mp.annotations.create(
          document_id="doc_frame_001_det_3",
          collection_id="col_detected_objects",
          label="corrected",
          confidence=1.0,
          reasoning="Model predicted car, actual object is delivery van.",
          payload={
              "predicted_class": "car",
              "true_class": "delivery_van",
              "bbox": {"x": 340, "y": 220, "w": 180, "h": 120},
              "image_width": 1920,
              "image_height": 1080,
          },
          retriever_id="ret_detection_review",
          execution_id="exec_review_batch_01",
      )

      # Confirm correct detection
      mp.annotations.create(
          document_id="doc_frame_001_det_1",
          collection_id="col_detected_objects",
          label="confirmed",
          confidence=1.0,
          payload={
              "predicted_class": "person",
              "true_class": "person",
              "bbox": {"x": 640, "y": 300, "w": 90, "h": 200},
          },
          retriever_id="ret_detection_review",
          execution_id="exec_review_batch_01",
      )

      # Reject false positive
      mp.annotations.create(
          document_id="doc_frame_001_det_5",
          collection_id="col_detected_objects",
          label="false_positive",
          reasoning="Shadow on wall, not an actual object.",
          retriever_id="ret_detection_review",
          execution_id="exec_review_batch_01",
      )
      ```
    </CodeGroup>
  </Tab>

  <Tab title="Bulk Review (up to 1,000)">
    Process an entire review queue in a single call:

    ```bash theme={null}
    curl -X POST "$MP_API_URL/v1/annotations/bulk" \
      -H "Authorization: Bearer $MP_API_KEY" \
      -H "X-Namespace: $MP_NAMESPACE" \
      -H "Content-Type: application/json" \
      -d '{
        "create": [
          {
            "document_id": "doc_det_001",
            "collection_id": "col_detected_objects",
            "label": "confirmed",
            "payload": {"true_class": "person", "bbox": {"x": 100, "y": 200, "w": 50, "h": 120}}
          },
          {
            "document_id": "doc_det_002",
            "collection_id": "col_detected_objects",
            "label": "corrected",
            "payload": {"predicted_class": "dog", "true_class": "cat", "bbox": {"x": 300, "y": 150, "w": 80, "h": 60}}
          },
          {
            "document_id": "doc_det_003",
            "collection_id": "col_detected_objects",
            "label": "false_positive"
          }
        ]
      }'
    ```

    Each operation is independent — a failure in one does not roll back the others. The response includes per-operation results so you can retry individual failures.
  </Tab>
</Tabs>

<Note>
  Always include `retriever_id` and `execution_id` when annotating retriever results. This provenance link lets you measure which retrievers produce the most approved vs. rejected results — critical for evaluating retriever quality over time.
</Note>

***

## 5. Track Model Quality with Stats

Monitor how your model is performing across review cycles. A rising `corrected` or `false_positive` rate signals the model needs retraining.

<CodeGroup>
  ```bash cURL theme={null}
  curl "$MP_API_URL/v1/annotations/stats?collection_id=col_detected_objects" \
    -H "Authorization: Bearer $MP_API_KEY" \
    -H "X-Namespace: $MP_NAMESPACE"
  ```

  ```python Python SDK theme={null}
  stats = mp.annotations.stats(collection_id="col_detected_objects")
  # {"total": 500, "by_label": {"confirmed": 340, "corrected": 95, "false_positive": 45, "missed": 20}}
  ```
</CodeGroup>

### Interpreting Stats for Retraining Decisions

| Metric              | Healthy | Action Needed                                                               |
| ------------------- | ------- | --------------------------------------------------------------------------- |
| Confirmed rate      | > 80%   | Model is performing well                                                    |
| Corrected rate      | > 15%   | Class confusion — retrain with corrected examples                           |
| False positive rate | > 10%   | Confidence threshold too low, or hard negatives needed                      |
| Missed rate         | > 5%    | Model is missing objects — add missed annotations as positive training data |

<Warning>
  Track stats **over time**, not just cumulatively. A model at 90% confirmed overall might be at 60% confirmed on last week's data if the deployment context changed (new camera angle, different lighting, seasonal changes).
</Warning>

***

## 6. Export Annotations as YOLO Training Data

Query your annotations and convert them to YOLO format. Every corrected bounding box and confirmed detection becomes a labeled training sample.

```python theme={null}
import os

confirmed = mp.annotations.list(
    collection_id="col_detected_objects",
    label="confirmed",
)
corrected = mp.annotations.list(
    collection_id="col_detected_objects",
    label="corrected",
)

os.makedirs("dataset/labels", exist_ok=True)

class_map = {}
class_counter = 0

for ann in confirmed.items + corrected.items:
    payload = ann.payload
    true_class = payload.get("true_class", payload.get("predicted_class"))
    bbox = payload.get("bbox", {})
    img_w = payload.get("image_width", 1920)
    img_h = payload.get("image_height", 1080)

    if not bbox or not true_class:
        continue

    if true_class not in class_map:
        class_map[true_class] = class_counter
        class_counter += 1

    # Convert to YOLO format: class x_center y_center width height (normalized)
    x_center = bbox["x"] / img_w
    y_center = bbox["y"] / img_h
    width = bbox["w"] / img_w
    height = bbox["h"] / img_h

    label_file = f"dataset/labels/{ann.document_id}.txt"
    with open(label_file, "a") as f:
        f.write(f"{class_map[true_class]} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}\n")

with open("dataset/classes.txt", "w") as f:
    for name, idx in sorted(class_map.items(), key=lambda x: x[1]):
        f.write(f"{name}\n")

print(f"Exported {len(confirmed.items) + len(corrected.items)} annotations across {len(class_map)} classes")
```

<Info>
  The YOLO format expects one `.txt` file per image with lines of `class x_center y_center width height`, all values normalized to `[0, 1]`. The export script handles this conversion from Mixpeek's pixel-coordinate annotation payloads.
</Info>

***

## 7. Fine-Tune and Redeploy

Fine-tune YOLO externally with your exported annotations, then upload the improved weights as a new extractor version.

<Tabs>
  <Tab title="Fine-Tune (Ultralytics)">
    ```python theme={null}
    from ultralytics import YOLO

    model = YOLO("yolov8m.pt")
    model.train(
        data="dataset/data.yaml",
        epochs=50,
        imgsz=640,
        batch=16,
        name="yolo_v2_finetuned",
    )

    model.export(format="torchscript")
    ```
  </Tab>

  <Tab title="Upload as New Extractor Version">
    ```bash theme={null}
    # Update manifest.py version to "2.0.0" and package
    zip -r yolo_detector_v2.zip yolo_detector/

    UPLOAD=$(curl -s -X POST "$MP_API_URL/v1/namespaces/$NS_ID/plugins/uploads" \
      -H "Authorization: Bearer $MP_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{"name": "yolo_detector", "version": "2.0.0", "file_size_bytes": 80000}')

    UPLOAD_ID=$(echo $UPLOAD | jq -r '.upload_id')
    PRESIGNED_URL=$(echo $UPLOAD | jq -r '.presigned_url')

    curl -s -X PUT "$PRESIGNED_URL" \
      -H "Content-Type: application/zip" \
      --data-binary @yolo_detector_v2.zip

    curl -s -X POST "$MP_API_URL/v1/namespaces/$NS_ID/plugins/uploads/$UPLOAD_ID/confirm" \
      -H "Authorization: Bearer $MP_API_KEY"

    curl -s -X POST "$MP_API_URL/v1/namespaces/$NS_ID/plugins/yolo_detector_2_0_0/deploy?deployment_type=batch_only" \
      -H "Authorization: Bearer $MP_API_KEY"
    ```
  </Tab>

  <Tab title="Alternative: Model Registry">
    Upload fine-tuned weights to the [Model Registry](/processing/model-registry) as a namespace model. This separates model weights from extractor code, so you can iterate on weights without repackaging the extractor.

    ```bash theme={null}
    # Package weights
    tar -czvf yolo_v2_weights.tar.gz ./runs/detect/yolo_v2_finetuned/weights/

    # Upload to registry
    curl -X POST "$MP_API_URL/v1/namespaces/$NS_ID/models" \
      -H "Authorization: Bearer $MP_API_KEY" \
      -F "file=@yolo_v2_weights.tar.gz" \
      -F "name=yolo-detector" \
      -F "version=2.0.0" \
      -F "model_format=pytorch" \
      -F "task_type=detection" \
      -F "num_gpus=1"

    # Deploy to Ray object store
    curl -X POST "$MP_API_URL/v1/namespaces/$NS_ID/models/yolo-detector_2_0_0/deploy" \
      -H "Authorization: Bearer $MP_API_KEY"
    ```

    Then reference in your extractor via `load_namespace_model("yolo-detector_2_0_0")`.
  </Tab>
</Tabs>

<Tip>
  The new version gets its own feature URI — `mixpeek://yolo_detector@2.0.0/detection_embedding` — so you can run both versions side by side and compare results before switching production traffic.
</Tip>

***

## 8. Auto-Classify Detections with Taxonomies

Once you have enough confirmed annotations, promote them to a reference collection. Then create a taxonomy that auto-classifies future detections by matching against your curated ground truth.

<Steps>
  <Step title="Create the taxonomy">
    ```bash theme={null}
    curl -s -X POST "$MP_API_URL/v1/namespaces/$NS_ID/taxonomies" \
      -H "Authorization: Bearer $MP_API_KEY" \
      -H "Content-Type: application/json" \
      -d "{
        \"taxonomy_name\": \"object-catalog\",
        \"taxonomy_type\": \"flat\",
        \"retriever_id\": \"ret_object_catalog_search\",
        \"collection_id\": \"col_verified_objects\",
        \"input_mappings\": [{
          \"source\": \"detection_embedding\",
          \"target\": \"query\"
        }],
        \"enrichment_fields\": [
          {\"source\": \"true_class\", \"target\": \"verified_class\"},
          {\"source\": \"category\", \"target\": \"object_category\"}
        ],
        \"threshold\": 0.75,
        \"execution_mode\": \"materialize\"
      }"
    ```
  </Step>

  <Step title="Apply to your detection collection">
    Every new image auto-classifies at ingestion time:

    ```json theme={null}
    {
      "taxonomy_applications": [
        {
          "taxonomy_id": "tax_object_catalog",
          "execution_mode": "materialize"
        }
      ]
    }
    ```
  </Step>

  <Step title="Backfill existing data when the reference improves">
    When annotations accumulate and your reference collection gets better, trigger retroactive mode to reclassify all existing detections:

    ```bash theme={null}
    curl -s -X POST "$MP_API_URL/v1/namespaces/$NS_ID/taxonomies/tax_object_catalog/apply" \
      -H "Authorization: Bearer $MP_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{"execution_mode": "retroactive", "collection_id": "col_detected_objects"}'
    ```
  </Step>
</Steps>

<Note>
  Retroactive reapplication is a first-class operation, not a data migration. When your reference improves — more annotations, better coverage, new categories — old data automatically re-benefits.
</Note>

***

## 9. Discover New Categories with Clusters

YOLO might detect "unknown" objects that don't fit existing classes. Use clustering to group similar unknowns and discover categories you haven't labeled yet.

```bash theme={null}
curl -s -X POST "$MP_API_URL/v1/namespaces/$NS_ID/clusters" \
  -H "Authorization: Bearer $MP_API_KEY" \
  -H "Content-Type: application/json" \
  -d "{
    \"cluster_name\": \"unknown-objects\",
    \"collection_ids\": [\"$COLLECTION_ID\"],
    \"cluster_type\": \"vector\",
    \"vector_config\": {
      \"feature_uris\": [\"mixpeek://yolo_detector@2.0.0/detection_embedding\"],
      \"clustering_method\": \"hdbscan\",
      \"hdbscan_parameters\": {\"min_cluster_size\": 5}
    },
    \"llm_labeling\": {
      \"enabled\": true,
      \"input_mappings\": [{
        \"source\": \"payload\",
        \"fields\": [\"detections\"]
      }]
    },
    \"dimension_reduction\": {\"method\": \"umap\", \"n_components\": 2}
  }"
```

Clusters reveal groups like "delivery trucks," "bicycles," or "strollers" — objects the base YOLO model might lump together or miss entirely.

<Steps>
  <Step title="Review cluster labels">
    The LLM-generated name gives you a starting point. Review the cluster members to confirm the grouping makes sense.
  </Step>

  <Step title="Promote to taxonomy node">
    The cluster becomes a reference for auto-classification. Future detections matching this cluster auto-label.
  </Step>

  <Step title="Add to training data">
    Confirmed cluster members become training samples for the next YOLO fine-tune — new classes discovered from your own data.
  </Step>
</Steps>

***

## 10. Automate the Loop with Webhooks

Wire up webhooks so the pipeline runs without manual intervention. Each annotation event can trigger downstream processing.

```bash theme={null}
curl -X POST "$MP_API_URL/v1/webhooks" \
  -H "Authorization: Bearer $MP_API_KEY" \
  -H "X-Namespace: $MP_NAMESPACE" \
  -H "Content-Type: application/json" \
  -d '{
    "webhook_name": "detection-review-events",
    "url": "https://your-app.com/webhooks/detections",
    "events": [
      "annotation.created",
      "annotation.updated",
      "batch.completed"
    ]
  }'
```

### Automation Patterns

| Event                | Trigger                                       | Action                                                  |
| -------------------- | --------------------------------------------- | ------------------------------------------------------- |
| `annotation.created` | Label is `confirmed` or `corrected`           | Add to reference collection, append to training dataset |
| `annotation.created` | Label is `false_positive`                     | Log as hard negative for next training run              |
| Annotation count     | Crosses threshold (e.g., 500 new corrections) | Trigger fine-tuning job, export training data           |
| `batch.completed`    | New extractor version finishes processing     | Run evaluation comparing v1 vs. v2 detection quality    |

***

## The Compounding Flywheel

Each Mixpeek primitive contributes to a system that gets better with use:

<CardGroup cols={2}>
  <Card title="Custom Extractor" icon="plug">
    Runs YOLO, produces detections with stable feature URIs. Versioned — v1 and v2 coexist.
  </Card>

  <Card title="Annotations" icon="pen-to-square">
    Captures human corrections — the highest-quality training signal. Bulk API for review queues.
  </Card>

  <Card title="Model Registry" icon="database">
    Stores fine-tuned weights. Upload, deploy, version — without repackaging the extractor.
  </Card>

  <Card title="Taxonomies" icon="sitemap">
    Auto-classifies detections against curated ground truth. Retroactive mode backfills old data.
  </Card>

  <Card title="Clusters" icon="circle-nodes">
    Discovers object categories you haven't labeled yet. Promote stable clusters to taxonomy nodes.
  </Card>

  <Card title="Webhooks" icon="bell">
    Triggers downstream actions on every annotation event. No polling required.
  </Card>
</CardGroup>

<Tip>
  The key insight is that these primitives **compose**. Annotations curate the edges where the model was wrong. Those curated edges become training data *and* reference collection entries. The reference collection powers taxonomy auto-classification. Clusters discover what you haven't labeled yet. And every improvement backfills via retroactive taxonomy application — old data re-benefits from every new correction.
</Tip>

## Next Steps

<CardGroup cols={2}>
  <Card title="Custom Extractors" icon="plug" href="/processing/custom-extractors">
    Full guide to packaging and deploying custom feature extractors.
  </Card>

  <Card title="Model Registry" icon="database" href="/processing/model-registry">
    Upload fine-tuned weights, manage versions, and deploy to the inference cluster.
  </Card>

  <Card title="Taxonomies" icon="sitemap" href="/enrichment/taxonomies">
    Build flat and hierarchical classification systems with retroactive reapplication.
  </Card>

  <Card title="Clusters" icon="circle-nodes" href="/enrichment/clusters">
    Discover structure in your data with 8 algorithms and LLM-powered labeling.
  </Card>
</CardGroup>
