Drift, Themes

Semantic Drift Detection

Monitor distribution shifts between baseline and current data by comparing clusters. Detect when new content diverges from the training data or when the content mix changes unexpectedly.

video

image

text

Production

29.0K runs

"Detect distribution drift in training data between Q1 baseline and current dataset"

Why This Matters

Data drift causes silent model degradation. Comparing cluster distributions over time lets you catch drift before it affects production systems.

import requests

API_URL = "https://api.mixpeek.com"
headers = {"Authorization": "Bearer YOUR_API_KEY", "X-Namespace": "your-namespace"}

# Create cluster config for drift monitoring
cluster = requests.post(f"{API_URL}/v1/clusters", headers=headers, json={
    "cluster_name": "training_baseline",
    "source_collection_ids": ["col_training_data"],
    "feature_addresses": ["mixpeek://multimodal_extractor@v1/embedding"],
    "algorithm": "hdbscan",
    "algorithm_config": {"min_cluster_size": 20}
}).json()

# Create baseline snapshot
baseline = requests.post(
    f"{API_URL}/v1/clusters/{cluster['cluster_id']}/execute",
    headers=headers
).json()
print(f"Baseline run_id: {baseline['run_id']}")

# Later: Execute again to compare
current = requests.post(
    f"{API_URL}/v1/clusters/{cluster['cluster_id']}/execute",
    headers=headers
).json()

# Compare executions by fetching both artifacts
baseline_artifacts = requests.get(
    f"{API_URL}/v1/clusters/{cluster['cluster_id']}/executions/{baseline['run_id']}/artifacts",
    headers=headers
).json()

current_artifacts = requests.get(
    f"{API_URL}/v1/clusters/{cluster['cluster_id']}/executions/{current['run_id']}/artifacts",
    headers=headers
).json()

# Compare cluster counts (a coarse signal: this check does not look at cluster sizes)
baseline_count = len(baseline_artifacts.get("clusters", []))
current_count = len(current_artifacts.get("clusters", []))
print(f"Baseline: {baseline_count} clusters, Current: {current_count} clusters")

# Flag drift when the number of clusters changes by more than 2
if abs(current_count - baseline_count) > 2:
    print("ALERT: Significant drift detected!")
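
The check above compares only the number of clusters, so it misses a change in content mix that leaves the cluster count the same. One way to compare the distributions themselves is the Jensen-Shannon divergence between per-cluster item counts. The sketch below is a minimal illustration, not part of the Mixpeek API: `js_divergence` and `DRIFT_THRESHOLD` are hypothetical names, and it assumes you have already obtained item counts per cluster for both runs, aligned so that index i refers to the same cluster in each list (for example, by assigning current items to the nearest baseline cluster).

```python
import math


def js_divergence(baseline_counts, current_counts):
    """Jensen-Shannon divergence (log base 2) between two count vectors.

    Returns a value in [0, 1]: 0 means identical proportions,
    1 means the two distributions share no clusters.
    Assumes both lists are aligned to the same cluster order.
    """
    if len(baseline_counts) != len(current_counts):
        raise ValueError("count vectors must have the same length")
    b_total = sum(baseline_counts)
    c_total = sum(current_counts)
    if b_total == 0 or c_total == 0:
        raise ValueError("each count vector must contain at least one item")

    # Normalize counts to proportions
    p = [c / b_total for c in baseline_counts]
    q = [c / c_total for c in current_counts]
    # Mixture distribution used by the JS divergence
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a, b):
        # Terms with a_i == 0 contribute nothing; b_i > 0 whenever a_i > 0
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical threshold; tune it against your own historical runs
DRIFT_THRESHOLD = 0.1

# Illustrative counts only, not real data
score = js_divergence([500, 300, 200], [200, 300, 500])
if score > DRIFT_THRESHOLD:
    print(f"ALERT: content mix shifted (JS divergence = {score:.3f})")
```

Because the score is bounded between 0 and 1, a single threshold can be reused across datasets of different sizes, which a raw difference in counts does not allow.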

Feature Extractors

Image Embedding

Generate visual embeddings for similarity search and clustering

752K runs

Text Embedding

Extract semantic embeddings from documents, transcripts and text content

827K runs

Video Embedding

Generate vector embeddings for video content

610K runs

Retriever Stages

Resources Used

Clustering

Baseline Clusters

Baseline distribution snapshot for comparison

Analytics

Drift Metrics

Distribution shift and novelty scores
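
As a rough illustration of what a novelty score can measure, the sketch below reports the fraction of current items that lie far from every baseline cluster centroid. It is an assumption-laden example, not the implementation behind the Drift Metrics resource: `novelty_score`, `baseline_centroids`, and `max_distance` are hypothetical names, and it assumes you can export embedding vectors and baseline centroids as plain lists of floats.

```python
import math


def novelty_score(current_vectors, baseline_centroids, max_distance):
    """Fraction of current vectors farther than max_distance
    (Euclidean) from their nearest baseline centroid."""
    if not current_vectors or not baseline_centroids:
        raise ValueError("need at least one vector and one centroid")

    novel = 0
    for vector in current_vectors:
        # Distance to the closest baseline cluster centroid
        nearest = min(math.dist(vector, c) for c in baseline_centroids)
        if nearest > max_distance:
            novel += 1
    return novel / len(current_vectors)


# Illustrative 2-D vectors only; real embeddings have many more dimensions
centroids = [(0.0, 0.0), (10.0, 10.0)]
current = [(0.0, 1.0), (10.0, 9.0), (50.0, 50.0), (5.0, 5.0)]
print(novelty_score(current, centroids, max_distance=2.0))
```

A rising score across runs suggests new content that the baseline clusters do not cover, even when the existing clusters keep their relative sizes.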

Documentation

Clusters

Use Cases Using This Recipe

Creative Lineage & Storyboard Intelligence

Related Recipes & Resources

Video Embedding

Text Embedding

Image Embedding

Dataset Versioning