    Semantic Drift Detection

    Monitor distribution shifts between baseline and current data using cluster comparison. Detect when new content diverges from training data or when content mix changes unexpectedly.
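
    One way to detect content that diverges from the training data is to assign each new embedding to its nearest baseline cluster centroid and flag items that are not close to any of them. The sketch below assumes you already have the baseline centroids and the current embeddings as plain lists of floats. `cosine_similarity`, `assign_to_baseline` and the 0.8 similarity cutoff are hypothetical illustrations, not part of the Mixpeek API.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def assign_to_baseline(embeddings, centroids, min_similarity=0.8):
    """Label each embedding with the index of its most similar baseline centroid.

    Items whose best similarity is below min_similarity (a hypothetical cutoff)
    get -1, meaning they are not close to any baseline cluster.
    centroids must be non-empty.
    """
    labels = []
    for emb in embeddings:
        sims = [cosine_similarity(emb, c) for c in centroids]
        best = max(range(len(sims)), key=lambda i: sims[i])
        labels.append(best if sims[best] >= min_similarity else -1)
    return labels

# Toy 2-D example; real embeddings would come from your baseline and current data
centroids = [[1.0, 0.0], [0.0, 1.0]]
current = [[0.9, 0.1], [0.1, 0.95], [-1.0, 0.0]]
labels = assign_to_baseline(current, centroids)
novel_share = labels.count(-1) / len(labels)
print(f"{novel_share:.0%} of current items are far from every baseline cluster")
```

    Tracking the share of -1 labels from run to run gives a simple novelty rate. A rising share suggests new content that the baseline never covered.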

    Modalities: video, image, text
    Production

    "Detect distribution drift in training data between Q1 baseline and current dataset"

    Why This Matters

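    Data drift causes silent model degradation. The data a model sees in production gradually stops resembling the data it was trained on, and quality drops without any error being raised. By comparing cluster distributions over time, you can catch drift before it affects production systems.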
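
    Once current items have been assigned to the baseline clusters (for example, by nearest centroid), a change in content mix shows up as a change in the share of items per cluster. The minimal sketch below scores that change with the Jensen-Shannon divergence. It assumes that both label lists use the same baseline cluster indices, with -1 for outliers. The helper names and the 0.1 alert threshold are hypothetical and should be calibrated on your own data.

```python
import math
from collections import Counter

def cluster_proportions(labels, num_clusters):
    """Share of items in each baseline cluster, plus a final bucket for outliers (-1)."""
    counts = Counter(labels)
    total = len(labels)
    buckets = list(range(num_clusters)) + [-1]
    return [counts.get(k, 0) / total for k in buckets]

def kl_divergence(p, q):
    """KL(p || q) in bits; terms where p is zero contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: symmetric and bounded by [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Baseline: items split evenly across two clusters.
# Current: cluster 1 has emptied and a quarter of the items are outliers.
baseline_p = cluster_proportions([0, 0, 1, 1], num_clusters=2)
current_p = cluster_proportions([0, 0, 0, -1], num_clusters=2)
score = js_divergence(baseline_p, current_p)
if score > 0.1:  # hypothetical alert threshold; calibrate on your own data
    print(f"Possible drift: JS divergence = {score:.3f}")
```

    Jensen-Shannon divergence is used here because it is symmetric and, with base-2 logarithms, bounded between 0 and 1. It also stays finite when a cluster is empty in one snapshot. This makes a fixed alert threshold easier to reason about than raw KL divergence.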

    import requests

    API_URL = "https://api.mixpeek.com"
    headers = {"Authorization": "Bearer YOUR_API_KEY", "X-Namespace": "your-namespace"}

    # Create a cluster config for drift monitoring
    resp = requests.post(f"{API_URL}/v1/clusters", headers=headers, json={
        "cluster_name": "training_baseline",
        "source_collection_ids": ["col_training_data"],
        "feature_addresses": ["mixpeek://multimodal_extractor@v1/embedding"],
        "algorithm": "hdbscan",
        "algorithm_config": {"min_cluster_size": 20},
    })
    resp.raise_for_status()  # fail fast on auth or validation errors
    cluster = resp.json()
    cluster_url = f"{API_URL}/v1/clusters/{cluster['cluster_id']}"

    # Execute once to create the baseline snapshot
    resp = requests.post(f"{cluster_url}/execute", headers=headers)
    resp.raise_for_status()
    baseline = resp.json()
    print(f"Baseline run_id: {baseline['run_id']}")

    # Later: execute again on the current data to compare
    resp = requests.post(f"{cluster_url}/execute", headers=headers)
    resp.raise_for_status()
    current = resp.json()

    # Fetch the artifacts of both executions
    # (if executions run asynchronously, wait for each to finish first)
    def get_artifacts(run_id):
        resp = requests.get(f"{cluster_url}/executions/{run_id}/artifacts", headers=headers)
        resp.raise_for_status()
        return resp.json()

    baseline_artifacts = get_artifacts(baseline["run_id"])
    current_artifacts = get_artifacts(current["run_id"])

    # Compare cluster counts (a coarse signal: it ignores cluster sizes and positions)
    baseline_count = len(baseline_artifacts.get("clusters", []))
    current_count = len(current_artifacts.get("clusters", []))
    print(f"Baseline: {baseline_count} clusters, Current: {current_count} clusters")
    if abs(current_count - baseline_count) > 2:
        print("ALERT: cluster count changed significantly; possible drift")
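
    Comparing cluster counts alone misses cases where clusters shift, split, or are replaced while the total stays the same. Clusters from the two runs could instead be paired by centroid similarity, provided the execution artifacts expose a centroid vector per cluster. That is an assumption: check the artifact schema, because the field name is not shown above. This sketch uses simple greedy matching rather than an optimal assignment such as the Hungarian algorithm. `match_clusters` and the 0.85 cutoff are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def match_clusters(baseline_centroids, current_centroids, min_similarity=0.85):
    """Greedily pair baseline and current clusters by centroid cosine similarity.

    Returns (matches, vanished, emerged): matched (baseline, current) index pairs,
    baseline clusters with no counterpart, and current clusters with no counterpart.
    min_similarity is a hypothetical cutoff.
    """
    # Every candidate pair, most similar first
    pairs = sorted(
        ((cosine_similarity(b, c), i, j)
         for i, b in enumerate(baseline_centroids)
         for j, c in enumerate(current_centroids)),
        reverse=True,
    )
    matches, used_b, used_c = [], set(), set()
    for sim, i, j in pairs:
        if sim < min_similarity:
            break  # all remaining pairs are even less similar
        if i not in used_b and j not in used_c:
            matches.append((i, j))
            used_b.add(i)
            used_c.add(j)
    vanished = [i for i in range(len(baseline_centroids)) if i not in used_b]
    emerged = [j for j in range(len(current_centroids)) if j not in used_c]
    return matches, vanished, emerged

# Toy centroids standing in for values read from the two executions' artifacts
baseline_centroids = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
current_centroids = [[0.95, 0.05, 0.0], [0.0, 0.0, 1.0]]
matches, vanished, emerged = match_clusters(baseline_centroids, current_centroids)
print(f"matched={matches} vanished={vanished} emerged={emerged}")
```

    Unmatched baseline clusters (`vanished`) suggest content that no longer appears. Unmatched current clusters (`emerged`) suggest new content that the baseline never covered.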

    Feature Extractors

    Image Embedding

    Generate visual embeddings for similarity search and clustering

    Text Embedding

    Extract semantic embeddings from documents, transcripts and text content

    Video Embedding

    Generate vector embeddings for video content

    Use Cases for This Recipe

    Advanced
    Coming Soon
    8 min

    Creative Lineage & Storyboard Intelligence

    Track creative evolution from concept to final cut

    85% concept retention

    Brief-to-final alignment

    Who It's For

    Creative directors, brand managers, and production teams managing multi-version creative workflows