
    Semantic Drift Detection

    Monitor distribution shifts between baseline and current data using cluster comparison. Detect when new content diverges from training data or when content mix changes unexpectedly.

    video, image, text | Production | 29.0K runs


    Why This Matters

    Data drift causes silent model degradation. By comparing cluster distributions over time, you can catch drift before it affects production systems.

    import requests

    API_URL = "https://api.mixpeek.com"
    headers = {"Authorization": "Bearer YOUR_API_KEY", "X-Namespace": "your-namespace"}

    # Create cluster config for drift monitoring
    cluster = requests.post(f"{API_URL}/v1/clusters", headers=headers, json={
        "cluster_name": "training_baseline",
        "source_collection_ids": ["col_training_data"],
        "feature_addresses": ["mixpeek://multimodal_extractor@v1/embedding"],
        "algorithm": "hdbscan",
        "algorithm_config": {"min_cluster_size": 20}
    }).json()

    # Create baseline snapshot
    baseline = requests.post(
        f"{API_URL}/v1/clusters/{cluster['cluster_id']}/execute",
        headers=headers
    ).json()
    print(f"Baseline run_id: {baseline['run_id']}")

    # Later: Execute again to compare
    current = requests.post(
        f"{API_URL}/v1/clusters/{cluster['cluster_id']}/execute",
        headers=headers
    ).json()

    # Compare executions by fetching both artifacts
    baseline_artifacts = requests.get(
        f"{API_URL}/v1/clusters/{cluster['cluster_id']}/executions/{baseline['run_id']}/artifacts",
        headers=headers
    ).json()
    current_artifacts = requests.get(
        f"{API_URL}/v1/clusters/{cluster['cluster_id']}/executions/{current['run_id']}/artifacts",
        headers=headers
    ).json()

    # Compare cluster counts and distributions
    baseline_count = len(baseline_artifacts.get("clusters", []))
    current_count = len(current_artifacts.get("clusters", []))
    print(f"Baseline: {baseline_count} clusters, Current: {current_count} clusters")
    if abs(current_count - baseline_count) > 2:
        print("ALERT: Significant drift detected!")
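
The check above compares only the number of clusters, so it misses a change in content mix when the cluster count stays the same. A minimal sketch of a distribution-level comparison follows, using Jensen-Shannon divergence over per-cluster document counts. Several things here are assumptions rather than part of the Mixpeek API: the per-cluster counts are hand-written stand-ins for whatever the artifacts response actually contains, the cluster labels (`c0`, `c1`, ...) are assumed to have already been matched between the two runs (HDBSCAN labels are not guaranteed to line up across executions), and the `DRIFT_THRESHOLD` value is illustrative and would need tuning on real data.

```python
import math
from typing import Dict, Mapping


def cluster_shares(counts: Mapping[str, int]) -> Dict[str, float]:
    """Convert per-cluster document counts into proportions that sum to 1."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must contain at least one document")
    return {label: n / total for label, n in counts.items()}


def js_divergence(baseline: Mapping[str, int], current: Mapping[str, int]) -> float:
    """Jensen-Shannon divergence (log base 2) between two cluster-size distributions.

    Returns 0.0 when both runs have the same mix and 1.0 when they share no clusters.
    """
    p = cluster_shares(baseline)
    q = cluster_shares(current)
    divergence = 0.0
    for label in set(p) | set(q):
        pi = p.get(label, 0.0)
        qi = q.get(label, 0.0)
        mi = 0.5 * (pi + qi)  # mixture distribution; positive whenever pi or qi is
        if pi > 0:
            divergence += 0.5 * pi * math.log2(pi / mi)
        if qi > 0:
            divergence += 0.5 * qi * math.log2(qi / mi)
    return divergence


# Hypothetical counts per matched cluster label (not a real API response).
baseline_counts = {"c0": 500, "c1": 300, "c2": 200}
current_counts = {"c0": 200, "c1": 300, "c2": 200, "c3": 300}

DRIFT_THRESHOLD = 0.1  # assumed value; tune against historical runs

score = js_divergence(baseline_counts, current_counts)
print(f"JS divergence: {score:.3f}")
if score > DRIFT_THRESHOLD:
    print("ALERT: content mix has shifted")
```

Because the score is bounded between 0 and 1 and symmetric, one threshold can be reused across collections of different sizes, whereas an absolute cluster-count difference depends on how many clusters the baseline happened to produce.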

    Feature Extractors

    Image Embedding

    Generate visual embeddings for similarity search and clustering

    752K runs

    Text Embedding

    Extract semantic embeddings from documents, transcripts and text content

    827K runs

    Video Embedding

    Generate vector embeddings for video content

    610K runs
