    Drift

    Dataset QA, Audit & Drift Detection

    Detects bias, coverage gaps, duplication, and distribution shifts by comparing new data against baseline clustering snapshots. This is the capability infrastructure buyers tend to care about most.

    Modalities: video, image, text
    Production
    29.0K runs

    Why This Matters

    Drift detection is ongoing operational monitoring, not a one-time audit. Baseline clusters become the reference point against which data quality is measured over time.

    from mixpeek import Mixpeek

    client = Mixpeek(api_key="your-api-key")

    # Create baseline cluster snapshot
    baseline = client.analytics.cluster(
        collection_id="training_data",
        algorithm="hdbscan",
        snapshot_id="baseline_2024_q1"
    )

    # Detect drift in new data
    drift_report = client.analytics.drift_detection(
        collection_id="training_data",
        baseline_snapshot="baseline_2024_q1",
        current_period={
            "start": "2024-10-01",
            "end": "2024-12-31"
        },
        alert_threshold=0.15
    )

    # Find outliers (potential novelty or data quality issues)
    outliers = client.retrievers.execute(
        retriever_id="outlier-search",
        inputs={
            "drift_score_min": 0.8,
            "cluster_id": None  # Unassigned to any cluster
        }
    )
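To make the idea of comparing new data against a baseline snapshot concrete, here is a minimal, library-independent sketch of one way a drift score could be computed from cluster assignments. It is an illustration under stated assumptions, not a description of how Mixpeek calculates drift: the choice of Jensen-Shannon divergence, the helper names (`cluster_distribution`, `js_divergence`, `drift_score`), and the sample labels are all hypothetical. The label `-1` stands for items not assigned to any cluster, following the common HDBSCAN convention, and the 0.15 threshold simply mirrors the `alert_threshold` value used above.

```python
import math
from collections import Counter


def cluster_distribution(labels, clusters):
    """Return the share of items in each cluster, in the order given by `clusters`."""
    counts = Counter(labels)
    total = len(labels)
    return [counts.get(c, 0) / total for c in clusters]


def js_divergence(p, q):
    """Jensen-Shannon divergence with log base 2, bounded between 0 and 1."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero probability in x contribute nothing to the sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def drift_score(baseline_labels, current_labels):
    """Compare how items spread across clusters in the two periods."""
    clusters = sorted(set(baseline_labels) | set(current_labels))
    p = cluster_distribution(baseline_labels, clusters)
    q = cluster_distribution(current_labels, clusters)
    return js_divergence(p, q)


# Hypothetical cluster assignments; -1 marks items unassigned to any cluster.
baseline_labels = [0] * 50 + [1] * 40 + [-1] * 10
current_labels = [0] * 20 + [1] * 30 + [-1] * 50

ALERT_THRESHOLD = 0.15  # assumed to play the same role as alert_threshold above
score = drift_score(baseline_labels, current_labels)
alert = score > ALERT_THRESHOLD
print(f"drift score: {score:.3f}, alert: {alert}")
```

A score of 0 means the two periods distribute items across clusters identically, and a score of 1 means they share no clusters at all. A growing share of `-1` labels in the current period is one signal that new data no longer fits the baseline clusters, which is the kind of item the outlier query above is meant to surface.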

    Feature Extractors

    Image Embedding

    Generate visual embeddings for similarity search and clustering

    752K runs

    Text Embedding

    Extract semantic embeddings from documents, transcripts and text content

    827K runs

    Video Embedding

    Generate vector embeddings for video content

    610K runs

    Retriever Stages

    Enrichment Resources

    Clustering
    Analytics
