    Quality

    Anomaly Detection

    Identify outliers and anomalous content using embedding distance from cluster centroids. Flag quality issues, novel content, or items that don't match expected patterns.

    video
    image
    audio
    text
    Multi-Stage
    38.0K runs

    Why This Matters

    Anomalies can be problems (data quality issues) or opportunities (novel content). Either way, you need to find them before they find you.

    import requests

    API_URL = "https://api.mixpeek.com"
    headers = {"Authorization": "Bearer YOUR_API_KEY", "X-Namespace": "your-namespace"}

    # Create baseline clusters for anomaly detection
    cluster = requests.post(f"{API_URL}/v1/clusters", headers=headers, json={
        "cluster_name": "baseline_distribution",
        "source_collection_ids": ["col_my_collection"],
        "feature_addresses": ["mixpeek://multimodal_extractor@v1/embedding"],
        "algorithm": "hdbscan",
        "algorithm_config": {"min_cluster_size": 20}
    }).json()

    # Execute to establish baseline
    execution = requests.post(
        f"{API_URL}/v1/clusters/{cluster['cluster_id']}/execute",
        headers=headers
    ).json()

    # Get artifacts including outliers
    artifacts = requests.get(
        f"{API_URL}/v1/clusters/{cluster['cluster_id']}/executions/{execution['run_id']}/artifacts",
        headers=headers,
        params={"include_members": True}
    ).json()

    # Find anomalies (items marked as noise by HDBSCAN)
    outliers = [m for m in artifacts.get("members", []) if m["cluster_id"] == -1]
    print(f"Found {len(outliers)} anomalous items")

    # Analyze anomaly distribution
    for item in outliers[:10]:
        print(f"Document: {item['document_id']}")
        print(f"Distance: {item.get('distance', 'N/A')}")
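The recipe describes flagging items by their embedding distance from cluster centroids. The following is a minimal local sketch of that idea, not Mixpeek's implementation: the helper names (`centroid`, `score_by_centroid_distance`, `flag_anomalies`), the toy 2-D vectors, and the threshold of 1.0 are all assumptions made for illustration. Real embeddings have hundreds of dimensions, and a sensible threshold would likely be tuned per collection.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def score_by_centroid_distance(clusters, candidates):
    """Map each candidate id to its Euclidean distance from the nearest centroid."""
    centroids = [centroid(members) for members in clusters.values()]
    return {
        doc_id: min(math.dist(vector, c) for c in centroids)
        for doc_id, vector in candidates.items()
    }


def flag_anomalies(scores, threshold):
    """Return the ids whose nearest-centroid distance exceeds the threshold."""
    return sorted(doc_id for doc_id, d in scores.items() if d > threshold)


# Hypothetical toy "embeddings": two tight baseline clusters in 2-D.
baseline = {
    "cluster_a": [[0.0, 0.0], [0.2, 0.0], [0.0, 0.2], [0.2, 0.2]],
    "cluster_b": [[5.0, 5.0], [5.2, 5.0], [5.0, 5.2], [5.2, 5.2]],
}
candidates = {
    "doc_near_a": [0.1, 0.15],
    "doc_near_b": [5.1, 5.0],
    "doc_far": [2.5, 9.0],
}

scores = score_by_centroid_distance(baseline, candidates)
anomalies = flag_anomalies(scores, threshold=1.0)  # threshold is an assumed value
print(anomalies)
```

Here only `doc_far` is flagged, since it sits far from both centroids. A fixed threshold is the simplest choice; a percentile of the baseline distances is a common alternative when cluster spread varies.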

    Feature Extractors

    Image Embedding

    Generate visual embeddings for similarity search and clustering

    752K runs

    Video Embedding

    Generate vector embeddings for video content

    610K runs

    Text Embedding

    Extract semantic embeddings from documents, transcripts and text content

    827K runs

    Retriever Stages

    feature search

    Search collections using multimodal embeddings

    search

    score filter

    Filter documents by relevance score threshold

    filter
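To make the two stage types above concrete, here is a small local sketch of a search stage followed by a score filter. It is an illustration under assumptions, not the Mixpeek retriever API: the function names (`feature_search`, `score_filter`), the use of cosine similarity as the relevance score, the `min_score` parameter, and the toy vectors are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def feature_search(query_vector, documents, top_k=10):
    """Search stage: rank documents by similarity to the query embedding."""
    scored = [
        {"document_id": doc_id, "score": cosine_similarity(query_vector, vector)}
        for doc_id, vector in documents.items()
    ]
    scored.sort(key=lambda result: result["score"], reverse=True)
    return scored[:top_k]


def score_filter(results, min_score):
    """Filter stage: keep only results at or above the score threshold."""
    return [result for result in results if result["score"] >= min_score]


# Hypothetical toy embeddings keyed by document id.
documents = {
    "doc_1": [1.0, 0.0],
    "doc_2": [0.6, 0.8],
    "doc_3": [0.0, 1.0],
}

hits = feature_search([1.0, 0.0], documents)
kept = score_filter(hits, min_score=0.5)  # min_score is an assumed value
kept_ids = [result["document_id"] for result in kept]
print(kept_ids)
```

With these vectors, `doc_1` and `doc_2` pass the filter and `doc_3` (orthogonal to the query) is dropped. Chaining the stages as plain functions keeps each one independently testable.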
