    Themes

    Clustering & Theme Discovery

    Unsupervised clustering that groups content into semantic themes using HDBSCAN. Surfaces hidden patterns, content variants, and outliers without requiring predefined labels.
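To make the split between themes and outliers concrete, here is a minimal sketch of how per-item cluster labels are typically turned into groups. It assumes the convention used by common HDBSCAN implementations (scikit-learn's, for example), where the label -1 marks noise points that belong to no cluster. The function name `split_themes_and_outliers` and the sample item IDs are hypothetical, and nothing here reflects how Mixpeek stores results internally.

```python
from collections import defaultdict

# Assumption: -1 marks noise, as in common HDBSCAN implementations.
NOISE_LABEL = -1


def split_themes_and_outliers(item_ids, labels):
    """Group item IDs by cluster label; items labeled as noise become outliers."""
    if len(item_ids) != len(labels):
        raise ValueError("item_ids and labels must have the same length")
    themes = defaultdict(list)
    outliers = []
    for item_id, label in zip(item_ids, labels):
        if label == NOISE_LABEL:
            outliers.append(item_id)
        else:
            themes[label].append(item_id)
    return dict(themes), outliers


# Hypothetical labels for six items
themes, outliers = split_themes_and_outliers(
    ["a", "b", "c", "d", "e", "f"],
    [0, 0, 1, -1, 1, 0],
)
print(themes)
print(outliers)
```

Items that land in `outliers` are the ones a density-based method could not assign to any dense region, which is why they are worth reviewing separately rather than forcing into the nearest theme.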

    video | image | text | audio | Multi-Stage | 54.0K runs

    "Discover hidden themes in unlabeled user-generated content and identify outliers"

    Why This Matters

    You can't search for what you don't know exists. Clustering reveals the natural structure in your content—themes, duplicates, and anomalies—before you even ask.

    import requests

    API_URL = "https://api.mixpeek.com"
    headers = {"Authorization": "Bearer YOUR_API_KEY", "X-Namespace": "your-namespace"}

    # Create cluster configuration
    cluster = requests.post(f"{API_URL}/v1/clusters", headers=headers, json={
        "cluster_name": "content_themes",
        "source_collection_ids": ["col_my_collection"],
        "feature_addresses": ["mixpeek://multimodal_extractor@v1/embedding"],
        "algorithm": "hdbscan",
        "algorithm_config": {"min_cluster_size": 15},
        "llm_labeling": {"provider": "openai_chat_v1", "model": "gpt-4o-mini"}
    }).json()

    # Execute clustering
    execution = requests.post(
        f"{API_URL}/v1/clusters/{cluster['cluster_id']}/execute",
        headers=headers
    ).json()

    # Get cluster artifacts with centroids
    artifacts = requests.get(
        f"{API_URL}/v1/clusters/{cluster['cluster_id']}/executions/{execution['run_id']}/artifacts",
        headers=headers,
        params={"include_centroids": True}
    ).json()

    # Explore discovered themes
    for group in artifacts["clusters"]:
        print(f"Theme: {group['label']}")
        print(f"Size: {group['member_count']} items")
        print(f"Keywords: {', '.join(group.get('keywords', []))}")
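Once artifacts are retrieved, a small helper can rank the discovered themes by size. The sketch below assumes the response shape used in the loop above: a `clusters` list whose entries carry `label` and `member_count`, and optionally `keywords`. The helper name `rank_themes`, its `min_members` parameter, and the sample payload are hypothetical, and real responses may include other fields.

```python
def rank_themes(artifacts, min_members=1):
    """Return (label, member_count, share) tuples, largest theme first."""
    rows = []
    for cluster in artifacts.get("clusters", []):
        count = cluster.get("member_count", 0)
        if count >= min_members:
            rows.append((cluster.get("label", "unlabeled"), count))
    # Share is computed over the themes that passed the size filter.
    total = sum(count for _, count in rows)
    rows.sort(key=lambda row: row[1], reverse=True)
    return [(label, count, count / total if total else 0.0) for label, count in rows]


# Hypothetical payload shaped like the artifacts response assumed above
sample = {"clusters": [
    {"label": "Product demos", "member_count": 30, "keywords": ["demo"]},
    {"label": "Unboxing", "member_count": 60},
    {"label": "Misc", "member_count": 10},
]}

for label, count, share in rank_themes(sample):
    print(f"{label}: {count} items ({share:.0%})")
```

Raising `min_members` hides very small themes, which can help when a low `min_cluster_size` produces many tiny groups.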

    Feature Extractors

    Image Embedding

    Generate visual embeddings for similarity search and clustering

    752K runs

    Text Embedding

    Extract semantic embeddings from documents, transcripts and text content

    827K runs

    Video Embedding

    Generate vector embeddings for video content

    610K runs

    Audio Embedding

    Extract semantic embeddings from audio content for similarity search

    420K runs


    Use Cases Using This Recipe

    Intermediate | Coming Soon | 8 min

    Multimodal Lead Intelligence

    Enrich leads with visual and behavioral signals from their content

    Lead scoring accuracy: +30% improvement

    Who It's For

    B2B sales teams, demand gen marketers, and ABM platforms enriching 10K+ leads monthly

    Intermediate | Coming Soon | 7 min

    Talent Intelligence & Casting

    Match talent to roles using multimodal portfolio analysis

    Casting search time: 75% reduction

    Who It's For

    Casting directors, talent agencies, and production companies managing 10K+ talent profiles

    Intermediate | Coming Soon | 7 min

    Social Media Content Intelligence

    Analyze and optimize social content performance with multimodal AI

    Content engagement rate: +35% average improvement

    Who It's For

    Social media managers, content strategists, and brand teams publishing 100+ posts monthly across platforms

    Beginner

    Fashion Visual Product Discovery

    Search for fashion by style, not just by name or brand

    Product discovery engagement: 3x more products viewed per session

    Who It's For

    Fashion e-commerce platforms, apparel retailers, and personal styling services managing catalogs of 100K+ products where visual style drives purchase decisions

    Intermediate

    AI-Powered Stock Media Search

    Find the perfect stock asset by describing what you envision, not what keywords to try

    Search-to-license conversion rate: +45% more purchases per search session

    Who It's For

    Stock media platforms, content licensing marketplaces, and enterprise media libraries serving creative professionals who need to find specific visual and audio assets quickly