Clustering & Theme Discovery

Unsupervised clustering that groups content into semantic themes using HDBSCAN. Surfaces hidden patterns, content variants, and outliers without requiring predefined labels.

video

image

text

audio

Multi-Stage

54.0K runs

"Discover hidden themes in unlabeled user-generated content and identify outliers"

Why This Matters

You can't search for what you don't know exists. Clustering reveals the natural structure in your content—themes, duplicates, and anomalies—before you even ask.

import requests

API_URL = "https://api.mixpeek.com"
headers = {"Authorization": "Bearer YOUR_API_KEY", "X-Namespace": "your-namespace"}

# Create cluster configuration
cluster = requests.post(f"{API_URL}/v1/clusters", headers=headers, json={
    "cluster_name": "content_themes",
    "source_collection_ids": ["col_my_collection"],
    "feature_addresses": ["mixpeek://multimodal_extractor@v1/embedding"],
    "algorithm": "hdbscan",
    "algorithm_config": {"min_cluster_size": 15},
    "llm_labeling": {"provider": "openai_chat_v1", "model": "gpt-4o-mini"}
}).json()

# Execute clustering
execution = requests.post(
    f"{API_URL}/v1/clusters/{cluster['cluster_id']}/execute",
    headers=headers
).json()

# Get cluster artifacts with centroids
artifacts = requests.get(
    f"{API_URL}/v1/clusters/{cluster['cluster_id']}/executions/{execution['run_id']}/artifacts",
    headers=headers,
    params={"include_centroids": True}
).json()

# Explore discovered themes
for group in artifacts["clusters"]:
    print(f"Theme: {group['label']}")
    print(f"Size: {group['member_count']} items")
    print(f"Keywords: {', '.join(group.get('keywords', []))}")
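
Once the artifacts come back, it can help to rank themes by size before reading them one by one. The following is a minimal sketch, not part of the Mixpeek API: `summarize_themes` is a hypothetical helper, and it assumes only the response fields the snippet above already reads (`clusters`, `label`, `member_count`, `keywords`). The sample data is made up for illustration.

```python
def summarize_themes(artifacts, top_n=5):
    """Rank clusters by size and report each one's share of clustered items.

    Hypothetical helper: assumes the response shape used above
    (a "clusters" list whose entries carry "label" and "member_count").
    """
    groups = artifacts.get("clusters", [])
    total = sum(g.get("member_count", 0) for g in groups)
    # Largest themes first
    ranked = sorted(groups, key=lambda g: g.get("member_count", 0), reverse=True)
    summary = []
    for g in ranked[:top_n]:
        count = g.get("member_count", 0)
        summary.append({
            "label": g.get("label", "(unlabeled)"),
            "member_count": count,
            "share": count / total if total else 0.0,
            "keywords": g.get("keywords", []),
        })
    return summary


# Illustrative data only; real labels and counts come from the API response.
sample = {"clusters": [
    {"label": "Product unboxing", "member_count": 30, "keywords": ["box", "unwrap"]},
    {"label": "Tutorials", "member_count": 60},
    {"label": "Pet clips", "member_count": 10, "keywords": ["dog"]},
]}

for row in summarize_themes(sample, top_n=2):
    print(f"{row['label']}: {row['member_count']} items ({row['share']:.0%})")
```

The share is computed over items that landed in a cluster, so it does not account for any outliers the run may have set aside.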

Feature Extractors

Image Embedding

Generate visual embeddings for similarity search and clustering

752K runs

Text Embedding

Extract semantic embeddings from documents, transcripts and text content

827K runs

Video Embedding

Generate vector embeddings for video content

610K runs

Audio Embedding

Extract semantic embeddings from audio content for similarity search

420K runs

Retriever Stages

Resources Used

Clustering

Semantic Clusters

HDBSCAN clustering with outlier detection

Analytics

Cluster Metrics

Cluster centroids and statistics
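
One common use of cluster centroids is to place a new item into the closest existing theme without re-running the clustering. The sketch below is an illustration under stated assumptions, not the platform's method: it assumes each cluster entry exposes its centroid as a list of floats under a `centroid` key (the field name is a guess), and `nearest_theme` and `min_similarity` are hypothetical. HDBSCAN itself is density-based rather than centroid-based, so nearest-centroid assignment is only an approximation.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_theme(embedding, clusters, min_similarity=0.5):
    """Return (label, score) for the closest centroid, or (None, score) if too far.

    Assumes each cluster dict has "label" and a "centroid" vector;
    the "centroid" key is an assumption about the response shape.
    """
    best_label, best_score = None, -1.0
    for cluster in clusters:
        score = cosine_similarity(embedding, cluster["centroid"])
        if score > best_score:
            best_label, best_score = cluster["label"], score
    if best_score < min_similarity:
        # Treat as an outlier relative to the known themes
        return None, best_score
    return best_label, best_score


# Toy 2-D centroids for illustration; real embeddings have many more dimensions.
toy_clusters = [
    {"label": "Theme A", "centroid": [1.0, 0.0]},
    {"label": "Theme B", "centroid": [0.0, 1.0]},
]
label, score = nearest_theme([0.9, 0.1], toy_clusters)
print(label, round(score, 3))
```

Items that fall below the similarity threshold are better treated as candidates for the next clustering run than forced into an existing theme.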

Documentation

Clusters

Use Cases Using This Recipe

Intermediate

Coming Soon

8 min

Multimodal Lead Intelligence

Enrich leads with visual and behavioral signals from their content

+30% improvement

Lead scoring accuracy

advertising

ecommerce

Who It's For

B2B sales teams, demand gen marketers, and ABM platforms enriching 10K+ leads monthly

Intermediate

Coming Soon

7 min

Talent Intelligence & Casting

Match talent to roles using multimodal portfolio analysis

75% reduction

Casting search time

advertising

entertainment

Who It's For

Casting directors, talent agencies, and production companies managing 10K+ talent profiles

Intermediate

Coming Soon

7 min

Social Media Content Intelligence

Analyze and optimize social content performance with multimodal AI

+35% average improvement

Content engagement rate

advertising

Who It's For

Social media managers, content strategists, and brand teams publishing 100+ posts monthly across platforms

Beginner

Fashion Visual Product Discovery

Search for fashion by style, not just by name or brand

3x more products viewed per session

Product discovery engagement

ecommerce

Who It's For

Fashion e-commerce platforms, apparel retailers, and personal styling services managing catalogs of 100K+ products where visual style drives purchase decisions

Intermediate

AI-Powered Stock Media Search

Find the perfect stock asset by describing what you envision, not what keywords to try

+45% more purchases per search session

Search-to-license conversion rate

media

Who It's For

Stock media platforms, content licensing marketplaces, and enterprise media libraries serving creative professionals who need to find specific visual and audio assets quickly

Related Recipes & Resources

Video Embedding

Text Embedding

Image Embedding

Audio Embedding
