Clustering & Theme Discovery
Unsupervised clustering that groups content into semantic themes using HDBSCAN. Surfaces hidden patterns, content variants, and outliers without requiring predefined labels.
"Discover hidden themes in unlabeled user-generated content and identify outliers"
Why This Matters
You can't search for what you don't know exists. Clustering reveals the natural structure in your content—themes, duplicates, and anomalies—before you even ask.
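As a rough illustration of how themes and anomalies fall out of a clustering run, the sketch below groups items by cluster label and sets aside the points that belong to no cluster. It assumes the convention used by the open-source HDBSCAN implementations, where noise points get the label -1. Whether the Mixpeek API exposes raw labels in this form is not stated here, so the function name, item ids, and labels are all hypothetical.

```python
from collections import defaultdict

NOISE_LABEL = -1  # convention in open-source HDBSCAN: point belongs to no cluster


def split_themes_and_outliers(item_ids, labels):
    """Group item ids by cluster label and collect noise points separately."""
    if len(item_ids) != len(labels):
        raise ValueError("item_ids and labels must have the same length")
    themes = defaultdict(list)
    outliers = []
    for item_id, label in zip(item_ids, labels):
        if label == NOISE_LABEL:
            outliers.append(item_id)
        else:
            themes[label].append(item_id)
    return dict(themes), outliers


# Hypothetical ids and labels, for illustration only
item_ids = ["doc_a", "doc_b", "doc_c", "doc_d", "doc_e"]
labels = [0, 0, 1, -1, 1]
themes, outliers = split_themes_and_outliers(item_ids, labels)
```

Here two themes emerge and one item stays unassigned, which is the kind of outlier worth a manual look.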
import requests

API_URL = "https://api.mixpeek.com"
headers = {"Authorization": "Bearer YOUR_API_KEY", "X-Namespace": "your-namespace"}

# Create cluster configuration
cluster = requests.post(f"{API_URL}/v1/clusters", headers=headers, json={
    "cluster_name": "content_themes",
    "source_collection_ids": ["col_my_collection"],
    "feature_addresses": ["mixpeek://multimodal_extractor@v1/embedding"],
    "algorithm": "hdbscan",
    "algorithm_config": {"min_cluster_size": 15},
    "llm_labeling": {"provider": "openai_chat_v1", "model": "gpt-4o-mini"}
}).json()

# Execute clustering
execution = requests.post(
    f"{API_URL}/v1/clusters/{cluster['cluster_id']}/execute",
    headers=headers
).json()

# Get cluster artifacts with centroids
artifacts = requests.get(
    f"{API_URL}/v1/clusters/{cluster['cluster_id']}/executions/{execution['run_id']}/artifacts",
    headers=headers,
    params={"include_centroids": True}
).json()

# Explore discovered themes
for group in artifacts["clusters"]:
    print(f"Theme: {group['label']}")
    print(f"Size: {group['member_count']} items")
    print(f"Keywords: {', '.join(group.get('keywords', []))}")
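Once the artifacts are loaded, a small helper can rank the themes and hide very small groups. The sketch below is one possible way to do that and is not part of the Mixpeek API. It relies only on the `clusters`, `label`, `member_count`, and `keywords` fields used in the example above. The helper name, the `min_size` parameter, and the sample payload are hypothetical.

```python
def summarize_themes(artifacts, min_size=1):
    """Return (label, size, keywords) tuples, largest theme first."""
    rows = []
    for group in artifacts.get("clusters", []):
        size = group.get("member_count", 0)
        if size < min_size:
            continue  # skip themes below the size threshold
        rows.append((
            group.get("label", "unlabeled"),
            size,
            group.get("keywords", []),
        ))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical payload shaped like the artifacts response used above
sample_artifacts = {
    "clusters": [
        {"label": "Product reviews", "member_count": 120, "keywords": ["review", "rating"]},
        {"label": "Support questions", "member_count": 340},
        {"label": "Misc", "member_count": 9, "keywords": []},
    ]
}

top_themes = summarize_themes(sample_artifacts, min_size=15)
for label, size, keywords in top_themes:
    print(f"{label}: {size} items ({', '.join(keywords) or 'no keywords'})")
```

Setting `min_size` to the same value as `min_cluster_size` is a reasonable starting point, though the right threshold depends on how much content you have.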
Feature Extractors
Image Embedding
Generate visual embeddings for similarity search and clustering
Text Embedding
Extract semantic embeddings from documents, transcripts and text content
Video Embedding
Generate vector embeddings for video content
Audio Embedding
Extract semantic embeddings from audio content for similarity search
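To show what similarity search over embeddings can look like, here is a minimal sketch that ranks stored vectors against a query vector by cosine similarity. The choice of metric is an assumption, since the text above does not say which one the extractors or the search layer use. The function names, item ids, and three-dimensional vectors are hypothetical, and real embeddings have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def most_similar(query, index, top_k=3):
    """Return the top_k (item_id, score) pairs, best match first."""
    scored = [(item_id, cosine_similarity(query, vector))
              for item_id, vector in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical embeddings, for illustration only
index = {
    "item_1": [0.9, 0.1, 0.0],
    "item_2": [0.8, 0.3, 0.1],
    "item_3": [0.0, 0.2, 0.9],
}
results = most_similar([1.0, 0.0, 0.0], index, top_k=2)
```

A linear scan like this is fine for a handful of vectors. Larger collections usually rely on an approximate nearest-neighbor index instead.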
