Clustering
Unsupervised Clustering & Theme Discovery
Clusters content into semantic groups using HDBSCAN, surfacing themes, variants, and outliers. Turns raw corpora into navigable structure without labels.
video
image
text
audio
Multi-Stage
54.0K runs
Why This Matters
Clustering serves as infrastructure for operational insight. Once computed, clusters become queryable resources for navigation, QA, and theme-based retrieval.
from mixpeek import Mixpeek

client = Mixpeek(api_key="your-api-key")

# Create collection with embeddings
collection = client.collections.create(
    collection_name="unlabeled_corpus",
    feature_extractor={
        "feature_extractor_name": "multimodal_extractor",
        "version": "v1"
    }
)

# Run clustering
clusters = client.analytics.cluster(
    collection_id=collection.id,
    algorithm="hdbscan",
    min_cluster_size=15,
    return_outliers=True
)

# Generate cluster summaries with LLM
for cluster in clusters:
    summary = client.llm.summarize(
        cluster_id=cluster.id,
        sample_size=10
    )

# Search within a cluster
results = client.retrievers.execute(
    retriever_id="cluster-search",
    inputs={
        "cluster_id": "cluster_42",
        "query_text": "specific theme"
    }
)
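To make the idea of clusters and outliers as separate, queryable groups more concrete, here is a minimal standalone sketch. It does not call the Mixpeek API. It assumes clustering has produced one integer label per document, and that outliers carry the label -1, as in the open-source hdbscan and scikit-learn implementations. Whether Mixpeek's response uses the same shape is an assumption, and `group_by_cluster`, `doc_ids`, and `labels` are hypothetical names used only for illustration.

```python
from collections import defaultdict

# Assumption: outliers are labeled -1, following the convention of the
# open-source hdbscan and scikit-learn HDBSCAN implementations.
NOISE_LABEL = -1


def group_by_cluster(doc_ids, labels):
    """Split documents into per-cluster lists and a separate outlier list.

    doc_ids and labels are parallel sequences: labels[i] is the cluster
    label assigned to doc_ids[i].
    """
    if len(doc_ids) != len(labels):
        raise ValueError("doc_ids and labels must have the same length")

    clusters = defaultdict(list)
    outliers = []
    for doc_id, label in zip(doc_ids, labels):
        if label == NOISE_LABEL:
            outliers.append(doc_id)
        else:
            clusters[label].append(doc_id)
    return dict(clusters), outliers


# Hypothetical example data: five documents, two clusters, one outlier.
doc_ids = ["a", "b", "c", "d", "e"]
labels = [0, 0, 1, -1, 1]

clusters, outliers = group_by_cluster(doc_ids, labels)
print(clusters)  # {0: ['a', 'b'], 1: ['c', 'e']}
print(outliers)  # ['d']
```

Keeping outliers in their own list, rather than discarding them, matches the intent of `return_outliers=True` above: unclustered items remain available for review.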
Feature Extractors
Image Embedding
Generate visual embeddings for similarity search and clustering
752K runs
Text Embedding
Extract semantic embeddings from documents, transcripts and text content
827K runs
Video Embedding
Generate vector embeddings for video content
610K runs
Audio Embedding
Extract semantic embeddings from audio content for similarity search
420K runs
Retriever Stages
Enrichment Resources
Clustering
Analytics
