
Clusters

When you don’t know your taxonomy yet, use clustering to discover structure from vectors. Mixpeek supports eight algorithms with optional LLM labeling to auto-name each group.
curl -X POST "https://api.mixpeek.com/v1/clusters" \
  -H "Authorization: Bearer $MIXPEEK_API_KEY" \
  -H "X-Namespace: $NAMESPACE_ID" \
  -H "Content-Type: application/json" \
  -d '{
    "cluster_name": "content-themes",
    "collection_id": "'"$COLLECTION_ID"'",
    "feature_uri": "mixpeek://multimodal_extractor@v1/multimodal_embedding",
    "algorithm": { "name": "hdbscan", "params": { "min_cluster_size": 10 } },
    "llm_labeling": { "enabled": true },
    "dimension_reduction": { "method": "umap", "n_components": 2 }
  }'

Algorithms

| Algorithm | Best for |
| --- | --- |
| hdbscan | Unknown number of clusters, noisy data |
| kmeans | Known number of clusters, even sizes |
| dbscan | Density-based discovery, outlier detection |
| agglomerative | Hierarchical structure |
| spectral | Non-convex clusters |
| gaussian_mixture | Overlapping clusters |
| mean_shift | Automatic cluster count |
| optics | Varying density |
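As a rough illustration, the table can be kept next to the client code as a lookup that validates the algorithm name before a request is sent. This is only a sketch: `build_algorithm_config` is a hypothetical helper, not part of any Mixpeek SDK. The only things taken from this page are the eight algorithm names and the `{"name": ..., "params": ...}` shape used in the request above.

```python
from typing import Optional

# Algorithm names and notes copied from the table above.
BEST_FOR = {
    "hdbscan": "Unknown number of clusters, noisy data",
    "kmeans": "Known number of clusters, even sizes",
    "dbscan": "Density-based discovery, outlier detection",
    "agglomerative": "Hierarchical structure",
    "spectral": "Non-convex clusters",
    "gaussian_mixture": "Overlapping clusters",
    "mean_shift": "Automatic cluster count",
    "optics": "Varying density",
}


def build_algorithm_config(name: str, params: Optional[dict] = None) -> dict:
    """Hypothetical helper: build the `algorithm` object for a cluster request.

    Rejects names that are not in the table, so typos fail locally
    instead of at the API.
    """
    if name not in BEST_FOR:
        raise ValueError(f"unknown algorithm {name!r}; expected one of {sorted(BEST_FOR)}")
    return {"name": name, "params": dict(params or {})}


# Same algorithm block as in the curl example above.
config = build_algorithm_config("hdbscan", {"min_cluster_size": 10})
```

Parameter names other than `min_cluster_size` are not shown on this page, so the helper passes `params` through without validating them.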

LLM Labeling

When enabled, each cluster gets an auto-generated name, summary, and keywords based on its member documents. Input mappings control what the LLM sees:
  • payload — document metadata fields
  • blob — raw content (text, image URLs)
  • literal — fixed context strings
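To make the three mapping types concrete, here is a minimal sketch of how they could be resolved into the text an LLM sees for one document. The mapping schema (`source`, `field`, `value`) and the document shape are assumptions made for illustration. This page does not show Mixpeek's actual mapping format.

```python
def build_label_context(mappings, document):
    """Resolve input mappings into a list of text snippets for one document.

    Assumed (hypothetical) shapes:
      mappings: [{"source": "payload", "field": "title"},
                 {"source": "blob"},
                 {"source": "literal", "value": "..."}]
      document: {"payload": {...metadata...}, "blob": "raw content"}
    """
    parts = []
    for mapping in mappings:
        source = mapping["source"]
        if source == "payload":
            # A metadata field stored on the document.
            value = document.get("payload", {}).get(mapping["field"])
        elif source == "blob":
            # Raw content, for example text or an image URL.
            value = document.get("blob")
        elif source == "literal":
            # Fixed context string, identical for every document.
            value = mapping["value"]
        else:
            raise ValueError(f"unknown mapping source: {source!r}")
        if value is not None:
            parts.append(str(value))
    return parts


mappings = [
    {"source": "literal", "value": "These are product reviews."},
    {"source": "payload", "field": "title"},
    {"source": "blob"},
]
doc = {"payload": {"title": "Great shoes"}, "blob": "They fit well."}
context = build_label_context(mappings, doc)
```

Missing payload fields are skipped here rather than raising an error. That is a design choice in this sketch, not documented behavior.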

Promote to Taxonomy

Once clusters stabilize, promote them to taxonomy nodes. This bridges unsupervised discovery and structured classification:
Cluster → Review → Promote to taxonomy node → Auto-classify new documents
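One way to picture the last step of this flow is nearest-centroid assignment: each promoted node keeps its cluster centroid, and a new document gets the label of the most similar centroid. The sketch below illustrates that idea only. It does not describe how Mixpeek classifies documents, and the labels, vectors, and `min_similarity` threshold are made up.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(vector, nodes, min_similarity=0.0):
    """Return the label of the most similar centroid, or None if nothing exceeds the threshold.

    nodes: {label: centroid_vector}, one entry per promoted cluster (hypothetical shape).
    """
    best_label, best_score = None, min_similarity
    for label, centroid in nodes.items():
        score = cosine(vector, centroid)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy 2-D centroids; real embeddings have many more dimensions.
nodes = {"sports": [1.0, 0.0], "cooking": [0.0, 1.0]}
label = classify([0.9, 0.1], nodes)
```

Returning `None` below the threshold leaves room for a review queue, so documents that fit no node are not forced into the closest one.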

Execution Triggers

Run clustering jobs manually, on a cron schedule, or in response to events. Artifacts (centroids, member lists, coordinates) are stored as Parquet in S3 for downstream analytics. See also: Cluster API, Trigger API.
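As an example of downstream analytics, a member-list artifact can be summarized into cluster sizes once it has been loaded into memory, for instance with `pandas.read_parquet`. The sketch below works on plain dictionaries so it stays dependency-free. The column names `document_id` and `cluster_id` are assumptions, since the artifact schema is not given on this page.

```python
from collections import Counter


def cluster_sizes(member_rows):
    """Count members per cluster.

    member_rows: iterable of dicts with a `cluster_id` key
    (hypothetical column name for the member-list artifact).
    """
    return Counter(row["cluster_id"] for row in member_rows)


# Toy rows standing in for a loaded member-list artifact.
rows = [
    {"document_id": "doc-1", "cluster_id": 0},
    {"document_id": "doc-2", "cluster_id": 0},
    {"document_id": "doc-3", "cluster_id": 1},
]
sizes = cluster_sizes(rows)
largest_first = sizes.most_common()
```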

Alerts

Get notified when new documents match specific conditions. Alerts evaluate every incoming document and fire notifications via webhook, Slack, or email.
curl -X POST "https://api.mixpeek.com/v1/alerts" \
  -H "Authorization: Bearer $MIXPEEK_API_KEY" \
  -H "X-Namespace: $NAMESPACE_ID" \
  -H "Content-Type: application/json" \
  -d '{
    "alert_name": "brand-match",
    "collection_id": "'"$COLLECTION_ID"'",
    "condition": { "field": "taxonomy.brand", "operator": "exists" },
    "notification": { "type": "webhook", "url": "https://example.com/webhook" }
  }'
Alerts on the same collection run in parallel, so multiple alerts don’t block each other. See also: Alert API.
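The evaluation model can be sketched locally as follows: each alert's condition is checked against an incoming document, and the alerts are evaluated concurrently so one slow check does not hold up the others. This is a simplified illustration, not Mixpeek's implementation. Only the `exists` operator appears on this page, so it is the only one handled, and treating a `null` value as "not existing" is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor

_MISSING = object()  # sentinel: the path is not present in the document


def get_path(document, dotted_field):
    """Follow a dotted path such as "taxonomy.brand" through nested dicts."""
    current = document
    for key in dotted_field.split("."):
        if not isinstance(current, dict) or key not in current:
            return _MISSING
        current = current[key]
    return current


def matches(condition, document):
    """Evaluate one condition; only the `exists` operator is sketched here."""
    value = get_path(document, condition["field"])
    operator = condition["operator"]
    if operator == "exists":
        # Assumption: a null value counts as not existing.
        return value is not _MISSING and value is not None
    raise ValueError(f"unsupported operator: {operator!r}")


def evaluate_alerts(alerts, document):
    """Check all alerts concurrently and return the names of those that fire."""
    with ThreadPoolExecutor() as pool:
        hits = list(pool.map(lambda alert: matches(alert["condition"], document), alerts))
    return [alert["alert_name"] for alert, hit in zip(alerts, hits) if hit]


alerts = [
    {"alert_name": "brand-match", "condition": {"field": "taxonomy.brand", "operator": "exists"}},
    {"alert_name": "category-match", "condition": {"field": "taxonomy.category", "operator": "exists"}},
]
fired = evaluate_alerts(alerts, {"taxonomy": {"brand": "Acme"}})
```

In a real deployment, each fired alert would then send its notification via webhook, Slack, or email. That delivery step is left out of this sketch.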

Clusters vs Taxonomies vs Alerts

| I want to… | Use |
| --- | --- |
| Discover what categories exist in my data | Clusters |
| Apply known categories to new documents | Taxonomies |
| Get notified when something specific appears | Alerts |
| Turn discovered groups into reusable labels | Clusters → promote to taxonomy |