Discover Structure

For full configuration details, parameters, and advanced options, see the Clusters reference.

Clusters

When you don’t know your taxonomy yet, use clustering to discover structure from vectors. Mixpeek supports eight algorithms with optional LLM labeling to auto-name each group.

curl -X POST "https://api.mixpeek.com/v1/clusters" \
  -H "Authorization: Bearer $MIXPEEK_API_KEY" \
  -H "X-Namespace: $NAMESPACE_ID" \
  -H "Content-Type: application/json" \
  -d '{
    "cluster_name": "content-themes",
    "collection_id": "'"$COLLECTION_ID"'",
    "feature_uri": "mixpeek://multimodal_extractor@v1/multimodal_embedding",
    "algorithm": { "name": "hdbscan", "params": { "min_cluster_size": 10 } },
    "llm_labeling": { "enabled": true },
    "dimension_reduction": { "method": "umap", "n_components": 2 }
  }'
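
For callers who prefer Python over curl, the same request can be assembled with the standard library alone. This is a minimal sketch, not an official client: the function name `build_cluster_request` is hypothetical, the URL, headers, and body are copied from the curl example above, and the request is only built here, not sent.

```python
import json
import os
import urllib.request


def build_cluster_request(api_key: str, namespace_id: str, collection_id: str) -> urllib.request.Request:
    """Build (but do not send) the cluster-creation request shown above."""
    payload = {
        "cluster_name": "content-themes",
        "collection_id": collection_id,
        "feature_uri": "mixpeek://multimodal_extractor@v1/multimodal_embedding",
        "algorithm": {"name": "hdbscan", "params": {"min_cluster_size": 10}},
        "llm_labeling": {"enabled": True},
        "dimension_reduction": {"method": "umap", "n_components": 2},
    }
    return urllib.request.Request(
        "https://api.mixpeek.com/v1/clusters",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "X-Namespace": namespace_id,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder fallbacks keep the sketch runnable without real credentials.
req = build_cluster_request(
    os.environ.get("MIXPEEK_API_KEY", "test-key"),
    os.environ.get("NAMESPACE_ID", "test-namespace"),
    os.environ.get("COLLECTION_ID", "test-collection"),
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it; building the JSON with `json.dumps` also avoids the shell-quoting concerns of the inline curl body.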

Algorithms

Algorithm	Best for
`hdbscan`	Unknown number of clusters, noisy data
`kmeans`	Known number of clusters, even sizes
`dbscan`	Density-based discovery, outlier detection
`agglomerative`	Hierarchical structure
`spectral`	Non-convex clusters
`gaussian_mixture`	Overlapping clusters
`mean_shift`	Automatic cluster count
`optics`	Varying density
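
The table can be read as a simple decision rule. The sketch below is a hypothetical helper, not part of any Mixpeek SDK: it maps two properties of a dataset to an algorithm name from the table and returns an `algorithm` object in the shape used by the request above. The `n_clusters` parameter name for `kmeans` is an assumption; the Clusters reference is the authority on actual parameter names.

```python
import json
from typing import Optional


def choose_algorithm(known_k: Optional[int] = None, hierarchical: bool = False) -> dict:
    """Pick an algorithm from the table above (hypothetical helper)."""
    if known_k is not None:
        # Known number of clusters, even sizes -> kmeans.
        # "n_clusters" is an assumed parameter name.
        return {"name": "kmeans", "params": {"n_clusters": known_k}}
    if hierarchical:
        # Hierarchical structure -> agglomerative.
        return {"name": "agglomerative", "params": {}}
    # Unknown number of clusters, possibly noisy data -> hdbscan,
    # with the same min_cluster_size as the request above.
    return {"name": "hdbscan", "params": {"min_cluster_size": 10}}


print(json.dumps(choose_algorithm(known_k=5)))
print(json.dumps(choose_algorithm()))
```

The helper covers only three of the eight algorithms; the remaining rows would extend it in the same way.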

LLM Labeling

When enabled, each cluster gets an auto-generated name, summary, and set of keywords based on its member documents. Input mappings control what the LLM sees:

payload — document metadata fields
blob — raw content (text, image URLs)
literal — fixed context strings
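
One way to picture the three mapping kinds is as entries in the `llm_labeling` object. The sketch below is illustrative only: apart from `enabled`, every key name (`input_mappings`, `source`, `path`, `value`) and every sample value is an assumption, not the documented schema. It shows one entry of each kind and rejects any other kind.

```python
import json
from typing import Dict, List

# The three mapping kinds named in the list above.
ALLOWED_SOURCES = {"payload", "blob", "literal"}


def build_llm_labeling(mappings: List[Dict[str, str]]) -> dict:
    """Build an llm_labeling object; key names other than "enabled" are assumed."""
    for mapping in mappings:
        if mapping.get("source") not in ALLOWED_SOURCES:
            raise ValueError(f"unknown input source: {mapping.get('source')!r}")
    return {"enabled": True, "input_mappings": mappings}


labeling = build_llm_labeling([
    {"source": "payload", "path": "title"},        # a document metadata field
    {"source": "blob", "path": "text"},            # raw content
    {"source": "literal", "value": "These are support tickets."},  # fixed context
])
print(json.dumps(labeling, indent=2))
```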

Promote to Taxonomy

Once clusters stabilize, promote them to taxonomy nodes. This bridges unsupervised discovery and structured classification:

Cluster → Review → Promote to taxonomy node → Auto-classify new documents
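
The last step of the flow, auto-classifying new documents against promoted clusters, can be pictured as nearest-centroid assignment. The sketch below is conceptual and does not describe Mixpeek's implementation: the node names, the two-dimensional centroid vectors, and the `min_similarity` threshold are all made up for illustration.

```python
import math
from typing import Dict, List, Optional


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(vector: List[float],
             nodes: Dict[str, List[float]],
             min_similarity: float = 0.5) -> Optional[str]:
    """Return the node whose centroid is most similar, or None below the threshold."""
    best_name, best_score = None, min_similarity
    for name, centroid in nodes.items():
        score = cosine(vector, centroid)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Hypothetical taxonomy nodes, each carrying the centroid of its source cluster.
nodes = {"sports": [1.0, 0.0], "cooking": [0.0, 1.0]}

print(classify([0.9, 0.1], nodes))    # sports
print(classify([-1.0, -1.0], nodes))  # None
```

Returning `None` for vectors far from every centroid leaves room for a review step instead of forcing each document into a node.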

Execution Triggers

Run clusters manually, on a cron schedule, or in response to events. Artifacts (centroids, member lists, coordinates) are stored as Parquet in S3 for downstream analytics.

Alerts

Get notified when new documents match specific conditions. Alerts evaluate every incoming document and fire notifications via webhook, Slack, or email.

curl -X POST "https://api.mixpeek.com/v1/alerts" \
  -H "Authorization: Bearer $MIXPEEK_API_KEY" \
  -H "X-Namespace: $NAMESPACE_ID" \
  -H "Content-Type: application/json" \
  -d '{
    "alert_name": "brand-match",
    "collection_id": "'"$COLLECTION_ID"'",
    "condition": { "field": "taxonomy.brand", "operator": "exists" },
    "notification": { "type": "webhook", "url": "https://example.com/webhook" }
  }'

Alerts execute in parallel per collection, so multiple alerts on the same collection don’t block each other.
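
To make the `condition` object concrete, the following sketch evaluates an `exists` condition against a document locally. It illustrates the idea and is not Mixpeek's evaluator: the dotted-path lookup, the helper names, and the sample documents are assumptions, and only the `exists` operator from the request above is handled.

```python
from typing import Any, Dict

# Sentinel that distinguishes "field absent" from "field present but None".
_MISSING = object()


def get_path(doc: Dict[str, Any], dotted: str) -> Any:
    """Follow a dotted path such as "taxonomy.brand" through nested dicts."""
    current: Any = doc
    for key in dotted.split("."):
        if not isinstance(current, dict) or key not in current:
            return _MISSING
        current = current[key]
    return current


def matches(doc: Dict[str, Any], condition: Dict[str, str]) -> bool:
    """Evaluate a condition; only the "exists" operator is sketched here."""
    if condition["operator"] != "exists":
        raise NotImplementedError(f"operator not sketched: {condition['operator']}")
    return get_path(doc, condition["field"]) is not _MISSING


condition = {"field": "taxonomy.brand", "operator": "exists"}
print(matches({"taxonomy": {"brand": "acme"}}, condition))  # True
print(matches({"taxonomy": {}}, condition))                 # False
```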

Clusters vs Taxonomies vs Alerts

I want to…	Use
Discover what categories exist in my data	Clusters
Apply known categories to new documents	Taxonomies
Get notified when something specific appears	Alerts
Turn discovered groups into reusable labels	Clusters → promote to taxonomy
