
Figure: Discover structure. Document vectors flow through clustering algorithms to produce LLM-labeled groups, which can optionally be promoted to taxonomy nodes, while alerts monitor incoming documents for matches.
For full configuration details, parameters, and advanced options, see the Clusters reference.

Clusters

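When you don’t know your taxonomy yet, use clustering to discover structure in your document vectors. Mixpeek supports eight clustering algorithms, with optional LLM labeling to name each group automatically. The request below clusters a collection's multimodal embeddings with HDBSCAN, labels each resulting group with an LLM, and projects the vectors to two dimensions with UMAP: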
curl -X POST "https://api.mixpeek.com/v1/clusters" \
  -H "Authorization: Bearer $MIXPEEK_API_KEY" \
  -H "X-Namespace: $NAMESPACE_ID" \
  -H "Content-Type: application/json" \
  -d '{
    "cluster_name": "content-themes",
    "collection_id": "'"$COLLECTION_ID"'",
    "feature_uri": "mixpeek://multimodal_extractor@v1/multimodal_embedding",
    "algorithm": { "name": "hdbscan", "params": { "min_cluster_size": 10 } },
    "llm_labeling": { "enabled": true },
    "dimension_reduction": { "method": "umap", "n_components": 2 }
  }'

Algorithms

| Algorithm | Best for |
|---|---|
| hdbscan | Unknown number of clusters, noisy data |
| kmeans | Known number of clusters, even sizes |
| dbscan | Density-based discovery, outlier detection |
| agglomerative | Hierarchical structure |
| spectral | Non-convex clusters |
| gaussian_mixture | Overlapping clusters |
| mean_shift | Automatic cluster count |
| optics | Clusters of varying density |
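Switching algorithms should only require changing the `algorithm` object in the request body, so a small wrapper makes it easy to compare methods on the same collection. The sketch below reproduces the request body from the hdbscan example above and takes the algorithm name and its `params` as arguments. It assumes `COLLECTION_ID`, `MIXPEEK_API_KEY`, and `NAMESPACE_ID` are set as in that example; the function names are hypothetical, and any parameter name other than `min_cluster_size` (such as `n_clusters` below) is an assumption to check against the Clusters reference.

```shell
# Sketch: build the cluster request body from the hdbscan example, varying
# only the "algorithm" object. Requires COLLECTION_ID in the environment.
# The cluster name is suffixed with the algorithm so runs are easy to tell apart.
# Arguments are inserted with printf and are NOT JSON-escaped.
# Usage: cluster_payload <algorithm-name> <params-json>
cluster_payload() {
  printf '{
  "cluster_name": "content-themes-%s",
  "collection_id": "%s",
  "feature_uri": "mixpeek://multimodal_extractor@v1/multimodal_embedding",
  "algorithm": { "name": "%s", "params": %s },
  "llm_labeling": { "enabled": true },
  "dimension_reduction": { "method": "umap", "n_components": 2 }
}\n' "$1" "${COLLECTION_ID:?COLLECTION_ID must be set}" "$1" "$2"
}

# Hypothetical helper: POST the body to the clusters endpoint shown above.
# curl reads the body from stdin via --data-binary @-.
create_cluster() {
  cluster_payload "$1" "$2" | curl -sS -X POST "https://api.mixpeek.com/v1/clusters" \
    -H "Authorization: Bearer $MIXPEEK_API_KEY" \
    -H "X-Namespace: $NAMESPACE_ID" \
    -H "Content-Type: application/json" \
    --data-binary @-
}
```

For example, `create_cluster kmeans '{ "n_clusters": 8 }'` would ask for eight k-means groups, if `n_clusters` is indeed the accepted parameter name. Passing the body on stdin also sidesteps the nested shell quoting of the inline `-d` example.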

LLM Labeling

When enabled, an LLM generates a name, a summary, and keywords for each cluster based on its member documents. Input mappings control what the LLM sees:
  • payload — document metadata fields
  • blob — raw content (text, image URLs)
  • literal — fixed context strings
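The three input types can be combined in one labeling configuration. The sketch below prints an `llm_labeling` object with one input of each type, to illustrate the idea rather than the schema: only `enabled` appears in the documented request, while `input_mappings`, `type`, `path`, and `value` are hypothetical key names, and `title` and `content` in the usage example are hypothetical document fields. Check the Clusters reference for the real structure.

```shell
# HYPOTHETICAL sketch: an llm_labeling object that feeds the labeler one
# input of each documented type (payload, blob, literal). Only "enabled"
# comes from the example request; the other keys are placeholders.
# Arguments are NOT JSON-escaped, so avoid double quotes in them.
# Usage: llm_labeling_json <payload-field> <blob-field> <literal-text>
llm_labeling_json() {
  printf '{
  "enabled": true,
  "input_mappings": [
    { "type": "payload", "path": "%s" },
    { "type": "blob", "path": "%s" },
    { "type": "literal", "value": "%s" }
  ]
}\n' "$1" "$2" "$3"
}
```

For instance, `llm_labeling_json title content "Name each group by its main topic"` would show the labeler each document's title field, its raw content, and a fixed instruction.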

Promote to Taxonomy

Once clusters stabilize, promote them to taxonomy nodes — bridging unsupervised discovery to structured classification:
Cluster → Review → Promote to taxonomy node → Auto-classify new documents

Execution Triggers

Clusters can run manually, on a cron schedule, or in response to events. Cluster artifacts (centroids, member lists, and coordinates) are stored as Parquet files in S3 for downstream analytics. See the Cluster API and Trigger API references.
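Schedules are commonly written in standard five-field cron syntax (minute, hour, day of month, month, day of week); this page does not say which syntax Mixpeek accepts, so treat that as an assumption. The sketch below rejects a schedule that does not have exactly five fields and otherwise prints a trigger request body. The body's keys (`cluster_id`, `trigger_type`, `schedule`) and the function name are hypothetical placeholders, not the Trigger API schema.

```shell
# HYPOTHETICAL sketch: check that a schedule has the five fields of standard
# cron syntax, then print a trigger request body with placeholder keys.
# Usage: cron_trigger_json <cluster-id> "<cron-expression>"
cron_trigger_json() {
  # awk counts whitespace-separated fields without glob-expanding "*".
  fields=$(printf '%s\n' "$2" | awk '{ print NF }')
  if [ "$fields" -ne 5 ]; then
    printf 'expected 5 cron fields, got %s\n' "$fields" >&2
    return 1
  fi
  printf '{ "cluster_id": "%s", "trigger_type": "cron", "schedule": "%s" }\n' "$1" "$2"
}
```

In standard cron syntax, `0 2 * * *` fires once a day at 02:00.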

Alerts

Get notified when new documents match specific conditions. Alerts evaluate every incoming document and fire notifications via webhook, Slack, or email.
curl -X POST "https://api.mixpeek.com/v1/alerts" \
  -H "Authorization: Bearer $MIXPEEK_API_KEY" \
  -H "X-Namespace: $NAMESPACE_ID" \
  -H "Content-Type: application/json" \
  -d '{
    "alert_name": "brand-match",
    "collection_id": "'"$COLLECTION_ID"'",
    "condition": { "field": "taxonomy.brand", "operator": "exists" },
    "notification": { "type": "webhook", "url": "https://example.com/webhook" }
  }'
Alerts on the same collection execute in parallel, so multiple alerts don’t block each other. See the Alert API reference.
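Since alerts on one collection don’t block each other, you can register several side by side. The sketch below wraps the request body from the alert example above in a function that takes the alert name, condition field, operator, and webhook URL, using only the keys that example shows. It assumes the same environment variables as the example; the function names are hypothetical, and `exists` is the only operator this page confirms, so verify any other operator in the Alert API reference.

```shell
# Sketch: build an alert request body with the keys from the documented
# example. Requires COLLECTION_ID in the environment.
# Arguments are inserted with printf and are NOT JSON-escaped.
# Usage: alert_payload <alert-name> <field> <operator> <webhook-url>
alert_payload() {
  printf '{
  "alert_name": "%s",
  "collection_id": "%s",
  "condition": { "field": "%s", "operator": "%s" },
  "notification": { "type": "webhook", "url": "%s" }
}\n' "$1" "${COLLECTION_ID:?COLLECTION_ID must be set}" "$2" "$3" "$4"
}

# Hypothetical helper: POST one alert to the alerts endpoint shown above,
# reading the body from stdin via --data-binary @-.
create_alert() {
  alert_payload "$@" | curl -sS -X POST "https://api.mixpeek.com/v1/alerts" \
    -H "Authorization: Bearer $MIXPEEK_API_KEY" \
    -H "X-Namespace: $NAMESPACE_ID" \
    -H "Content-Type: application/json" \
    --data-binary @-
}
```

Running `create_alert brand-match taxonomy.brand exists https://example.com/webhook` sends the same request as the curl example above.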

Clusters vs Taxonomies vs Alerts

| I want to… | Use |
|---|---|
| Discover what categories exist in my data | Clusters |
| Apply known categories to new documents | Taxonomies |
| Get notified when something specific appears | Alerts |
| Turn discovered groups into reusable labels | Clusters → promote to taxonomy |