Clusters
When you don’t know your taxonomy yet, use clustering to discover structure from vectors. Mixpeek supports eight algorithms with optional LLM labeling to auto-name each group.
Algorithms
| Algorithm | Best for |
|---|---|
| `hdbscan` | Unknown number of clusters, noisy data |
| `kmeans` | Known number of clusters, even sizes |
| `dbscan` | Density-based discovery, outlier detection |
| `agglomerative` | Hierarchical structure |
| `spectral` | Non-convex clusters |
| `gaussian_mixture` | Overlapping clusters |
| `mean_shift` | Automatic cluster count |
| `optics` | Varying density |
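
To make "density-based discovery, outlier detection" concrete, here is a minimal pure-Python sketch of the DBSCAN idea. It is an illustration of the general technique only, not Mixpeek's implementation; the parameter names `eps` and `min_samples` follow common convention and are not confirmed Mixpeek settings.

```python
from math import dist

NOISE = -1  # label given to points that belong to no dense region


def dbscan(points, eps, min_samples):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        # Point i is a core point: start a new cluster and grow it outward.
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue  # already assigned, nothing more to do
            labels[j] = cluster_id
            reachable = neighbors(j)
            if len(reachable) >= min_samples:
                queue.extend(reachable)  # j is also a core point, keep expanding
        cluster_id += 1
    return labels


# Two tight groups plus one far-away outlier (made-up 2-D vectors).
vectors = [(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 10), (10, 11), (11, 10), (11, 11),
           (50, 50)]
labels = dbscan(vectors, eps=1.5, min_samples=3)
print(labels)  # → [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The number of clusters is never passed in: it falls out of the density parameters, and the isolated point is flagged as noise rather than forced into a group. That is the practical difference from `kmeans`, which needs the cluster count up front and assigns every point to some cluster.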
LLM Labeling
When enabled, each cluster gets auto-generated names, summaries, and keywords based on member documents. Input mappings control what the LLM sees:
- `payload`: document metadata fields
- `blob`: raw content (text, image URLs)
- `literal`: fixed context strings
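
The three mapping types above can be pictured as a small resolver that assembles what the LLM sees for one member document. This is a hypothetical sketch: the dictionary keys (`type`, `name`, `field`, `value`), the document layout (`payload`, `blobs`), and the function `build_llm_context` are assumptions made for illustration, not Mixpeek's actual schema or API.

```python
def build_llm_context(document, input_mappings):
    """Resolve each mapping against one document (hypothetical schema)."""
    context = {}
    for mapping in input_mappings:
        kind, name = mapping["type"], mapping["name"]
        if kind == "payload":
            # Pull a metadata field from the document.
            context[name] = document["payload"].get(mapping["field"])
        elif kind == "blob":
            # Pull raw content, such as text or an image URL.
            context[name] = document["blobs"].get(mapping["field"])
        elif kind == "literal":
            # Pass a fixed string through unchanged.
            context[name] = mapping["value"]
        else:
            raise ValueError(f"unknown mapping type: {kind!r}")
    return context


# Made-up sample document and mappings, for illustration only.
document = {
    "payload": {"title": "Sample document"},
    "blobs": {"text": "Example raw text content."},
}
input_mappings = [
    {"type": "payload", "name": "title", "field": "title"},
    {"type": "blob", "name": "body", "field": "text"},
    {"type": "literal", "name": "instructions",
     "value": "Name this group in three words or fewer."},
]
context = build_llm_context(document, input_mappings)
print(context)
```

Keeping `literal` values separate from document-derived values makes it easy to apply the same fixed instructions to every cluster while the `payload` and `blob` inputs vary per member.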
Promote to Taxonomy
Once clusters stabilize, promote them to taxonomy nodes. Promotion bridges unsupervised discovery and structured classification.
Execution Triggers
Run clusters manually, on a cron schedule, or in response to events. Artifacts (centroids, member lists, coordinates) are stored as Parquet in S3 for downstream analytics. See the Cluster API and Trigger API references.
Alerts
Get notified when new documents match specific conditions. Alerts evaluate every incoming document and fire notifications via webhook, Slack, or email.
Clusters vs Taxonomies vs Alerts
| I want to… | Use |
|---|---|
| Discover what categories exist in my data | Clusters |
| Apply known categories to new documents | Taxonomies |
| Get notified when something specific appears | Alerts |
| Turn discovered groups into reusable labels | Clusters → promote to taxonomy |

