Vector Clustering
Discover document groups using HDBSCAN and other algorithms on embedding vectors
Why do anything?
Large document collections have hidden structure. Without clustering, their natural groupings stay invisible unless someone sorts the documents by hand.
Why now?
AI makes automatic pattern discovery practical. Manual categorization misses themes that only emerge across many documents.
Why this feature?
Eight clustering algorithms (HDBSCAN, K-Means, DBSCAN, and others) with LLM-powered cluster labeling and dimensionality reduction.
How It Works
Vector clustering discovers document groups using embedding similarity.
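As a rough illustration, assuming cosine similarity as the metric (this page does not say which one is used), documents whose embeddings point in similar directions score close to 1 and unrelated ones close to 0. The toy 3-dimensional vectors, document names, and the `most_similar` helper below are hypothetical; real embedding models output hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings, not output from a real model.
EMBEDDINGS = {
    "invoice_march": [0.9, 0.1, 0.0],
    "invoice_april": [0.8, 0.2, 0.1],
    "hiking_trip": [0.0, 0.1, 0.9],
}


def most_similar(name):
    """Return the other document whose embedding is closest by cosine similarity."""
    query = EMBEDDINGS[name]
    others = (n for n in EMBEDDINGS if n != name)
    return max(others, key=lambda n: cosine_similarity(query, EMBEDDINGS[n]))


if __name__ == "__main__":
    for name in EMBEDDINGS:
        print(name, "->", most_similar(name))
```

The two invoices score about 0.98 against each other and about 0.01 for invoice versus hiking, which is the kind of signal the clustering step groups on.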
1. Dimensionality Reduction: optional t-SNE or UMAP projection for visualization
2. Clustering: apply the selected algorithm (HDBSCAN, K-Means, etc.)
3. Labeling: an LLM generates descriptive cluster labels
4. Assignment: assign cluster IDs to documents
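The four steps can be sketched end to end in plain Python. This is a minimal illustration under stated assumptions, not the product's implementation: K-Means (one of the listed algorithms) does the clustering, a most-frequent-word heuristic stands in for the LLM labeling call, and step 1 is skipped because the page describes t-SNE/UMAP as optional and for visualization. `kmeans`, `label_cluster`, `cluster_documents`, and the sample documents are all hypothetical.

```python
import math
from collections import Counter


def kmeans(vectors, k, iterations=20):
    """Lloyd's K-Means. Centroids start at the first k vectors (simple, deterministic init)."""
    centroids = [list(v) for v in vectors[:k]]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid (Euclidean distance).
        assignment = [
            min(range(k), key=lambda c: math.dist(v, centroids[c])) for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment


def label_cluster(texts):
    """Hypothetical stand-in for LLM labeling: return the most frequent word.
    A real system would send `texts` to an LLM and ask for a short descriptive name."""
    words = Counter(word.lower() for text in texts for word in text.split())
    return words.most_common(1)[0][0]


def cluster_documents(docs, k):
    """docs: list of (text, embedding) pairs. Returns {text: (cluster_id, label)}."""
    texts = [text for text, _ in docs]
    vectors = [vec for _, vec in docs]
    # Step 1 (optional t-SNE/UMAP projection for visualization) is skipped here.
    ids = kmeans(vectors, k)  # Step 2: clustering
    labels = {  # Step 3: one label per cluster
        c: label_cluster([t for t, i in zip(texts, ids) if i == c]) for c in set(ids)
    }
    return {t: (i, labels[i]) for t, i in zip(texts, ids)}  # Step 4: assignment


# Hypothetical 2-dimensional embeddings for five short documents.
DOCS = [
    ("Overdue invoice Q1", [0.9, 0.1]),
    ("Ridge hike trail map", [0.1, 0.9]),
    ("Invoice payment reminder", [0.8, 0.2]),
    ("Hike packing list", [0.2, 0.8]),
    ("Second invoice reminder", [0.85, 0.15]),
]

if __name__ == "__main__":
    for text, (cid, label) in cluster_documents(DOCS, k=2).items():
        print(f"{cid} [{label}] {text}")
```

Seeding centroids from the first k vectors keeps the sketch deterministic, but it only works here because the first two documents come from different topics; practical implementations use k-means++ seeding or several random restarts.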
Why This Approach
HDBSCAN handles clusters of varying density, does not need the number of clusters in advance, and labels outliers as noise. LLM labeling turns each cluster into a human-readable name.
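HDBSCAN itself is too involved for a short sketch, but the problem it addresses can be shown with plain DBSCAN, another of the listed algorithms. DBSCAN uses a single neighborhood radius `eps`. In the hypothetical 2-D points below, two tight groups sit 0.9 apart and a third, looser group sits far away: a small `eps` separates the tight groups but writes off the loose one as noise (-1), while a large `eps` recovers the loose group but merges the two tight ones. HDBSCAN sidesteps picking one `eps` by building a cluster hierarchy across density levels. The `dbscan` function and the point sets are illustrative only.

```python
import math


def dbscan(points, eps, min_samples):
    """Textbook DBSCAN. Returns one label per point: -1 = noise, 0..k-1 = cluster ID."""
    n = len(points)
    labels = [None] * n  # None = not yet visited

    def region(i):
        # All points within eps of point i, including i itself.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # noise for now; may later be claimed as a border point
            continue
        labels[i] = cluster_id  # i is a core point: start a new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # former noise reachable from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            neighbors = region(j)
            if len(neighbors) >= min_samples:
                queue.extend(neighbors)  # j is also a core point: keep expanding
        cluster_id += 1
    return labels


# Hypothetical points: two tight groups close together, one loose group far away.
TIGHT_A = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
TIGHT_B = [(1.0, 0.0), (1.1, 0.0), (1.0, 0.1), (1.1, 0.1)]
LOOSE = [(10.0, 10.0), (11.0, 10.0), (10.0, 11.0), (11.0, 11.0)]
POINTS = TIGHT_A + TIGHT_B + LOOSE

if __name__ == "__main__":
    print(dbscan(POINTS, eps=0.2, min_samples=3))  # tight groups split, loose group = noise
    print(dbscan(POINTS, eps=1.5, min_samples=3))  # loose group found, tight groups merged
```

Here `min_samples` counts the point itself, the same convention scikit-learn's DBSCAN uses.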
Where This Is Used
Integration
# `client` is assumed to be an already-configured API client; its setup is not shown here.
cluster = client.clusters.create(algorithm="hdbscan", min_cluster_size=5)
