    Vector Clustering

    Discover document groups using HDBSCAN and other algorithms on embedding vectors

    Why do anything?

    Large document collections have hidden structure. Without clustering, you can't discover natural groupings.

    Why now?

    AI enables automatic pattern discovery. Manual categorization misses emergent themes.

    Why this feature?

    Eight clustering algorithms (including HDBSCAN, K-Means, and DBSCAN) with LLM-powered cluster labeling and optional dimensionality reduction.

    How It Works

    Vector clustering discovers document groups using embedding similarity.

    1. Dimensionality Reduction: optional t-SNE or UMAP for visualization

    2. Clustering: apply the selected algorithm (HDBSCAN, K-Means, etc.)

    3. Labeling: an LLM generates descriptive cluster labels

    4. Assignment: cluster IDs are assigned to documents
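The clustering and assignment steps can be sketched in a few lines. This is a minimal, dependency-free illustration and not Mixpeek's implementation: it uses plain DBSCAN (one of the algorithms named above) on made-up 2-D "embeddings", and the document IDs, `eps`, and `min_samples` values are all assumptions chosen for the toy data. Real embeddings have hundreds of dimensions, and the LLM labeling step is omitted.

```python
from math import dist

NOISE = -1  # cluster ID given to documents that belong to no cluster


def dbscan(points, eps, min_samples):
    """Return one cluster ID per point; -1 marks noise."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may still be claimed later as a border point
            continue
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point joins the cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            nbrs = neighbors(j)
            if len(nbrs) >= min_samples:  # core point: keep expanding
                queue.extend(nbrs)
        cluster_id += 1
    return labels


# Hypothetical documents with toy 2-D embedding vectors.
documents = {
    "doc_a": (0.0, 0.0), "doc_b": (0.1, 0.0), "doc_c": (0.0, 0.1),
    "doc_d": (5.0, 5.0), "doc_e": (5.1, 5.0), "doc_f": (5.0, 5.1),
    "doc_g": (10.0, 0.0),  # far from everything else
}

ids = list(documents)
labels = dbscan([documents[d] for d in ids], eps=0.5, min_samples=3)

# Assignment step: map each document ID to its cluster ID.
assignments = dict(zip(ids, labels))
print(assignments)
```

Here the two tight groups receive cluster IDs 0 and 1, and the isolated document is marked as noise (-1) rather than being forced into a group.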

    Why This Approach

    HDBSCAN handles clusters of varying density, which a single global density threshold cannot. LLM labeling gives each cluster a human-readable name.
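To see why variable density matters, consider what happens with one global distance threshold, the kind of parameter that DBSCAN's `eps` is. The sketch below is a simplified 1-D stand-in, not DBSCAN or HDBSCAN themselves: the helper `threshold_clusters` and all data values are hypothetical, picked only to make the effect visible.

```python
def threshold_clusters(values, eps, min_size=3):
    """Group sorted 1-D values, starting a new group wherever the gap exceeds eps.

    Groups with fewer than min_size members are treated as noise and dropped.
    """
    groups, current = [], [values[0]]
    for prev, cur in zip(values, values[1:]):
        if cur - prev <= eps:
            current.append(cur)
        else:
            groups.append(current)
            current = [cur]
    groups.append(current)
    return [g for g in groups if len(g) >= min_size]


# Two dense groups close to each other, plus one sparse group far away.
dense_a = [0.0, 0.1, 0.2, 0.3]
dense_b = [1.3, 1.4, 1.5, 1.6]
sparse = [20.0, 22.0, 24.0, 26.0]
values = sorted(dense_a + dense_b + sparse)

# Small threshold: the dense groups are separated, but the sparse one is lost.
tight = threshold_clusters(values, eps=0.5)

# Large threshold: the sparse group is found, but the dense ones merge.
loose = threshold_clusters(values, eps=2.5)

print(len(tight), len(loose))
```

Neither threshold recovers all three groups. Density-adaptive methods such as HDBSCAN are designed for exactly this situation.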

    Integration

    # Assumes `client` is an already-initialized Mixpeek SDK client.
    cluster = client.clusters.create(algorithm="hdbscan", min_cluster_size=5)