Clusters provide warehouse-native grouping, the multimodal equivalent of SQL GROUP BY.

Create and run clustering jobs

  • Create: Click New Cluster, select collections, choose vector or attribute clustering, and configure the algorithm parameters. API: Create Cluster.
  • Execute: Run real-time clustering on the Engine or submit as a job for async processing. API: Execute Clustering and Submit Job.
  • Inspect: Review centroids, metrics, and (if saved) members. Download artifacts, such as Parquet paths, under Artifacts. API: Get Artifacts.
  • List/Get/Delete: Manage clustering configurations and results. API: List, Get, Delete.
  • Stream data: Browse cluster centroids and members directly. API: Stream Data.
  • Apply enrichment: Attach cluster labels back to a source or target collection at scale. API: Apply Enrichment.

Visualization

The cluster scatter plot maps reduced coordinates to position and size:
  • x, y → point position on the chart
  • z (when dimension_reduction.components is 3) → dot size, where larger dots represent higher z-values
This depth-cue approach surfaces the third dimension without requiring a full 3D renderer, making it easy to spot structure that would be lost in a flat 2D projection.
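
The mapping above can be sketched in a few lines of Python. This is an illustrative sketch, not the renderer's actual implementation: the source only states that larger dots represent higher z-values, so the linear scaling, the `marker_sizes` helper, and the 20 to 200 size range are all assumptions made here for demonstration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]  # (x, y) or (x, y, z) reduced coordinates


def marker_sizes(
    points: Sequence[Point],
    min_size: float = 20.0,   # assumed smallest dot size (hypothetical)
    max_size: float = 200.0,  # assumed largest dot size (hypothetical)
) -> List[float]:
    """Return one dot size per point, scaling z linearly into [min_size, max_size].

    Falls back to a uniform mid-range size when any point lacks a z value
    (2-component reduction) or when all z values are identical.
    """
    default = (min_size + max_size) / 2
    if not points or any(len(p) < 3 for p in points):
        return [default] * len(points)

    zs = [p[2] for p in points]
    lo, hi = min(zs), max(zs)
    if hi == lo:
        return [default] * len(points)

    scale = (max_size - min_size) / (hi - lo)
    return [min_size + (z - lo) * scale for z in zs]


# Three points from a 3-component reduction: x and y give position, z gives size.
points = [(0.1, 0.4, -1.0), (0.3, 0.2, 0.0), (0.9, 0.7, 1.0)]
sizes = marker_sizes(points)
print(sizes)  # [20.0, 110.0, 200.0]
```

The resulting list can be handed to any plotting library that accepts per-point sizes, for example the `s` argument of matplotlib's `Axes.scatter`.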

Tips

  • Start with a small sample to validate parameters before running on the full collections.
  • Use LLM labeling for human-friendly labels when vectors are dense and unlabeled.
  • Set dimension_reduction.components to 3 to see depth-based sizing in the scatter plot.

  1. Create a cluster job: Choose collections and configure algorithm parameters; optionally set dimensionality reduction.
  2. Execute or submit: Run in real time, or submit as an asynchronous job and track it via Tasks.
  3. Inspect and enrich: Review centroids and metrics, then apply enrichment back to collections if desired. Artifacts such as Parquet paths support downstream analytics and reproducible exploration.