Partially Update Cluster

curl --request PATCH \
  --url https://api.mixpeek.com/v1/clusters/{cluster_identifier} \
  --header 'Content-Type: application/json' \
  --data '
{
  "cluster_name": "<string>",
  "description": "<string>",
  "metadata": {},
  "llm_labeling": {
    "description": "Text-only labeling with multiple fields",
    "enabled": true,
    "include_keywords": true,
    "include_summary": true,
    "labeling_inputs": {
      "input_mappings": [
        { "input_key": "title", "path": "title", "source_type": "payload" },
        { "input_key": "description", "path": "description", "source_type": "payload" },
        { "input_key": "text", "path": "text", "source_type": "payload" }
      ]
    },
    "model_name": "gpt-4o-mini-2024-07-18",
    "provider": "openai"
  },
  "filters": {},
  "face_cluster_merge": {
    "enabled": true,
    "centroid_cosine_threshold": 0.55,
    "bbox_iou_threshold": 0.4,
    "scene_jaccard_threshold": 0.3,
    "bbox_field": "bbox",
    "frame_field": "frame_number",
    "scene_field": "scene_id"
  },
  "sample_size": 123,
  "algorithm_params": {}
}
'

{
  "collection_ids": ["<string>"],
  "cluster_name": "<string>",
  "cluster_type": "vector",
  "vector_config": {
    "algorithm_params": { "min_cluster_size": 10, "min_samples": 5 },
    "clustering_method": "hdbscan",
    "description": "HDBSCAN clustering with multimodal embeddings",
    "feature_uri": "mixpeek://multimodal_extractor@v1/vertex_multimodal_embedding",
    "sample_size": 1000
  },
  "attribute_config": {
    "attributes": ["category"],
    "description": "Simple category clustering",
    "hierarchical_grouping": false
  },
  "filters": {
    "AND": [
      { "field": "name", "operator": "eq", "value": "John" },
      { "field": "age", "operator": "gte", "value": 30 }
    ],
    "OR": [
      { "field": "status", "operator": "eq", "value": "active" },
      { "field": "role", "operator": "eq", "value": "admin" }
    ],
    "NOT": [
      { "field": "department", "operator": "eq", "value": "HR" },
      { "field": "location", "operator": "eq", "value": "remote" }
    ],
    "case_sensitive": true
  },
  "llm_labeling": {
    "description": "Text-only labeling with multiple fields",
    "enabled": true,
    "include_keywords": true,
    "include_summary": true,
    "labeling_inputs": {
      "input_mappings": [
        { "input_key": "title", "path": "title", "source_type": "payload" },
        { "input_key": "description", "path": "description", "source_type": "payload" },
        { "input_key": "text", "path": "text", "source_type": "payload" }
      ]
    },
    "model_name": "gpt-4o-mini-2024-07-18",
    "provider": "openai"
  },
  "enrich_source_collection": false,
  "source_enrichment_config": {
    "field_mappings": [
      { "source_field": "cluster_id", "target_field": "category_id" },
      { "source_field": "cluster_label", "target_field": "category_name" },
      { "source_field": "distance_to_centroid", "target_field": "category_confidence" }
    ]
  },
  "auto_execute_on_batch": false,
  "auto_execute_min_documents": 123,
  "auto_execute_cooldown_seconds": 3600,
  "cluster_id": "<string>",
  "parquet_path": "<string>",
  "members_key": "<string>",
  "num_clusters": 123,
  "cluster_stats": {
    "num_clusters": 123,
    "noise_points": 123,
    "silhouette_score": 123,
    "extra": {}
  },
  "status": "PENDING",
  "task_id": "<string>",
  "last_run_id": "<string>",
  "created_at": "2023-11-07T05:31:56Z",
  "updated_at": "2023-11-07T05:31:56Z",
  "metadata": {}
}

Headers

Authorization

string

Bearer token authentication using your API key. Format: 'Bearer sk_xxxxxxxxxxxxx'. You can create API keys in the Mixpeek dashboard under Organization Settings.

Example:

"Bearer YOUR_MIXPEEK_API_KEY"

X-Namespace

string

Namespace identifier for scoping this request. All resources (collections, buckets, taxonomies, etc.) are scoped to a namespace. You can provide either the namespace ID or the namespace name. Format: ns_xxxxxxxxxxxxx (ID) or a custom name like 'my-namespace'. Falls back to the ?namespace= query parameter if the header is omitted.

Examples:

"ns_abc123def456"

"production"

"my-namespace"

Path Parameters

cluster_identifier

string

required

Cluster ID or name
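
The headers and path parameter above can be sketched in Python as follows. This is a minimal illustration, not an official client: the helper names `cluster_url` and `auth_headers` are hypothetical, and the key is a placeholder.

```python
from typing import Dict, Optional
from urllib.parse import quote, urlencode

API_ROOT = "https://api.mixpeek.com/v1"


def cluster_url(cluster_identifier: str, namespace_in_query: Optional[str] = None) -> str:
    """Build the endpoint URL; the identifier may be a cluster ID or a name."""
    # Percent-encode the identifier so names with spaces or slashes stay in one path segment.
    url = f"{API_ROOT}/clusters/{quote(cluster_identifier, safe='')}"
    # The docs describe ?namespace= as the fallback when X-Namespace is omitted.
    if namespace_in_query is not None:
        url += "?" + urlencode({"namespace": namespace_in_query})
    return url


def auth_headers(api_key: str, namespace: Optional[str] = None) -> Dict[str, str]:
    """Build request headers (hypothetical helper, not part of any SDK)."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    # X-Namespace accepts either a namespace ID (ns_...) or a custom name.
    if namespace is not None:
        headers["X-Namespace"] = namespace
    return headers


headers = auth_headers("YOUR_MIXPEEK_API_KEY", namespace="ns_abc123def456")
url = cluster_url("my cluster")
```

Percent-encoding the identifier is a defensive choice for clusters addressed by name; IDs without special characters pass through unchanged.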

Body

application/json

Request model for partially updating a cluster (PATCH operation).

cluster_name

string | null

Updated name for the cluster

description

string | null

Updated description for the cluster

metadata

Metadata · object

Updated metadata for the cluster

llm_labeling

LLMLabeling · object

Updated LLM labeling configuration. Takes effect on the next POST /v1/clusters/{id}/execute. Use this to correct a null labeling_inputs mapping that produced schema-metadata labels, without re-embedding or re-running HDBSCAN.

Example:

{
  "description": "Text-only labeling with multiple fields",
  "enabled": true,
  "include_keywords": true,
  "include_summary": true,
  "labeling_inputs": {
    "input_mappings": [
      {
        "input_key": "title",
        "path": "title",
        "source_type": "payload"
      },
      {
        "input_key": "description",
        "path": "description",
        "source_type": "payload"
      },
      {
        "input_key": "text",
        "path": "text",
        "source_type": "payload"
      }
    ]
  },
  "model_name": "gpt-4o-mini-2024-07-18",
  "provider": "openai"
}
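
A patch that only repairs the labeling configuration might be assembled like this. It is a sketch under assumptions: the `payload_mapping` helper is hypothetical, and the field values simply mirror the example above. Whether any of these fields may be omitted is not stated here, so all of them are included.

```python
import json


def payload_mapping(input_key: str, path: str = "") -> dict:
    """One labeling input read from the document payload (hypothetical helper)."""
    return {"input_key": input_key, "path": path or input_key, "source_type": "payload"}


# Only llm_labeling is present, so other stored cluster settings are not part of the patch.
patch_body = {
    "llm_labeling": {
        "description": "Text-only labeling with multiple fields",
        "enabled": True,
        "include_keywords": True,
        "include_summary": True,
        "labeling_inputs": {
            "input_mappings": [
                payload_mapping("title"),
                payload_mapping("description"),
                payload_mapping("text"),
            ]
        },
        "model_name": "gpt-4o-mini-2024-07-18",
        "provider": "openai",
    }
}

request_json = json.dumps(patch_body)
```

The new mapping is picked up on the next execute call, as described above; the patch itself does not trigger a run.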

filters

Filters · object

Updated pre-filter for clustering input documents. Overrides the cluster's stored filter on subsequent execute calls.
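
As a rough sketch, a filter patch could be built as below. The condition shape (field, operator, value) and the AND key are taken from the response example on this page; that the request body accepts the same shape is an assumption, and the `condition` helper is hypothetical.

```python
import json


def condition(field: str, operator: str, value) -> dict:
    """A single filter condition in the shape shown in the response example."""
    return {"field": field, "operator": operator, "value": value}


# Restrict clustering input to active electronics documents (illustrative field names).
filter_patch = {
    "filters": {
        "AND": [
            condition("status", "eq", "active"),
            condition("category", "eq", "electronics"),
        ]
    }
}

request_json = json.dumps(filter_patch)
```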

face_cluster_merge

FaceClusterMergeConfig · object

Updated post-HDBSCAN face-identity merge configuration. Takes effect on the next POST /v1/clusters/{id}/execute. Pass an object with enabled=false to turn the merge pass off without removing the config; pass null in the patch to leave the stored value untouched.
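
The two cases described above serialize differently, which the following sketch makes explicit. Sending only `enabled` without the threshold fields is an assumption about what the endpoint accepts.

```python
import json

# Turn the merge pass off while keeping the stored configuration.
disable_merge = {"face_cluster_merge": {"enabled": False}}

# An explicit null leaves the stored configuration untouched.
leave_untouched = {"face_cluster_merge": None}

disable_json = json.dumps(disable_merge)
untouched_json = json.dumps(leave_untouched)
```

Note that Python's `False` and `None` become JSON `false` and `null`, so the distinction survives serialization.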

sample_size

integer | null

Updated per-execution document cap. Takes effect on the next POST /v1/clusters/{id}/execute. Omit to leave the stored value untouched; set to an integer to change it. Hard max is 100,000 to keep O(N²) algorithms within RAM bounds.

Required range: x <= 100000
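
A client can check the documented upper bound before sending the request. This is a minimal sketch; `sample_size_patch` is a hypothetical helper, and no lower bound is enforced because none is documented here.

```python
MAX_SAMPLE_SIZE = 100_000  # documented hard max for sample_size


def sample_size_patch(sample_size: int) -> dict:
    """Return a PATCH body for sample_size, rejecting values above the documented cap."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(sample_size, bool) or not isinstance(sample_size, int):
        raise TypeError("sample_size must be an integer")
    if sample_size > MAX_SAMPLE_SIZE:
        raise ValueError(f"sample_size must be <= {MAX_SAMPLE_SIZE}")
    return {"sample_size": sample_size}


body = sample_size_patch(5000)
```

To leave the stored value untouched, omit the key from the patch body rather than calling this helper.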

algorithm_params

Algorithm Params · object

Updated algorithm parameters (e.g. min_cluster_size, min_samples for HDBSCAN). Takes effect on the next POST /v1/clusters/{id}/execute.
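
Putting the body fields together, a PATCH request can be prepared with Python's standard library. This is a sketch under assumptions rather than an official client: `build_patch_request` is a hypothetical helper, the key and cluster name are placeholders, and the request is only constructed, not sent.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.mixpeek.com/v1/clusters"


def build_patch_request(cluster_identifier: str, changes: dict,
                        api_key: str, namespace: str) -> urllib.request.Request:
    """Prepare a PATCH request carrying only the fields that should change."""
    url = f"{BASE_URL}/{urllib.parse.quote(cluster_identifier, safe='')}"
    return urllib.request.Request(
        url,
        data=json.dumps(changes).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "X-Namespace": namespace,
            "Content-Type": "application/json",
        },
        method="PATCH",
    )


req = build_patch_request(
    "my-cluster",
    {"algorithm_params": {"min_cluster_size": 10, "min_samples": 5}},
    api_key="YOUR_MIXPEEK_API_KEY",
    namespace="production",
)
# urllib.request.urlopen(req) would send it; omitted so the sketch runs offline.
```

Because the changes take effect on the next execute call, a typical flow would follow this request with POST /v1/clusters/{id}/execute.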

Response

Successful Response

Cluster metadata stored in MongoDB.

collection_ids

string[] | null

Collections to cluster together

Minimum array length: 1

cluster_name

string | null

Optional human-friendly name for the clustering job

cluster_type

enum<string>

default: vector

Vector or attribute clustering

Available options: vector, attribute

vector_config

VectorBasedConfig · object

Required when cluster_type is 'vector'

Example:

{
  "algorithm_params": { "min_cluster_size": 10, "min_samples": 5 },
  "clustering_method": "hdbscan",
  "description": "HDBSCAN clustering with multimodal embeddings",
  "feature_uri": "mixpeek://multimodal_extractor@v1/vertex_multimodal_embedding",
  "sample_size": 1000
}

attribute_config

AttributeBasedConfig · object

Required when cluster_type is 'attribute'

Example:

{
  "attributes": ["category"],
  "description": "Simple category clustering",
  "hierarchical_grouping": false
}

filters

LogicalOperator · object

Optional filters applied to documents before clustering (same format as list documents). Applied during the Qdrant scroll, before the parquet export. Useful for clustering subsets such as status='active' or category='electronics'.

llm_labeling

LLMLabeling · object

Optional configuration for LLM-based cluster labeling. When provided with enabled=True, clusters receive semantic labels generated by an LLM instead of generic labels like 'Cluster 0'. When not provided, or when enabled=False, fallback labels are used.

Example:

{
  "description": "Text-only labeling with multiple fields",
  "enabled": true,
  "include_keywords": true,
  "include_summary": true,
  "labeling_inputs": {
    "input_mappings": [
      {
        "input_key": "title",
        "path": "title",
        "source_type": "payload"
      },
      {
        "input_key": "description",
        "path": "description",
        "source_type": "payload"
      },
      {
        "input_key": "text",
        "path": "text",
        "source_type": "payload"
      }
    ]
  },
  "model_name": "gpt-4o-mini-2024-07-18",
  "provider": "openai"
}

enrich_source_collection

boolean

default: false

If True, cluster results are written back to the source collection(s) in place instead of creating new output collections. Documents are enriched with cluster_id, cluster_label, distance_to_centroid, and optionally other metadata. This is similar to the taxonomy enrichment pattern.

source_enrichment_config

SourceEnrichmentConfig · object

Configuration for source collection enrichment (only used if enrich_source_collection=True). Controls which fields are added to source documents and field naming conventions.

Example:

{
  "field_mappings": [
    {
      "source_field": "cluster_id",
      "target_field": "category_id"
    },
    {
      "source_field": "cluster_label",
      "target_field": "category_name"
    },
    {
      "source_field": "distance_to_centroid",
      "target_field": "category_confidence"
    }
  ]
}
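
To illustrate what the field mappings in this example express, the sketch below renames cluster output fields on a local dictionary. It shows the mapping idea only and is not Mixpeek's implementation; `apply_field_mappings` and the sample values are hypothetical.

```python
def apply_field_mappings(cluster_fields: dict, field_mappings: list) -> dict:
    """Rename cluster output fields to their target names, skipping absent sources."""
    return {
        m["target_field"]: cluster_fields[m["source_field"]]
        for m in field_mappings
        if m["source_field"] in cluster_fields
    }


field_mappings = [
    {"source_field": "cluster_id", "target_field": "category_id"},
    {"source_field": "cluster_label", "target_field": "category_name"},
    {"source_field": "distance_to_centroid", "target_field": "category_confidence"},
]

# Illustrative cluster output for one document.
cluster_fields = {"cluster_id": "cl_7", "cluster_label": "Shoes", "distance_to_centroid": 0.12}

enriched = apply_field_mappings(cluster_fields, field_mappings)
```

With this configuration, a document would carry category_id, category_name, and category_confidence rather than the default cluster field names.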

auto_execute_on_batch

boolean

default: false

Automatically execute this cluster whenever a batch completes on any of its input collections. When True, a ClusterApplicationConfig entry is added to each input collection's cluster_applications field at creation time. The cluster will then auto-trigger after each batch completion (subject to cooldown and document threshold). When False (default), the cluster must be executed manually via the API.

auto_execute_min_documents

integer | null

Minimum number of documents required before the cluster is auto-executed. Only used when auto_execute_on_batch=True. If the collection has fewer documents than this threshold, clustering is skipped.

auto_execute_cooldown_seconds

integer

default: 3600

Minimum time (in seconds) between automatic cluster executions. Only used when auto_execute_on_batch=True. Default: 3600 (1 hour).
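
The three auto-execute settings combine into a gating decision that can be sketched as follows. This illustrates the documented rules only: the function name, argument names, check order, and boundary behavior (a run exactly at the cooldown is allowed) are assumptions, not the server's actual logic.

```python
from typing import Optional


def should_auto_execute(
    auto_execute_on_batch: bool,
    document_count: int,
    min_documents: Optional[int],
    seconds_since_last_run: Optional[float],
    cooldown_seconds: int = 3600,
) -> bool:
    """Decide whether a completed batch should trigger clustering (illustrative)."""
    if not auto_execute_on_batch:
        return False  # manual execution only
    if min_documents is not None and document_count < min_documents:
        return False  # below the document threshold, clustering is skipped
    if seconds_since_last_run is not None and seconds_since_last_run < cooldown_seconds:
        return False  # still inside the cooldown window
    return True


decision = should_auto_execute(True, 150, 100, 7200)
```

Passing `None` for `seconds_since_last_run` models a cluster that has never run, in which case the cooldown does not apply.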

cluster_id

string

Unique cluster identifier

parquet_path

string | null

S3 path to parquet files with cluster data

members_key

string | null

S3 key to members.parquet (if saved)

num_clusters

integer | null

Number of clusters found

cluster_stats

ClusterStats · object

Clustering quality metrics

status

enum<string>

default: PENDING

Clustering job status

Available options: PENDING, QUEUED, IN_PROGRESS, PROCESSING, COMPLETED, COMPLETED_WITH_ERRORS, FAILED, CANCELED, INTERRUPTED, UNKNOWN, SKIPPED, DRAFT, ACTIVE, ARCHIVED, SUSPENDED

task_id

string | null

Associated task ID for clustering job

last_run_id

string | null

Run ID of the most recent successful clustering execution. Used to retrieve execution results.

created_at

string<date-time>

When the cluster was created

updated_at

string<date-time>

When the cluster was last updated

metadata

Metadata · object

Additional user-defined metadata for the cluster
