Trigger Collection Processing
Process data through a collection. Works for both bucket-sourced and collection-sourced collections.
For bucket-sourced collections:
Discovers objects from source bucket(s), creates a batch, and submits for processing.
Use include_buckets to limit which source buckets to process from.
For collection-sourced collections:
Processes existing documents from upstream collection(s).
Use include_collections to limit which source collections to process from.
Filtering:
source_filters: Field-level filters using LogicalOperator format.
- Example:
{"AND": [{"field": "status", "operator": "eq", "value": "pending"}]}
- For specific objects:
{"AND": [{"field": "object_id", "operator": "in", "value": ["obj_1", "obj_2"]}]}
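As a rough sketch, filters in this shape can be assembled with small helpers. The function names `condition` and `all_of` are hypothetical and not part of any client library; only the dictionary layout is taken from the examples above.

```python
import json
from typing import Any


def condition(field: str, operator: str, value: Any) -> dict:
    """Build one field-level condition in the shape shown above."""
    return {"field": field, "operator": operator, "value": value}


def all_of(*conditions: dict) -> dict:
    """Combine conditions under a top-level AND."""
    return {"AND": list(conditions)}


# The two examples from this page, rebuilt with the hypothetical helpers.
pending_only = all_of(condition("status", "eq", "pending"))
specific_objects = all_of(condition("object_id", "in", ["obj_1", "obj_2"]))

print(json.dumps(pending_only))
print(json.dumps(specific_objects))
```

Building the filter from functions rather than hand-written JSON makes it harder to misplace a bracket when several conditions are combined.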
Returns:
- batch_id: Track progress via GET /v1/batches/{batch_id}
- task_id: Monitor via GET /v1/tasks/{task_id}
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
Path Parameters
The ID or name of the collection to trigger
Body
Request to trigger (re)processing through a collection.
For bucket-sourced collections (tier 0):
Discovers objects from source bucket(s) and creates a batch for processing.
Use include_buckets to limit which source buckets to process from.
For collection-sourced collections (tier N):
Processes existing documents from upstream collection(s).
Use include_collections to limit which source collections to process from.
Use source_filters for field-level filtering on objects or documents.
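A minimal sketch of assembling a request body from the fields described above. The helper `build_trigger_body`, its `source_kind` argument, and the sample names `bkt_a` and `col_a` are hypothetical, added only to illustrate which option applies to which kind of collection. The field names `include_buckets`, `include_collections`, and `source_filters` are the ones used on this page.

```python
from typing import Optional


def build_trigger_body(
    source_kind: str,
    include: Optional[list] = None,
    source_filters: Optional[dict] = None,
) -> dict:
    """Return a JSON-serializable body for a trigger request.

    source_kind is "bucket" (tier 0) or "collection" (tier N).
    """
    if source_kind not in ("bucket", "collection"):
        raise ValueError("source_kind must be 'bucket' or 'collection'")

    body: dict = {}
    if include:
        # include_buckets only applies to bucket-sourced collections,
        # include_collections only to collection-sourced ones.
        key = "include_buckets" if source_kind == "bucket" else "include_collections"
        body[key] = list(include)
    if source_filters:
        body["source_filters"] = source_filters
    return body


pending = {"AND": [{"field": "status", "operator": "eq", "value": "pending"}]}
print(build_trigger_body("bucket", ["bkt_a"], pending))
print(build_trigger_body("collection", ["col_a"]))
```

Omitting `include` leaves the key out of the body entirely, which matches the documented default of using all configured sources.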
Document Overwrite Behavior:
- If source bucket has unique_key configured: Documents are UPSERTED (overwrites existing)
- If source bucket has NO unique_key: New documents are CREATED (may cause duplicates)
To enable idempotent re-processing, configure unique_key on the source bucket.
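The difference between the two cases can be illustrated with a toy in-memory model. This is not the service's implementation: `materialize`, the list-based store, and the `sku` field are hypothetical and serve only to show why a unique_key makes re-processing idempotent.

```python
from typing import Optional


def materialize(store: list, doc: dict, unique_key: Optional[str] = None) -> None:
    """Write doc into store, upserting when unique_key is configured."""
    if unique_key is not None:
        for i, existing in enumerate(store):
            if existing.get(unique_key) == doc.get(unique_key):
                store[i] = doc  # upsert: overwrite the existing document
                return
    store.append(doc)  # no unique_key (or no match): create a new document


doc = {"sku": "A-1", "title": "first run"}

with_key: list = []
without_key: list = []
for _ in range(2):  # process the same source twice
    materialize(with_key, dict(doc), unique_key="sku")
    materialize(without_key, dict(doc))

print(len(with_key), len(without_key))  # prints "1 2"
```

Processing the same source twice leaves one document when a key is configured and two when it is not, which is the duplicate risk noted above.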
Limit processing to objects from these specific buckets (IDs or names). Only applies to bucket-sourced collections. If not provided, all configured source buckets are used.
Limit processing to documents from these specific collections (IDs or names). Only applies to collection-sourced collections. If not provided, all configured source collections are used.
Limit processing to these specific object IDs. Only applies to bucket-sourced collections. This is a convenience shorthand — equivalent to using source_filters with {"AND": [{"field": "object_id", "operator": "in", "value": [...]}]}.
Field-level filters for objects (bucket-sourced) or documents (collection-sourced). Uses LogicalOperator format (AND/OR/NOT). Use this to filter by metadata fields, status, or any other object/document properties.
{
"AND": [
{
"field": "status",
"operator": "eq",
"value": "pending"
}
]
}
How to handle sources already processed in prior batches.
- skip (default): skip sources already materialized in this collection.
- replace: delete existing documents for the re-processed sources and re-materialize them. This also clears the processed-objects resume ledger, so use it to recover a collection stuck with ledger entries but 0 materialized documents (the orphan/divergence state).
- force: process regardless, allowing duplicates.
skip, replace, force
Response
Successful Response
Response after triggering collection processing.
Use batch_id or task_id to monitor progress via GET /v1/batches/{batch_id}
or GET /v1/tasks/{task_id}.
ID of the created batch for tracking progress.
Task ID for monitoring via GET /v1/tasks/{task_id}.
ID of the collection being processed.
Number of processing tiers in the DAG.
Human-readable status message.
Bucket IDs that objects were discovered from (bucket-sourced collections).
Collection IDs that documents were read from (collection-sourced collections).
Total number of objects included in the batch (bucket-sourced collections).
Total number of documents to process (collection-sourced collections).
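As a sketch of how the two identifiers map onto the monitoring endpoints named above, assuming the response is a JSON object with `batch_id` and `task_id` keys. The helper name `monitoring_paths` and the sample ID values are hypothetical, and the paths are relative because the API's base URL is not given on this page.

```python
def monitoring_paths(response: dict) -> dict:
    """Map a trigger response to the relative paths used for monitoring."""
    return {
        "batch": f"/v1/batches/{response['batch_id']}",
        "task": f"/v1/tasks/{response['task_id']}",
    }


# Made-up IDs standing in for a real trigger response.
sample = {"batch_id": "batch_123", "task_id": "task_456"}
paths = monitoring_paths(sample)
print(paths["batch"])
print(paths["task"])
```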

