# Tasks

> Track asynchronous jobs across Mixpeek

Tasks provide a uniform way to monitor long-running operations (batch processing, clustering, taxonomy materialization, namespace migrations, etc.). Every task exposes a status from the shared `TaskStatusEnum`.

## TaskStatusEnum

```
PENDING → PROCESSING → COMPLETED
            ↘           ↘
            FAILED      COMPLETED_WITH_ERRORS
```

Additional values include `IN_PROGRESS`, `CANCELED`, `SKIPPED`, `UNKNOWN`, `DRAFT`, `ACTIVE`, `ARCHIVED`, and `SUSPENDED`. All async resources in Mixpeek adopt this enum, so your polling logic works everywhere.

<Warning>
  **Terminal statuses are `COMPLETED`, `COMPLETED_WITH_ERRORS`, `FAILED`, and `CANCELED`.** A poller that waits only for `COMPLETED` will hang forever on a batch that finished with some failed items (`COMPLETED_WITH_ERRORS` — partial success). Always stop on any terminal status.
</Warning>
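
The rule in the warning can be captured in a small helper so every poller shares one definition of "done". This is a minimal sketch: the status strings come from the list above, while the names `TERMINAL_STATUSES`, `SUCCESS_STATUSES`, `is_terminal`, and `is_success` are illustrative and not part of any Mixpeek SDK.

```python
# Terminal statuses as listed in the warning above.
TERMINAL_STATUSES = frozenset(
    {"COMPLETED", "COMPLETED_WITH_ERRORS", "FAILED", "CANCELED"}
)
# Terminal statuses that indicate full or partial success.
SUCCESS_STATUSES = frozenset({"COMPLETED", "COMPLETED_WITH_ERRORS"})


def is_terminal(status: str) -> bool:
    """Return True when a poller should stop waiting on this status."""
    return status in TERMINAL_STATUSES


def is_success(status: str) -> bool:
    """Return True for COMPLETED and COMPLETED_WITH_ERRORS only."""
    return status in SUCCESS_STATUSES
```

Checking membership in a set, rather than comparing against `COMPLETED` alone, is what keeps a poller from hanging on partial success.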

## Anatomy of a Task

```json
{
  "task_id": "tsk_processing_123",
  "task_type": "api_buckets_batches_process",
  "status": "PROCESSING",
  "inputs": ["batch_xyz789"],
  "outputs": null,
  "additional_data": {
    "batch_id": "batch_xyz789",
    "bucket_id": "bkt_products",
    "job_id": "ray_job_123"
  },
  "error_message": null
}
```

* Task records are cached in Redis for about 24 hours, which keeps lookups fast.
* They are also persisted in MongoDB for historical auditing.
* `additional_data` stores resource-specific details (e.g., Ray job IDs).
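
If you want typed access to these fields on the client side, the payload can be mapped onto a small dataclass. The sketch below assumes only the field names shown in the example payload; the `Task` class and its `from_json` method are hypothetical helpers, not an official client type.

```python
import json
from dataclasses import dataclass, field, fields
from typing import Any, Dict, List, Optional


@dataclass
class Task:
    """Typed view of the task payload shown above (hypothetical helper)."""

    task_id: str
    task_type: str
    status: str
    inputs: Optional[List[str]] = None
    outputs: Optional[Any] = None
    additional_data: Dict[str, Any] = field(default_factory=dict)
    error_message: Optional[str] = None

    @classmethod
    def from_json(cls, raw: str) -> "Task":
        data = json.loads(raw)
        known = {f.name for f in fields(cls)}
        # Ignore keys this sketch does not model so extra fields don't break parsing.
        kwargs = {k: v for k, v in data.items() if k in known}
        # Treat a missing or null additional_data as an empty dict.
        if kwargs.get("additional_data") is None:
            kwargs["additional_data"] = {}
        return cls(**kwargs)


raw = json.dumps({
    "task_id": "tsk_processing_123",
    "task_type": "api_buckets_batches_process",
    "status": "PROCESSING",
    "additional_data": {"batch_id": "batch_xyz789", "bucket_id": "bkt_products"},
    "error_message": None,
})
task = Task.from_json(raw)
print(task.status, task.additional_data.get("batch_id"))
```

Keeping `additional_data` as a plain dict reflects that its contents vary by resource type.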

## Polling Strategy

<Steps>
  <Step title="Poll the task">
    Call `GET /v1/tasks/{task_id}` with exponential backoff (start at 1s, cap at 30s).
  </Step>

  <Step title="Handle 404 gracefully">
    Once the Redis TTL (about 24 hours) expires, the task endpoint may return `404`. Treat that as a signal to fall back to the underlying resource (batch, cluster, etc.) rather than as a failure.
  </Step>

  <Step title="Switch to resource polling">
    Use `/v1/buckets/{bucket_id}/batches/{batch_id}`, `/v1/clusters/{cluster_id}`, etc., for long-running operations.
  </Step>
</Steps>

Example hybrid poller:

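```python
import time

# get_task, get_batch, and NotFound stand in for your own client helpers.
delay = 1.0  # seconds; grows 1.5x per attempt, capped at 30s

while True:
    try:
        task = get_task(task_id)
    except NotFound:
        # Task record has expired from the cache; fall back to the batch resource.
        task = get_batch(bucket_id, batch_id)

    if task.status in ("COMPLETED", "COMPLETED_WITH_ERRORS"):
        break  # both are terminal; COMPLETED_WITH_ERRORS = partial success
    if task.status in ("FAILED", "CANCELED"):
        # error_message may be empty (e.g. on cancellation), so include the status.
        raise RuntimeError(task.error_message or f"task ended with status {task.status}")
    time.sleep(delay)
    delay = min(delay * 1.5, 30)
```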
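
The loop above runs until a terminal status arrives. In practice you may also want an overall deadline and a way to test the loop without real waiting. The following is one possible sketch under those assumptions: `backoff_delays` and `poll_until_terminal` are illustrative names, `fetch_status` is any callable you supply that returns the current status string, and the `timeout` default is arbitrary.

```python
import time
from typing import Callable, Iterator

TERMINAL = {"COMPLETED", "COMPLETED_WITH_ERRORS", "FAILED", "CANCELED"}


def backoff_delays(
    start: float = 1.0, factor: float = 1.5, cap: float = 30.0
) -> Iterator[float]:
    """Yield delays that grow geometrically from `start` and never exceed `cap`."""
    delay = start
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


def poll_until_terminal(
    fetch_status: Callable[[], str],
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call `fetch_status` until it returns a terminal status.

    Raises TimeoutError once the next sleep would push total sleep time
    past `timeout`. Time spent inside `fetch_status` is not counted.
    """
    waited = 0.0
    for delay in backoff_delays():
        status = fetch_status()
        if status in TERMINAL:
            return status
        if waited + delay > timeout:
            raise TimeoutError(f"task still {status} after {waited:.0f}s of waiting")
        sleep(delay)
        waited += delay
    raise AssertionError("unreachable: backoff_delays is infinite")
```

Passing `sleep` as a parameter lets tests substitute a recorder for `time.sleep`, and `fetch_status` can wrap the task-then-resource fallback shown earlier.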

## Webhooks & Notifications

* The Engine emits webhook events (e.g., `collection.documents.written`) when tasks complete relevant work.
* Celery Beat dispatches those events to invalidate caches, update schemas, and notify external systems.
* Prefer webhooks for near-real-time updates instead of aggressive polling.
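
On the receiving side, a webhook consumer usually routes each event to a handler by its type. The sketch below shows that routing only. The webhook payload schema is not described on this page, so the `event_type` and `collection_id` keys are assumptions made for illustration, and `on`, `dispatch`, and `invalidate_cache` are hypothetical names.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Handler = Callable[[Dict], None]
_handlers: DefaultDict[str, List[Handler]] = defaultdict(list)


def on(event_type: str) -> Callable[[Handler], Handler]:
    """Decorator that registers a handler for one event type."""
    def register(handler: Handler) -> Handler:
        _handlers[event_type].append(handler)
        return handler
    return register


def dispatch(event: Dict) -> int:
    """Run every handler registered for the event; return how many ran."""
    # "event_type" is an assumed payload key, not a documented one.
    handlers = _handlers.get(event.get("event_type", ""), [])
    for handler in handlers:
        handler(event)
    return len(handlers)


invalidated: List[str] = []


@on("collection.documents.written")
def invalidate_cache(event: Dict) -> None:
    # "collection_id" is likewise an assumed key; replace with the real field.
    invalidated.append(event.get("collection_id", "unknown"))
```

Unknown event types fall through with a count of zero, so new event names do not break an existing consumer.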

## Managing Tasks

* `GET /v1/tasks/{task_id}` – fetch the latest status.
* `POST /v1/tasks/list` – filter by type, status, namespace, or creation time.
* `POST /v1/tasks/{task_id}/kill` – request cancellation (supported for batches and clustering jobs using Celery’s `AbortableAsyncResult`).
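
The three endpoints above can be wrapped in thin request builders using only the standard library. This is a sketch under stated assumptions: the base URL is supplied by the caller, the `Authorization: Bearer` header is an assumed auth scheme, and the filter body passed to `list_tasks_request` is forwarded as-is because its schema is not shown here. All function names are illustrative.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def task_request(
    base_url: str,
    api_key: str,
    method: str,
    path: str,
    body: Optional[Dict[str, Any]] = None,
) -> urllib.request.Request:
    """Build (but do not send) a request for a task endpoint."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=data,
        method=method,
        headers={
            # Assumed auth scheme; adjust to match your deployment.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def get_task_request(base_url: str, api_key: str, task_id: str) -> urllib.request.Request:
    return task_request(base_url, api_key, "GET", f"/v1/tasks/{task_id}")


def list_tasks_request(
    base_url: str, api_key: str, filters: Dict[str, Any]
) -> urllib.request.Request:
    # The filter schema is not documented here; `filters` is passed through unchanged.
    return task_request(base_url, api_key, "POST", "/v1/tasks/list", body=filters)


def kill_task_request(base_url: str, api_key: str, task_id: str) -> urllib.request.Request:
    return task_request(base_url, api_key, "POST", f"/v1/tasks/{task_id}/kill")
```

Each builder returns a `urllib.request.Request` that can be sent with `urllib.request.urlopen(req)`; separating construction from sending makes the paths and methods easy to unit test.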

## Best Practices

1. **Store task IDs** returned by submit endpoints.
2. **Use exponential backoff** to avoid hammering the API.
3. **Respect terminal states** (`COMPLETED`, `COMPLETED_WITH_ERRORS`, `FAILED`, `CANCELED`) and surface errors to operators.
4. **Leverage webhooks** for side-effects like cache invalidation or notifications.
5. **Instrument monitoring** using task history in MongoDB and webhook logs, which together provide a full audit trail.

Tasks keep the asynchronous parts of Mixpeek manageable. Treat them as durable receipts for every long-running job.
