# Submit Batch for Processing

> Submit a draft batch for asynchronous processing. The batch must be in 'DRAFT' status and contain objects.
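
As a rough illustration of the request this endpoint expects, the sketch below builds (but does not send) the POST request using only the Python standard library. The helper name `build_submit_request` is hypothetical, and the `Authorization: Bearer` header is an assumption: the spec on this page declares no security scheme, so check your account's authentication requirements before relying on it.

```python
import json
from typing import Optional
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_submit_request(
    bucket_identifier: str,
    batch_id: str,
    api_key: str,
    *,
    force: bool = False,
    include_processing_history: bool = True,
    webhook_url: Optional[str] = None,
    base_url: str = "https://api.mixpeek.com",
) -> Request:
    """Build a POST request for the submit endpoint (hypothetical helper)."""
    # Percent-encode path parameters so unusual identifiers cannot alter the path.
    path = (
        f"/v1/buckets/{quote(bucket_identifier, safe='')}"
        f"/batches/{quote(batch_id, safe='')}/submit"
    )
    # `force` defaults to false in the spec, so only send it when set.
    query = "?" + urlencode({"force": "true"}) if force else ""

    # Body fields follow SubmitBatchRequest; the deprecated collection_ids
    # field is deliberately left out.
    body = {"include_processing_history": include_processing_history}
    if webhook_url is not None:
        body["webhook_url"] = webhook_url

    return Request(
        base_url + path + query,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token auth; not stated in the spec on this page.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; a successful response body should decode as a `TaskResponse` whose `task_id` can then be polled.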


## OpenAPI

````yaml post /v1/buckets/{bucket_identifier}/batches/{batch_id}/submit
openapi: 3.1.0
info:
  title: Mixpeek API
  description: >-
    This is the Mixpeek API, providing access to various endpoints for data
    processing and retrieval.
  termsOfService: https://mixpeek.com/terms
  contact:
    name: Mixpeek Support
    url: https://mixpeek.com/contact
    email: info@mixpeek.com
  version: '0.82'
servers:
  - url: https://api.mixpeek.com
    description: Production
security: []
paths:
  /v1/buckets/{bucket_identifier}/batches/{batch_id}/submit:
    post:
      tags:
        - Bucket Batches
      summary: Submit Batch for Processing
      description: >-
        Submit a draft batch for asynchronous processing. The batch must be in
        'DRAFT' status and contain objects.
      operationId: >-
        submit_batch_v1_buckets__bucket_identifier__batches__batch_id__submit_post
      parameters:
        - name: bucket_identifier
          in: path
          required: true
          schema:
            type: string
            description: The unique identifier of the bucket.
            title: Bucket Identifier
          description: The unique identifier of the bucket.
        - name: batch_id
          in: path
          required: true
          schema:
            type: string
            description: The unique identifier of the batch.
            title: Batch Id
          description: The unique identifier of the batch.
        - name: force
          in: query
          required: false
          schema:
            type: boolean
            description: >-
              Bypass the concurrent batch dedup check. Use when you
              intentionally want to reprocess the same bucket+collection
              combination while another batch is in flight.
            default: false
            title: Force
          description: >-
            Bypass the concurrent batch dedup check. Use when you intentionally
            want to reprocess the same bucket+collection combination while
            another batch is in flight.
      requestBody:
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/SubmitBatchRequest'
      responses:
        '200':
          description: Successful Response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/TaskResponse'
        '400':
          description: Bad Request
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '401':
          description: Unauthorized
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '403':
          description: Forbidden
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '404':
          description: Not Found
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '422':
          description: Validation Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/HTTPValidationError'
        '500':
          description: Internal Server Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
components:
  schemas:
    SubmitBatchRequest:
      properties:
        include_processing_history:
          type: boolean
          title: Include Processing History
          description: >-
            OPTIONAL (defaults to True). Controls whether processing operations
            are tracked in document internal_metadata.processing_history. When
            True: Each enrichment operation (taxonomy application, clustering,
            etc.) adds an audit trail entry. When False: Documents are enriched
            without processing history tracking, resulting in cleaner metadata.
            Use True for: Debugging, audit requirements, lineage tracking,
            understanding document transformations. Use False for: Production
            workloads where metadata size matters, simplified document
            structure. Processing history entries include: operation type,
            timestamp, and IDs of applied resources (taxonomies, clusters,
            etc.).
          default: true
          examples:
            - true
            - false
        collection_ids:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Collection Ids
          description: >-
            DEPRECATED submit-time compatibility field. Collection scope is
            resolved when the batch is created via
            CreateBatchRequest.collection_ids. If provided at submit time, it
            must match the batch's tier-0 scope; otherwise submission fails
            closed instead of silently processing unexpected collections.
          examples:
            - - col_text_only
            - null
        webhook_url:
          anyOf:
            - type: string
            - type: 'null'
          title: Webhook Url
          description: >-
            OPTIONAL. URL to receive an HTTP POST notification when the batch
            reaches a terminal state (COMPLETED, FAILED, or CANCELED). The
            payload includes batch_id, status, failure_reason,
            documents_written, progress, and completed_at. The webhook is
            fire-and-forget: delivery failures are logged but do not affect
            batch processing. Must be an HTTPS or HTTP URL reachable from the
            server.
          examples:
            - https://example.com/webhooks/batch-complete
            - null
      type: object
      title: SubmitBatchRequest
      description: |-
        Request model for submitting a batch for processing.

        This model allows configuration of processing behavior for the batch,
        such as whether to track processing history in document metadata.

        Use Cases:
            - Submit batch with full audit trail (include_processing_history=True)
            - Submit batch without processing history for cleaner metadata (include_processing_history=False)
            - Default behavior includes processing history for debugging and lineage tracking

        Requirements:
            - include_processing_history: OPTIONAL, defaults to True
      examples:
        - description: Submit batch with processing history (default)
          include_processing_history: true
        - description: Submit batch without processing history
          include_processing_history: false
        - description: Submit batch with webhook notification
          webhook_url: https://example.com/webhooks/batch-complete
        - description: Submit batch with default settings (includes processing history)
    TaskResponse:
      properties:
        task_id:
          type: string
          title: Task Id
          description: >-
            Unique identifier for the task. REQUIRED. Used to poll task status
            via GET /v1/tasks/{task_id}. This ID is also stored on parent
            resources (batches, clusters, etc.) for cross-referencing. Format:
            UUID v4 or custom string identifier.
          examples:
            - task_abc123def456
            - 550e8400-e29b-41d4-a716-446655440000
        task_type:
          $ref: '#/components/schemas/TaskType'
          description: >-
            Type of operation this task represents. REQUIRED. Identifies the
            specific async operation being performed. Used for filtering and
            categorizing tasks. Common types: api_buckets_batches_process,
            engine_cluster_build, api_taxonomies_execute. See TaskType enum for
            complete list of supported operations.
          examples:
            - api_buckets_batches_process
            - engine_cluster_build
            - api_taxonomies_execute
        status:
          $ref: '#/components/schemas/TaskStatusEnum'
          description: >-
            Current status of the task. REQUIRED. Indicates the current state of
            the async operation. Terminal statuses (COMPLETED,
            COMPLETED_WITH_ERRORS, FAILED, CANCELED) indicate the task has
            finished and will not change. Active statuses (PENDING, QUEUED,
            IN_PROGRESS, PROCESSING) indicate the task is still running and
            should be polled. Use this field to determine when to stop polling.
          examples:
            - PENDING
            - PROCESSING
            - COMPLETED
            - FAILED
        inputs:
          anyOf:
            - items:
                anyOf:
                  - type: string
                  - additionalProperties: true
                    type: object
              type: array
            - type: 'null'
          title: Inputs
          description: >-
            Input parameters or data used to start the task. OPTIONAL. May
            include IDs, configuration objects, or file references. Useful for
            debugging and understanding what data the task processed. Format:
            List of strings (IDs) or objects (configuration). Example:
            ['batch_id_123'] or [{'bucket_id': 'bkt_abc', 'config': {...}}]
          examples:
            - - batch_xyz789
            - - obj_123
              - obj_456
              - obj_789
            - - bucket_id: bkt_abc
                collection_ids:
                  - col_1
                  - col_2
        outputs:
          anyOf:
            - items:
                anyOf:
                  - type: string
                  - additionalProperties: true
                    type: object
              type: array
            - type: 'null'
          title: Outputs
          description: >-
            Output results produced by the task. OPTIONAL. Populated when task
            completes successfully. May include processed file IDs, result
            metrics, or status summaries. Check this field after task reaches
            COMPLETED status to get results. Format: List of strings (output
            IDs) or objects (result data).
          examples:
            - - document_123
              - document_456
            - - failed_count: 2
                processed_count: 100
                success_rate: 0.98
            - - cluster_id: cl_abc123
                num_clusters: 5
        additional_data:
          anyOf:
            - additionalProperties: true
              type: object
            - type: 'null'
          title: Additional Data
          description: >-
            Additional metadata and context for the task. OPTIONAL. Contains job
            IDs, error details, progress info, and other task-specific
            metadata. 


            Common fields (all task types):

              - 'error': Error message if task failed
              - 'job_id': Ray job ID for engine tasks
              - 'from_mongodb': True if retrieved from MongoDB fallback (not Redis)


            Batch-specific fields (task_type=api_buckets_batches_process):

              - 'batch_id': Batch identifier (REQUIRED)
              - 'bucket_id': Source bucket identifier (REQUIRED)
              - 'namespace_id': Namespace identifier (REQUIRED)
              - 'current_tier': Currently processing tier number, 0-indexed (OPTIONAL, None if not started)
              - 'total_tiers': Total number of tiers in the batch pipeline (REQUIRED)
              - 'collection_ids': Array of ALL collection IDs across all tiers (REQUIRED)
              - 'object_count': Number of objects being processed (REQUIRED)
              - 'sample_object_ids': First 5 object IDs for debugging/display (OPTIONAL)


            Performance Note: Full object_ids array is NOT stored in task
            metadata to avoid bloating task documents (batches with 10k+ objects
            would add 200KB+ per task). For full object list, query the batch
            directly via GET /v1/buckets/{bucket_id}/batches/{batch_id}. 


            Note: For detailed per-tier status, use GET
            /v1/buckets/{bucket_id}/batches/{batch_id} to access the
            tier_tasks[] array which contains individual tier statuses,
            collection_ids, and timestamps for each tier.
          examples:
            - batch_id: btch_xyz789
              bucket_id: bkt_products
              collection_ids:
                - col_tier0
                - col_tier1
                - col_tier2
              current_tier: 1
              job_id: ray_job_123
              namespace_id: ns_abc123
              object_count: 10000
              sample_object_ids:
                - obj_001
                - obj_002
                - obj_003
                - obj_004
                - obj_005
              total_tiers: 3
            - error: 'Failed to process object: Invalid file format'
              job_id: '123'
            - cluster_id: cl_abc
              collection_ids:
                - col_1
              from_mongodb: true
        error:
          anyOf:
            - type: string
            - type: 'null'
          title: Error
          description: >-
            Flattened error message for convenient error handling. OPTIONAL.
            Automatically populated from additional_data['error'] when the task
            has FAILED status. This is a convenience field - the full error
            details are always available in additional_data['error']. Use this
            field for displaying errors to users or logging. Will be None if
            task has not failed or if no error details are available. Serialized
            as 'error' in API responses for backward compatibility.
          examples:
            - 'Failed to process batch: Object not found'
            - 'Invalid file format: Expected PDF, got PNG'
            - 'Clustering failed: Insufficient data points'
            - null
        queue_position:
          anyOf:
            - type: integer
            - type: 'null'
          title: Queue Position
          description: >-
            1-based position in the Ray processing waitlist. None if the batch
            was dispatched immediately (no queue). Position 1 means this batch
            will be processed next.
        estimated_wait_minutes:
          anyOf:
            - type: number
            - type: 'null'
          title: Estimated Wait Minutes
          description: >-
            Estimated minutes until this batch starts processing, based on queue
            position and average batch duration. None if the batch was
            dispatched immediately.
      type: object
      required:
        - task_id
        - task_type
        - status
      title: TaskResponse
      description: |-
        Task response model returned by the API.

        Extends TaskModel with additional convenience fields for API responses.
        This is the model returned when you GET /v1/tasks/{task_id}.

        Additional Fields:
            error: Convenience field that surfaces errors from additional_data
                   for easier error handling in client code. Serialized as
                   'error' in API responses.

        Inheritance:
            Inherits all fields and documentation from TaskModel, including:
            - task_id: Unique identifier
            - task_type: Operation type
            - status: Current status
            - inputs: Input parameters
            - outputs: Output results
            - additional_data: Metadata and context

        Storage Architecture:
            Same as TaskModel - stored in Redis (24hr TTL) with MongoDB fallback.

        Usage:
            This model is automatically returned by task API endpoints. You don't
            need to construct it manually - just call GET /v1/tasks/{task_id}.

        Error Handling:
            Check the error field for a user-friendly error string, or
            additional_data['error'] for the full error details.

        Example Response:
            {
                "task_id": "task_abc123",
                "task_type": "api_buckets_batches_process",
                "status": "FAILED",
                "inputs": ["batch_xyz"],
                "outputs": null,
                "additional_data": {
                    "error": "Failed to process batch: Object not found",
                    "batch_id": "batch_xyz"
                },
                "error": "Failed to process batch: Object not found"
            }
      examples:
        - additional_data:
            batch_id: btch_xyz789
            bucket_id: bkt_products
            collection_ids:
              - col_tier0
              - col_tier1
              - col_tier2
            current_tier: 1
            job_id: ray_job_123
            namespace_id: ns_abc123
            object_count: 10000
            sample_object_ids:
              - obj_001
              - obj_002
              - obj_003
              - obj_004
              - obj_005
            total_tiers: 3
          description: >-
            Multi-tier batch processing task in progress (tier 1 of 3) with 10k
            objects
          inputs:
            - batch_xyz789
          status: IN_PROGRESS
          task_id: 2d322a05-3178-4eca-aac6-b82b0a0313aa
          task_type: api_buckets_batches_process
        - additional_data:
            cluster_id: cl_abc123
            job_id: ray_job_456
          description: Completed clustering task with results
          inputs:
            - collection_ids:
                - col_products
              config:
                algorithm: kmeans
                k: 5
          outputs:
            - cluster_id: cl_abc123
              num_clusters: 5
              silhouette_score: 0.78
          status: COMPLETED
          task_id: task_cluster_789
          task_type: engine_cluster_build
        - additional_data:
            bucket_id: bkt_test
            error: 'Invalid file format: Expected PDF, got PNG'
            object_id: obj_123
          description: Failed object creation task with error
          inputs:
            - bucket_id: bkt_test
              object_id: obj_123
          status: FAILED
          task_id: task_failed_123
          task_type: api_buckets_objects_create
        - additional_data:
            batch_id: batch_old_123
            from_mongodb: true
            note: Retrieved from persistent storage after 24hr Redis expiry
          description: Task retrieved from MongoDB fallback (Redis expired)
          inputs:
            - batch_old_123
          outputs:
            - Processed 500 objects
          status: COMPLETED
          task_id: taYOUR_OLD_API_KEY
          task_type: api_buckets_batches_process
    ErrorResponse:
      properties:
        success:
          type: boolean
          title: Success
          description: Always false for error responses
          default: false
        status:
          type: integer
          title: Status
          description: HTTP status code for this error
        error:
          $ref: '#/components/schemas/ErrorDetail'
          description: Error details payload
      type: object
      required:
        - status
        - error
      title: ErrorResponse
      description: Error response model.
      examples:
        - error:
            details:
              id: ns_123
              resource: namespace
            message: Namespace not found
            type: NotFoundError
          status: 404
          success: false
    HTTPValidationError:
      properties:
        detail:
          items:
            $ref: '#/components/schemas/ValidationError'
          type: array
          title: Detail
      type: object
      title: HTTPValidationError
    TaskType:
      type: string
      enum:
        - api_namespaces_create
        - api_namespaces_delete
        - api_namespaces_snapshot_create
        - api_namespaces_snapshot_restore
        - api_namespaces_migrations_run
        - api_buckets_objects_create
        - api_buckets_delete
        - api_buckets_batches_process
        - api_buckets_batches_submit
        - api_buckets_uploads_create
        - api_buckets_uploads_confirm
        - api_buckets_uploads_batch_confirm
        - api_collections_documents_create
        - api_collections_extraction_artifacts
        - api_taxonomies_create
        - api_taxonomies_execute
        - api_taxonomies_materialize
        - api_evaluations_run
        - api_evaluations_dataset_create
        - api_retrievers_publish
        - api_collections_export
        - api_collections_trigger
        - engine_feature_extractor_run
        - engine_inference_run
        - engine_object_processing
        - engine_cluster_build
        - thumbnail
        - video_segment
        - audio_segment
        - converted_video
        - materialize
        - plugin_custom
        - model_custom
      title: TaskType
      description: >-
        Types of asynchronous tasks that can be performed in the system.


        Task types identify the specific operation being performed. This helps
        with tracking, debugging, and filtering tasks by operation type.


        Categories:
            API Tasks: User-initiated operations via API endpoints
            Engine Tasks: Background processing tasks
            Inference Tasks: Specialized inference operations

        API Task Types:
            API_NAMESPACES_CREATE: Creating a new namespace
            API_NAMESPACES_MIGRATIONS_RUN: Running a namespace migration
            API_BUCKETS_OBJECTS_CREATE: Creating objects in a bucket
            API_BUCKETS_DELETE: Deleting a bucket and its contents
            API_BUCKETS_BATCHES_PROCESS: Processing a batch of objects
            API_BUCKETS_BATCHES_SUBMIT: Submitting a batch for processing
            API_BUCKETS_UPLOADS_CREATE: Creating an upload session
            API_BUCKETS_UPLOADS_CONFIRM: Confirming an upload completion
            API_BUCKETS_UPLOADS_BATCH_CONFIRM: Confirming batch upload completion
            API_TAXONOMIES_CREATE: Creating a new taxonomy
            API_TAXONOMIES_EXECUTE: Executing taxonomy classification
            API_TAXONOMIES_MATERIALIZE: Materializing taxonomy results
            API_RETRIEVERS_PUBLISH: Publishing retriever assets (OG images, etc.)

        Engine Task Types:
            ENGINE_FEATURE_EXTRACTOR_RUN: Running feature extraction on data
            ENGINE_INFERENCE_RUN: Running inference operations
            ENGINE_OBJECT_PROCESSING: Processing object data
            ENGINE_CLUSTER_BUILD: Building clusters from data

        Inference Task Types:
            THUMBNAIL: Generating thumbnails
            MATERIALIZE: Materializing processed data

        Usage:
            Task types are automatically assigned when tasks are created. You can
            filter tasks by type when listing or searching for specific operations.
    TaskStatusEnum:
      type: string
      enum:
        - PENDING
        - QUEUED
        - IN_PROGRESS
        - PROCESSING
        - COMPLETED
        - COMPLETED_WITH_ERRORS
        - FAILED
        - CANCELED
        - INTERRUPTED
        - UNKNOWN
        - SKIPPED
        - DRAFT
        - ACTIVE
        - ARCHIVED
        - SUSPENDED
      title: TaskStatusEnum
      description: |-
        Enumeration of task statuses for tracking asynchronous operations.

        Task statuses indicate the current state of asynchronous operations like
        batch processing, object ingestion, clustering, and taxonomy execution.

        Status Categories:
            Operation Statuses: Track progress of async operations
            Lifecycle Statuses: Track entity state (buckets, collections, namespaces)

        Values:
            PENDING: Task is queued but has not started processing yet
            IN_PROGRESS: Task is currently being executed
            PROCESSING: Task is actively processing data (similar to IN_PROGRESS)
            COMPLETED: Task finished successfully with no errors
            COMPLETED_WITH_ERRORS: Task finished but some items failed (partial success)
            FAILED: Task encountered an error and could not complete
            CANCELED: Task was manually canceled by a user or system
            UNKNOWN: Task status could not be determined
            SKIPPED: Task was intentionally skipped
            DRAFT: Task is in draft state and not yet submitted

            ACTIVE: Entity is active and operational (for buckets, collections, etc.)
            ARCHIVED: Entity has been archived
            SUSPENDED: Entity has been temporarily suspended

        Terminal Statuses:
            COMPLETED, COMPLETED_WITH_ERRORS, FAILED, CANCELED are terminal statuses.
            Once a task reaches these states, it will not transition to another state.

        Partial Success Handling:
            COMPLETED_WITH_ERRORS indicates that the operation completed but some
            documents/items failed. The task result includes:
            - List of successful items
            - List of failed items with error details
            - Success rate percentage
            This allows clients to handle partial success scenarios appropriately.

        Polling Guidance:
            - Poll tasks in PENDING, QUEUED, IN_PROGRESS, or PROCESSING states
            - Stop polling when task reaches COMPLETED, COMPLETED_WITH_ERRORS, FAILED, or CANCELED
            - Use exponential backoff (1s → 30s) when polling
    ErrorDetail:
      properties:
        message:
          type: string
          title: Message
          description: Human-readable error message
        type:
          type: string
          title: Type
          description: Stable error type identifier (machine-readable)
        code:
          anyOf:
            - type: string
            - type: 'null'
          title: Code
          description: >-
            Fine-grained error code for programmatic handling (e.g.,
            namespace_name_taken, feature_extractor_not_found). Present only
            when consumers may need to branch on a specific error condition.
        details:
          anyOf:
            - additionalProperties: true
              type: object
            - type: 'null'
          title: Details
          description: >-
            Optional structured details to help debugging (validation errors,
            IDs, etc.)
      type: object
      required:
        - message
        - type
      title: ErrorDetail
      description: Error detail model.
    ValidationError:
      properties:
        loc:
          items:
            anyOf:
              - type: string
              - type: integer
          type: array
          title: Location
        msg:
          type: string
          title: Message
        type:
          type: string
          title: Error Type
      type: object
      required:
        - loc
        - msg
        - type
      title: ValidationError

````
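
The polling guidance in the `TaskStatusEnum` description (poll while the task is active, stop on a terminal status, back off from 1s to 30s) could be sketched roughly as follows. The names `poll_task`, `backoff_delays`, and `fetch_task` are hypothetical; `fetch_task` stands in for whatever function you use to call `GET /v1/tasks/{task_id}` and return the decoded `TaskResponse` as a dict. The doubling factor, the poll limit, and raising on statuses outside the documented active and terminal sets are assumptions of this sketch, not behavior stated in the spec.

```python
import time
from typing import Any, Callable, Dict, Iterator

# Status groupings copied from the TaskStatusEnum polling guidance.
TERMINAL_STATUSES = frozenset(
    {"COMPLETED", "COMPLETED_WITH_ERRORS", "FAILED", "CANCELED"}
)
ACTIVE_STATUSES = frozenset({"PENDING", "QUEUED", "IN_PROGRESS", "PROCESSING"})


def backoff_delays(
    initial: float = 1.0, maximum: float = 30.0, factor: float = 2.0
) -> Iterator[float]:
    """Yield exponentially growing delays, capped at `maximum` seconds."""
    delay = initial
    while True:
        yield min(delay, maximum)
        delay = min(delay * factor, maximum)


def poll_task(
    fetch_task: Callable[[str], Dict[str, Any]],
    task_id: str,
    *,
    sleep: Callable[[float], None] = time.sleep,
    max_polls: int = 100,
) -> Dict[str, Any]:
    """Poll until the task reaches a terminal status and return the response.

    `fetch_task` is supplied by the caller (hypothetical); `sleep` is
    injectable so the loop can be exercised without real waiting.
    """
    delays = backoff_delays()
    for _ in range(max_polls):
        task = fetch_task(task_id)
        status = task["status"]
        if status in TERMINAL_STATUSES:
            return task
        if status not in ACTIVE_STATUSES:
            # Assumption: statuses the guidance does not cover are surfaced
            # to the caller instead of being polled indefinitely.
            raise RuntimeError(f"Unexpected task status: {status}")
        sleep(next(delays))
    raise TimeoutError(f"Task {task_id} still active after {max_polls} polls")
```

After `poll_task` returns, a `FAILED` task carries its message in the `error` field, and a `COMPLETED_WITH_ERRORS` task indicates partial success that callers may want to inspect before treating the batch as done.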