
# Create Objects in Batch

> This endpoint creates multiple new objects in the specified bucket as a batch.
> Each object must conform to the bucket's schema.
>
> **Processing**: By default, objects are created in DRAFT status and require
> batch submission for processing. Set `auto_process=true` to automatically
> create a processing batch and submit it (zero-touch workflow).
>
> **Partial Success**: This endpoint supports partial success: valid objects are
> created even if some fail validation. Failed objects are returned separately
> with error details, so you can fix and retry only the failed ones.
>
> **Response**: Returns both succeeded and failed objects. The request returns
> 200 OK as long as at least one object is created. Check the `failed` array for
> objects that need attention.
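
The following is a minimal sketch, using only the Python standard library, of how to prepare calls to this endpoint. It splits a list of objects into chunks of at most 100 (the `maxItems` limit on `objects`). It wraps each chunk in a `CreateObjectsBatchRequest` body and sets the `auto_process` query parameter.

Several details are assumptions for illustration:

- The helper name `build_batch_requests` is hypothetical.
- The `Authorization: Bearer` header is assumed; confirm the required authentication in the Mixpeek docs.
- The bucket is assumed to have a `content` PDF property.

The requests are built but not sent.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.mixpeek.com"
MAX_BATCH = 100  # maxItems of CreateObjectsBatchRequest.objects


def build_batch_requests(bucket_id, objects, api_key, auto_process=False):
    """Build (but do not send) one POST request per chunk of <= MAX_BATCH objects."""
    path = f"/v1/buckets/{urllib.parse.quote(bucket_id, safe='')}/objects/batch"
    query = urllib.parse.urlencode({"auto_process": "true" if auto_process else "false"})
    prepared = []
    for start in range(0, len(objects), MAX_BATCH):
        body = {"objects": objects[start:start + MAX_BATCH]}
        prepared.append(
            urllib.request.Request(
                f"{BASE_URL}{path}?{query}",
                data=json.dumps(body).encode("utf-8"),
                method="POST",
                headers={
                    "Content-Type": "application/json",
                    # Assumption: bearer-token auth; confirm the header in the auth docs.
                    "Authorization": f"Bearer {api_key}",
                },
            )
        )
    return prepared


# Example: 250 objects become three requests (100 + 100 + 50 objects).
# Assumes the bucket schema defines a "content" property of type PDF.
objects = [
    {
        "key_prefix": f"/documents/doc-{i}",
        "blobs": [
            {"property": "content", "type": "PDF", "data": f"https://example.com/docs/{i}.pdf"}
        ],
    }
    for i in range(250)
]
batch_requests = build_batch_requests("bkt_9xy8z7", objects, api_key="YOUR_API_KEY", auto_process=True)
# To send: call urllib.request.urlopen(req) for each req and json.load() the response body.
```

Chunking on the client keeps each request within the documented 100-object limit. With `auto_process=true`, each chunk is queued for processing as soon as its objects are created.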

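A 200 OK response can still carry failed objects, so a client should inspect `failed` on every response. The sketch below uses two hypothetical helpers, `collect_failures` and `has_failures`. It maps each `FailedObjectError.object_index` (0-based) back to the object that was submitted, using the example errors from the schema. It assumes the response body has already been parsed from JSON into a dict.

```python
def collect_failures(request_objects: list, response: dict) -> list:
    """Return (index, original_object, message) for each failed object, in request order."""
    failures = []
    for failure in sorted(response["failed"], key=lambda f: f["object_index"]):
        idx = failure["object_index"]  # 0-based position in the request's `objects` array
        message = f"{failure['error_type']}: {failure['error']}"
        failures.append((idx, request_objects[idx], message))
    return failures


def has_failures(response: dict) -> bool:
    """A 200 OK can still contain failed objects, so check failed_count explicitly."""
    return response["failed_count"] > 0


# Example response shaped like CreateObjectsBatchResponse (errors taken from the schema examples).
request_objects = [{"key_prefix": f"/documents/doc-{i}", "blobs": []} for i in range(100)]
response = {
    "succeeded": [],  # abbreviated; normally one ObjectResponse per created object
    "failed": [
        {"object_index": 42, "error": "URL validation failed: URL returned status 404",
         "error_type": "URLValidationError"},
        {"object_index": 15, "error": "Schema validation failed: Missing required field 'title'",
         "error_type": "ValidationError"},
    ],
    "total_requested": 100,
    "succeeded_count": 98,
    "failed_count": 2,
}

if has_failures(response):
    for idx, obj, message in collect_failures(request_objects, response):
        print(f"object {idx} ({obj['key_prefix']}): {message}")
    # Fix these objects, then resubmit only them in a new batch request.
```

When you resubmit only the corrected objects, `object_index` in the new response refers to positions in that smaller request, not the original one. Setting `idempotency_key` on each object also makes retries safer: if an object with the same key already exists in the bucket, it is returned instead of being duplicated.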

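The `data` field of `CreateBlobRequest` in the schema below accepts, among other formats, data URI strings and base64 dictionaries. `CreateObjectRequest.idempotency_key` recommends a UUID or a deterministic hash per object. The minimal Python sketch below builds both.

It rests on a few assumptions:

- The bucket schema is assumed to have `thumbnail` and `chart` image properties.
- The helper names are hypothetical.
- The hashing scheme (SHA-256 over sorted-key JSON) is one reasonable choice, not a documented requirement.

```python
import base64
import hashlib
import json
from typing import Optional


def to_data_uri(raw: bytes, mime_type: str) -> str:
    """Format 2: 'data:<mime_type>;base64,<encoded_data>'."""
    return f"data:{mime_type};base64,{base64.b64encode(raw).decode('ascii')}"


def to_base64_dict(raw: bytes, mime_type: str, filename: Optional[str] = None) -> dict:
    """Format 3: only 'base64' is required; 'mime_type' and 'filename' are optional."""
    payload = {"base64": base64.b64encode(raw).decode("ascii"), "mime_type": mime_type}
    if filename is not None:
        payload["filename"] = filename
    return payload


def idempotency_key_for(obj: dict) -> str:
    """Deterministic key: identical object content always yields the same key."""
    # sort_keys makes the hash independent of dict key order.
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()  # 64 hex chars (limit is 255)


# Hypothetical object; assumes "thumbnail" and "chart" are IMAGE properties in the bucket schema.
image_bytes = b"\x89PNG\r\n\x1a\n"  # placeholder bytes; read a real image file in practice
obj = {
    "key_prefix": "/reports/2025-11",
    "blobs": [
        {"property": "thumbnail", "type": "IMAGE", "data": to_data_uri(image_bytes, "image/png")},
        {"property": "chart", "type": "IMAGE",
         "data": to_base64_dict(image_bytes, "image/png", filename="sales-chart-2025-11.png")},
    ],
}
obj["idempotency_key"] = idempotency_key_for(obj)  # computed before the key is added
```

Base64 payloads are subject to the plan size limits listed in the schema (5MB on the free plan). Larger files should go through the presigned upload workflow instead.
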
## OpenAPI

````yaml post /v1/buckets/{bucket_identifier}/objects/batch
openapi: 3.1.0
info:
  title: Mixpeek API
  description: >-
    This is the Mixpeek API, providing access to various endpoints for data
    processing and retrieval.
  termsOfService: https://mixpeek.com/terms
  contact:
    name: Mixpeek Support
    url: https://mixpeek.com/contact
    email: info@mixpeek.com
  version: '0.82'
servers:
  - url: https://api.mixpeek.com
    description: Production
security: []
paths:
  /v1/buckets/{bucket_identifier}/objects/batch:
    post:
      tags:
        - Bucket Objects
      summary: Create Objects in Batch
      description: >-
        This endpoint creates multiple new objects in the specified bucket as a
        batch. Each object must conform to the bucket's schema.


        **Processing**: By default, objects are created in DRAFT status and
        require batch submission for processing. Set `auto_process=true` to
        automatically create a processing batch and submit it (zero-touch
        workflow).


        **Partial Success**: This endpoint supports partial success: valid
        objects are created even if some fail validation. Failed objects are
        returned separately with error details, so you can fix and retry only
        the failed ones.


        **Response**: Returns both succeeded and failed objects. The request
        returns 200 OK as long as at least one object is created. Check the
        `failed` array for objects that need attention.
      operationId: create_objects_batch_v1_buckets__bucket_identifier__objects_batch_post
      parameters:
        - name: bucket_identifier
          in: path
          required: true
          schema:
            type: string
            description: The unique identifier of the bucket.
            title: Bucket Identifier
          description: The unique identifier of the bucket.
        - name: auto_process
          in: query
          required: false
          schema:
            type: boolean
            description: >-
              Automatically create a batch and submit it for processing. When
              true, all successfully created objects will be immediately queued
              for processing without requiring separate batch calls. Ideal for
              onboarding and bulk upload workflows.
            default: false
            title: Auto Process
          description: >-
            Automatically create a batch and submit it for processing. When
            true, all successfully created objects will be immediately queued
            for processing without requiring separate batch calls. Ideal for
            onboarding and bulk upload workflows.
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/CreateObjectsBatchRequest'
      responses:
        '200':
          description: Successful Response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/CreateObjectsBatchResponse'
        '400':
          description: Bad Request
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '401':
          description: Unauthorized
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '403':
          description: Forbidden
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '404':
          description: Not Found
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '422':
          description: Validation Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/HTTPValidationError'
        '500':
          description: Internal Server Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
components:
  schemas:
    CreateObjectsBatchRequest:
      properties:
        objects:
          items:
            $ref: '#/components/schemas/CreateObjectRequest'
          type: array
          maxItems: 100
          title: Objects
          description: List of objects to be created in this batch (max 100).
      type: object
      required:
        - objects
      title: CreateObjectsBatchRequest
      description: Request model for creating multiple bucket objects in a batch.
      examples:
        - objects:
            - blobs:
                - data:
                    num_pages: 5
                    title: Service Agreement 2024
                  key_prefix: /contract-2024/content.pdf
                  metadata:
                    author: John Doe
                    department: Legal
                  property: content
                  type: json
              key_prefix: /documents
              metadata:
                category: contracts
                status: draft
                year: 2024
    CreateObjectsBatchResponse:
      properties:
        succeeded:
          items:
            $ref: '#/components/schemas/ObjectResponse'
          type: array
          title: Succeeded
          description: List of successfully created objects
        failed:
          items:
            $ref: '#/components/schemas/FailedObjectError'
          type: array
          title: Failed
          description: List of objects that failed to create with error details
        total_requested:
          type: integer
          title: Total Requested
          description: Total number of objects in the batch request
        succeeded_count:
          type: integer
          title: Succeeded Count
          description: Number of objects successfully created
        failed_count:
          type: integer
          title: Failed Count
          description: Number of objects that failed
      type: object
      required:
        - succeeded
        - failed
        - total_requested
        - succeeded_count
        - failed_count
      title: CreateObjectsBatchResponse
      description: >-
        Response model for batch object creation with partial success support.


        This endpoint uses partial success: valid objects are created even if
        some fail.

        Failed objects are tracked separately so users can fix and retry them.
      examples:
        - failed:
            - error: 'Schema validation failed: Missing required field ''title'''
              error_type: ValidationError
              object_index: 15
          failed_count: 1
          succeeded:
            - bucket_id: bkt_9xy8z7
              object_id: obj_123abc
              status: DRAFT
          succeeded_count: 99
          total_requested: 100
    ErrorResponse:
      properties:
        success:
          type: boolean
          title: Success
          description: Always false for error responses
          default: false
        status:
          type: integer
          title: Status
          description: HTTP status code for this error
        error:
          $ref: '#/components/schemas/ErrorDetail'
          description: Error details payload
      type: object
      required:
        - status
        - error
      title: ErrorResponse
      description: Error response model.
      examples:
        - error:
            details:
              id: ns_123
              resource: namespace
            message: Namespace not found
            type: NotFoundError
          status: 404
          success: false
    HTTPValidationError:
      properties:
        detail:
          items:
            $ref: '#/components/schemas/ValidationError'
          type: array
          title: Detail
      type: object
      title: HTTPValidationError
    CreateObjectRequest:
      properties:
        key_prefix:
          anyOf:
            - type: string
            - type: 'null'
          title: Key Prefix
          description: >-
            Storage key/path prefix of the object, used to retrieve the object
            from storage. It forms the root of the object's storage path.
          example: /contract-2024
        blobs:
          items:
            $ref: '#/components/schemas/CreateBlobRequest'
          type: array
          title: Blobs
          description: List of blobs to be created in this object
          example:
            - data:
                num_pages: 5
                title: Service Agreement 2024
              key_prefix: /content.pdf
              metadata:
                author: John Doe
                department: Legal
              property: content
              type: PDF
        idempotency_key:
          anyOf:
            - type: string
              maxLength: 255
            - type: 'null'
          title: Idempotency Key
          description: >-
            Client-generated idempotency key for safe retries. If an object with
            the same idempotency_key already exists in this bucket, the existing
            object is returned instead of creating a duplicate. Use a UUID or
            deterministic hash per object.
        skip_duplicates:
          type: boolean
          title: Skip Duplicates
          description: >-
            Skip duplicate blobs: if a blob with the same hash already exists,
            it is skipped.
          default: false
        canonicalize_source:
          type: boolean
          title: Canonicalize Source
          description: Mirror non-S3 sources into internal S3 and reference them canonically.
          default: true
        force_remirror:
          type: boolean
          title: Force Remirror
          description: >-
            Force re-upload to S3 even if a blob with identical content already
            exists.
          default: false
      additionalProperties: true
      type: object
      title: CreateObjectRequest
      description: >-
        Request model for creating a bucket object.


        Objects can be created with blobs from two sources:

        1. Direct data (URLs, base64) - Use CreateBlobRequest.data field

        2. Upload references - Use CreateBlobRequest.upload_id field (from POST
        /buckets/{id}/uploads)


        Upload Reference Workflow:
            For large files or client-side uploads, use the presigned URL workflow:
            1. POST /buckets/{id}/uploads → Returns {upload_id, presigned_url}
            2. User uploads file to presigned_url (client-side)
            3. POST /uploads/{upload_id}/confirm → Validates upload
            4. POST /buckets/{id}/objects with upload_id in blobs (this endpoint)

        Use Cases:
            - Single blob with direct data (simple)
            - Multiple blobs from presigned uploads (recommended for large files)
            - Mix of direct data and upload references
            - Combine multiple uploads into one object

        See Also:
            - CreateBlobRequest for blob field documentation
            - POST /buckets/{id}/uploads for presigned URL generation
      example:
        blobs:
          - data:
              num_pages: 5
              title: Service Agreement 2024
            key_prefix: /contract-2024/content.pdf
            metadata:
              author: John Doe
              department: Legal
            property: content
            type: json
          - data:
              filename: https://example.com/images/smartphone-x1.jpg
              mime_type: image/jpeg
            key_prefix: /contract-2024/thumbnail.jpg
            metadata:
              height: 300
              width: 200
            property: thumbnail
            type: image
        key_prefix: /documents
        metadata:
          category: contracts
          status: draft
          year: 2024
    ObjectResponse:
      properties:
        object_id:
          type: string
          title: Object Id
          description: Unique identifier for the object
        bucket_id:
          type: string
          title: Bucket Id
          description: ID of the bucket this object belongs to
        key_prefix:
          anyOf:
            - type: string
            - type: 'null'
          title: Key Prefix
          description: >-
            Storage key/path of the object, used to retrieve the object from
            storage (similar to a file path). If not provided, the object is
            placed in the root of the bucket.
        blobs:
          items:
            $ref: '#/components/schemas/BlobModel'
          type: array
          title: Blobs
          description: List of blobs contained in this object
        source_details:
          items:
            $ref: '#/components/schemas/SourceDetails'
          type: array
          title: Source Details
          description: >-
            Lineage/source details for this object; used for downstream
            references.
        status:
          $ref: '#/components/schemas/TaskStatusEnum'
          description: The current status of the object.
          default: DRAFT
        error:
          anyOf:
            - type: string
            - type: 'null'
          title: Error
          description: The error message if the object failed to process.
          examples:
            - 'Failed to process object: Object not found'
        created_at:
          anyOf:
            - type: string
              format: date-time
            - type: 'null'
          title: Created At
          description: >-
            Timestamp when the object was created. Automatically populated by
            the system.
        updated_at:
          anyOf:
            - type: string
              format: date-time
            - type: 'null'
          title: Updated At
          description: >-
            Timestamp when the object was last updated. Automatically populated
            by the system.
        document_count:
          anyOf:
            - type: integer
            - type: 'null'
          title: Document Count
          description: >-
            Number of documents produced from this object across all
            collections. Populated on GET requests. Null on list responses
            (expensive query). Use this to check if an object has already been
            processed.
        consistency:
          anyOf:
            - $ref: '#/components/schemas/WriteConsistency'
            - type: 'null'
          description: >-
            When and how this object becomes a searchable document. Set on write
            (create) responses: managed ingestion is visible only after a
            collection batch processes the object — poll until document_count >
            0.
      additionalProperties: true
      type: object
      required:
        - bucket_id
      title: ObjectResponse
      description: Response model for bucket objects.
      examples:
        - blobs:
            - blob_id: blob_1
              data:
                num_pages: 5
                title: Service Agreement 2024
              key_prefix: /contract-2024/content.pdf
              metadata:
                author: John Doe
                department: Legal
              property: content
              type: PDF
          bucket_id: bkt_9xy8z7
          content_hash: 28a9f5e8...
          created_at: '2024-10-21T10:30:00Z'
          key_prefix: /contract-2024
          metadata:
            category: contracts
            year: 2024
          object_id: obj_123abc456def
          skip_duplicates: false
          status: DRAFT
          updated_at: '2024-10-21T10:30:00Z'
    FailedObjectError:
      properties:
        object_index:
          type: integer
          title: Object Index
          description: 0-based index of the failed object in the batch request
        error:
          type: string
          title: Error
          description: Error message describing why the object failed
        error_type:
          type: string
          title: Error Type
          description: Type of error (e.g., 'ValidationError', 'URLValidationError')
      type: object
      required:
        - object_index
        - error
        - error_type
      title: FailedObjectError
      description: Error details for a failed object in a batch.
      examples:
        - error: 'Schema validation failed: Missing required field ''title'''
          error_type: ValidationError
          object_index: 15
        - error: 'URL validation failed: URL returned status 404'
          error_type: URLValidationError
          object_index: 42
    ErrorDetail:
      properties:
        message:
          type: string
          title: Message
          description: Human-readable error message
        type:
          type: string
          title: Type
          description: Stable error type identifier (machine-readable)
        code:
          anyOf:
            - type: string
            - type: 'null'
          title: Code
          description: >-
            Fine-grained error code for programmatic handling (e.g.,
            namespace_name_taken, feature_extractor_not_found). Present only
            when consumers may need to branch on a specific error condition.
        details:
          anyOf:
            - additionalProperties: true
              type: object
            - type: 'null'
          title: Details
          description: >-
            Optional structured details to help debugging (validation errors,
            IDs, etc.)
      type: object
      required:
        - message
        - type
      title: ErrorDetail
      description: Error detail model.
    ValidationError:
      properties:
        loc:
          items:
            anyOf:
              - type: string
              - type: integer
          type: array
          title: Location
        msg:
          type: string
          title: Message
        type:
          type: string
          title: Error Type
      type: object
      required:
        - loc
        - msg
        - type
      title: ValidationError
    CreateBlobRequest:
      properties:
        property:
          type: string
          maxLength: 100
          minLength: 1
          title: Property
          description: >-
            REQUIRED. Property name from the bucket schema that this blob
            belongs to. Must match a field defined in the bucket's schema. Used
            to validate blob type compatibility and determine storage path.
            Common values: 'video', 'thumbnail', 'transcript', 'document',
            'image'
          examples:
            - video
            - thumbnail
            - transcript
            - image
            - document
        key_prefix:
          anyOf:
            - type: string
            - type: 'null'
          title: Key Prefix
          description: >-
            OPTIONAL. Storage path prefix for organizing blobs within the
            bucket. If not provided, uses default bucket organization. Use for:
            grouping blobs by campaign, date, category, etc. Example:
            'campaigns/summer_2025' or 'products/electronics'
          examples:
            - campaigns/summer_2025
            - products/electronics
            - 2025/Q4
        type:
          $ref: '#/components/schemas/BucketSchemaFieldType'
          description: >-
            REQUIRED. The schema field type for this blob. Must match the bucket
            schema definition for the property. Determines validation rules and
            processing pipeline. Common types: IMAGE, VIDEO, AUDIO, PDF,
            DOCUMENT, TEXT
          examples:
            - VIDEO
            - IMAGE
            - PDF
            - AUDIO
            - TEXT
            - DOCUMENT
        data:
          anyOf:
            - type: string
              maxLength: 2083
              minLength: 1
              format: uri
            - type: string
            - type: integer
            - type: number
            - type: boolean
            - additionalProperties: true
              type: object
            - items: {}
              type: array
            - type: 'null'
          title: Data
          description: |-
            EITHER `data` OR `upload_id` must be provided (mutually exclusive).

            File data in one of several INTERCHANGEABLE formats:

            **Format 1: URL String (HTTP/HTTPS/S3)**
            - Direct URL to a file on the web or in S3
            - Examples: 'https://example.com/video.mp4', 's3://bucket/key'
            - Use for: public files, existing S3 objects, pre-signed URLs
            - The file is downloaded and uploaded to internal S3 (if canonicalize_source=True)

            **Format 2: Data URI String (base64)**
            - Self-contained base64 data with MIME type
            - Format: 'data:<mime_type>;base64,<encoded_data>'
            - Example: 'data:image/jpeg;base64,/9j/4AAQSkZJRg...'
            - Use for: small files (<5MB), mobile uploads, inline test data
            - The MIME type is extracted automatically from the URI
            - Data is decoded, validated, and uploaded to S3 automatically

            **Format 3: Base64 Dictionary**
            - Structured format with explicit metadata
            - Required keys: 'base64' (encoded data)
            - Optional keys: 'mime_type', 'filename'
            - Example: {'base64': '/9j/4AAQ...', 'mime_type': 'image/jpeg', 'filename': 'photo.jpg'}
            - Use for: when you need explicit MIME type control
            - Data is decoded, validated, and uploaded to S3 automatically

            **Format 4: URL Dictionary**
            - Structured format for URL references
            - Required keys: 'url'
            - Example: {'url': 'https://example.com/file.jpg'}
            - Use for: consistency with the other dictionary formats

            **Processing:** All formats are converted to internal S3 URLs before
            storage. The engine always receives S3 URLs regardless of input format.

            **Size Limits (base64 only):** Base64 data: 5MB (free), 10MB (pro),
            50MB (enterprise). URLs: no limit (downloaded on demand). For files
            exceeding these limits, use the presigned upload workflow: POST
            /buckets/{id}/uploads

            **Validation:**
            - Base64: encoding validated, MIME type detected, size checked
            - URLs: accessibility verified, content type validated
            - All: schema type compatibility enforced
          examples:
            - https://example.com/video.mp4
            - s3://my-bucket/path/to/image.jpg
            - >-
              data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg==
            - base64: >-
                iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg==
              mime_type: image/png
            - base64: /9j/4AAQSkZJRg...
              filename: photo.jpg
              mime_type: image/jpeg
            - url: https://cdn.example.com/files/document.pdf
        upload_id:
          anyOf:
            - type: string
              pattern: ^upl_[a-zA-Z0-9_-]+$
            - type: 'null'
          title: Upload Id
          description: |-
            EITHER `upload_id` OR `data` must be provided. Reference to an
            existing upload from the presigned URL workflow.

            Presigned URLs are generated by the existing POST
            /buckets/{id}/uploads endpoint, which also handles upload tracking
            and validation.

            Workflow:
            1. POST /buckets/{id}/uploads → {upload_id, presigned_url}
            2. Upload the file to presigned_url
            3. POST /uploads/{upload_id}/confirm → validates the upload
            4. Use upload_id here to reference the uploaded file

            The upload must be in CONFIRMED or ACTIVE status. Format: 'upl_'
            prefix followed by alphanumeric characters, underscores, or hyphens.

            Use Cases:
            - Combine multiple uploads into one object
            - Upload files in parallel and create the object later
            - Reuse the same upload across multiple objects

            See: api/buckets/uploads/ for the complete upload system
          examples:
            - upl_abc123def456
            - upl_xyz789
        metadata:
          anyOf:
            - additionalProperties: true
              type: object
            - type: 'null'
          title: Metadata
          description: >-
            Metadata for the blob. It is applied only to the documents that use
            this blob.
        canonicalize_source:
          anyOf:
            - type: boolean
            - type: 'null'
          title: Canonicalize Source
          description: >-
            If set, override object-level default to control source
            canonicalization for this blob.
        force_remirror:
          anyOf:
            - type: boolean
            - type: 'null'
          title: Force Remirror
          description: >-
            If set, override object-level default to force re-upload even if an
            identical blob exists.
      type: object
      required:
        - property
        - type
      title: CreateBlobRequest
      description: >-
        Request model for creating a new blob.


        IMPORTANT: For presigned URL uploads, use the existing
        /buckets/{id}/uploads endpoint; no separate presigned upload endpoint is
        needed.

        Supports two modes:


        Mode 1: Direct Data Upload
            - Provide 'data' field with URL or base64 content
            - File is processed immediately during object creation
            - Use for: Small files, public URLs, inline data

        Mode 2: Upload Reference (Recommended for large files)
            - First: POST /buckets/{id}/uploads → Returns presigned_url + upload_id
            - User uploads file directly to S3 via presigned_url
            - Then: POST /uploads/{upload_id}/confirm → Validates upload
            - Finally: Reference upload_id in this blob request
            - Use for: Large files, client-side uploads, multi-blob objects

        Why upload_id?
            - Combine multiple uploads into one object
            - Upload files in parallel, create object later
            - Reuse uploads across multiple objects
            - Better UX: upload progress, retry logic, validation

        Related Endpoints:
            - POST /buckets/{id}/uploads - Generate presigned URLs
            - POST /uploads/{id}/confirm - Confirm upload completed
            - See: api/buckets/uploads/services.py for full upload workflow

        Examples:
            # Direct data (simple)
            {
              "property": "thumbnail",
              "type": "IMAGE",
              "data": "https://example.com/image.jpg"
            }

            # Upload reference (recommended)
            {
              "property": "video",
              "type": "VIDEO",
              "upload_id": "upl_abc123"  # From /uploads endpoint
            }

            # Multiple uploads → one object (CreateObjectRequest)
            {
              "blobs": [
                {"property": "video", "type": "VIDEO", "upload_id": "upl_video123"},
                {"property": "thumbnail", "type": "IMAGE", "upload_id": "upl_thumb456"},
                {"property": "transcript", "type": "TEXT", "upload_id": "upl_trans789"}
              ]
            }
      examples:
        - data: https://example.com/image.jpg
          description: Direct data upload - Simple image from URL
          metadata:
            alt_text: Product thumbnail
          property: thumbnail
          type: IMAGE
        - data: >-
            data:image/jpeg;base64,/9j/4AAQSkZJRgABAQEAYABgAAD/2wBDAAgGBgcGBQgHBwcJCQgKDBQNDAsLDBkSEw8UHRofHh0aHBwgJC4nICIsIxwcKDcpLDAxNDQ0Hyc5PTgyPC4zNDL/wAARCAABAAEDASIAAhEBAxEB/8QAFQABAQAAAAAAAAAAAAAAAAAAAAv/xAAUEQEAAAAAAAAAAAAAAAAAAAAA/9oADAMBAAIRAxEAPwCwAA//2Q==
          description: Data URI upload - Mobile camera photo
          metadata:
            device: iPhone 13
            location: 'Store #42'
          property: photo
          type: IMAGE
        - data:
            base64: >-
              iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg==
            filename: sales-chart-2025-11.png
            mime_type: image/png
          description: Base64 dict upload - Programmatically generated image
          metadata:
            chart_type: line
            generated_at: '2025-11-08T10:30:00Z'
          property: chart
          type: IMAGE
        - description: Upload reference - Video from presigned upload
          metadata:
            duration_seconds: 120
          property: video
          type: VIDEO
          upload_id: upl_abc123def456
        - canonicalize_source: false
          data: s3://my-bucket/documents/contract-2025.pdf
          description: S3 reference - Existing object
          metadata:
            department: Legal
            year: 2025
          property: archive
          type: PDF
        - blobs:
            - property: video
              type: VIDEO
              upload_id: upl_video123
            - data: data:image/png;base64,iVBORw0KG...
              property: thumbnail
              type: IMAGE
            - property: transcript
              type: TEXT
              upload_id: upl_trans789
          description: Multiple uploads in one object - Complete media package
          note: Use in array for multi-file objects
    BlobModel:
      properties:
        blob_id:
          type: string
          title: Blob Id
          description: Unique identifier for the blob
          examples:
            - blob_a1b2c3d4e5f6
        property:
          type: string
          title: Property
          description: Property name of the blob
          examples:
            - video
            - thumbnail
            - content
        key_prefix:
          anyOf:
            - type: string
            - type: 'null'
          title: Key Prefix
          description: >-
            Storage key/path of the blob, used to retrieve the blob from storage
            (similar to a file path). If not provided, the blob is placed in the
            root of the bucket.
          examples:
            - /videos/video.mp4
            - /thumbnails/thumb.jpg
        type:
          $ref: '#/components/schemas/BucketSchemaFieldType'
          description: >-
            The schema field type this blob corresponds to (e.g., IMAGE, PDF,
            DOCUMENT)
          examples:
            - video
            - image
            - pdf
            - text
        properties:
          additionalProperties: true
          type: object
          title: Properties
          description: >-
            All blob data and metadata unified (formerly separate 'data' and
            'metadata' fields). Contains URLs, dimensions, metadata, and any
            other blob-specific information.
          examples:
            - author: John Doe
              duration: 120
              resolution: 1920x1080
              tags:
                - product
                - demo
              url: s3://bucket/video.mp4
        presigned_url:
          anyOf:
            - type: string
            - type: 'null'
          title: Presigned Url
          description: >-
            Canonical top-level presigned URL for this blob. Matches the shape
            used by `document_blobs[].presigned_url` on document responses.
            Populated by the API when `?return_presigned_urls=true`. Also
            mirrored at `properties.presigned_url` for backward compatibility —
            prefer this top-level field; the nested path will be removed in a
            future release.
        details:
          $ref: '#/components/schemas/BlobDetails'
          description: System-generated file details (filename, size, hash, etc.)
      type: object
      required:
        - property
        - type
      title: BlobModel
      description: >-
        Model for a blob within a bucket object.


        Blobs store file references with a flat properties structure. All
        blob-specific data (formerly in separate 'data' and 'metadata' fields)
        is now unified in a single 'properties' dictionary.


        Example:
            {
                "blob_id": "blob_xyz123",
                "property": "video",
                "type": "video",
                "properties": {
                    "url": "s3://bucket/video.mp4",
                    "duration": 120,
                    "resolution": "1920x1080",
                    "author": "John Doe"  # All data unified here
                }
            }
    SourceDetails:
      properties:
        type:
          $ref: '#/components/schemas/SourceType'
          description: Immediate origin type from which this entity was derived.
        source_id:
          type: string
          title: Source Id
          description: >-
            Identifier of the immediate source entity (e.g., bucket_id,
            collection_id, taxonomy_id).
      type: object
      required:
        - type
        - source_id
      title: SourceDetails
      description: >-
        Source details for any document/point.


        Kept intentionally minimal so that specialized models (e.g.,
        DocumentSourceDetails) can extend it with domain-specific fields.
    TaskStatusEnum:
      type: string
      enum:
        - PENDING
        - QUEUED
        - IN_PROGRESS
        - PROCESSING
        - COMPLETED
        - COMPLETED_WITH_ERRORS
        - FAILED
        - CANCELED
        - INTERRUPTED
        - UNKNOWN
        - SKIPPED
        - DRAFT
        - ACTIVE
        - ARCHIVED
        - SUSPENDED
      title: TaskStatusEnum
      description: |-
        Enumeration of task statuses for tracking asynchronous operations.

        Task statuses indicate the current state of asynchronous operations like
        batch processing, object ingestion, clustering, and taxonomy execution.

        Status Categories:
            Operation Statuses: Track progress of async operations
            Lifecycle Statuses: Track entity state (buckets, collections, namespaces)

        Values:
            PENDING: Task is queued but has not started processing yet
            IN_PROGRESS: Task is currently being executed
            PROCESSING: Task is actively processing data (similar to IN_PROGRESS)
            COMPLETED: Task finished successfully with no errors
            COMPLETED_WITH_ERRORS: Task finished but some items failed (partial success)
            FAILED: Task encountered an error and could not complete
            CANCELED: Task was manually canceled by a user or system
            UNKNOWN: Task status could not be determined
            SKIPPED: Task was intentionally skipped
            DRAFT: Task is in draft state and not yet submitted

            ACTIVE: Entity is active and operational (for buckets, collections, etc.)
            ARCHIVED: Entity has been archived
            SUSPENDED: Entity has been temporarily suspended

        Terminal Statuses:
            COMPLETED, COMPLETED_WITH_ERRORS, FAILED, CANCELED are terminal statuses.
            Once a task reaches these states, it will not transition to another state.

        Partial Success Handling:
            COMPLETED_WITH_ERRORS indicates that the operation completed but some
            documents/items failed. The task result includes:
            - List of successful items
            - List of failed items with error details
            - Success rate percentage
            This allows clients to handle partial success scenarios appropriately.

        Polling Guidance:
            - Poll tasks in PENDING, QUEUED, IN_PROGRESS, or PROCESSING states
            - Stop polling when task reaches COMPLETED, COMPLETED_WITH_ERRORS, FAILED, or CANCELED
            - Use exponential backoff when polling, starting at 1s and capping at 30s
    WriteConsistency:
      properties:
        retriever_visible:
          type: string
          title: Retriever Visible
          description: >-
            Visibility model: 'eventual' (BYOV direct upsert — indexed within
            seconds) or 'after_processing' (managed ingestion — visible after a
            collection batch processes the object).
        recommended_header:
          anyOf:
            - type: string
            - type: 'null'
          title: Recommended Header
          description: Header to send with retriever execute requests for read-your-writes consistency (BYOV).
        write_token_available:
          type: boolean
          title: Write Token Available
          description: Whether a write_token was issued for read-your-writes.
          default: false
        expected_visible_within_ms:
          anyOf:
            - type: integer
            - type: 'null'
          title: Expected Visible Within Ms
          description: >-
            Typical upper bound, in milliseconds, before a write becomes visible
            (BYOV indexing). Null when visibility depends on asynchronous
            processing (managed ingestion).
        poll:
          anyOf:
            - additionalProperties: true
              type: object
            - type: 'null'
          title: Poll
          description: >-
            How to poll for visibility when it depends on async processing:
            {endpoint, field, ready_when}.
        next_actions:
          items:
            additionalProperties: true
            type: object
          type: array
          title: Next Actions
          description: Actionable next steps to reach retriever visibility.
      type: object
      required:
        - retriever_visible
      title: WriteConsistency
      description: How and when a write becomes visible to retriever reads.
    BucketSchemaFieldType:
      type: string
      enum:
        - string
        - number
        - integer
        - float
        - boolean
        - object
        - array
        - date
        - datetime
        - text
        - image
        - audio
        - video
        - pdf
        - excel
      title: BucketSchemaFieldType
      description: >-
        Supported data types for bucket schema fields.


        Types fall into two categories:


        1. **Metadata Types** (JSON types):
           - Stored as object metadata
           - Standard JSON-compatible types
           - Not processed by extractors (unless explicitly mapped)
           - Examples: string, number, boolean, date

        2. **File Types** (blobs):
           - Stored as files/blobs
           - Processed by extractors
           - Require file content (URL or base64)
           - Examples: text, image, video, pdf

        **GIF Special Handling**:
            GIF files can be declared as either IMAGE or VIDEO type:

            - As IMAGE: GIF is embedded as a single static image (first frame)
            - As VIDEO: GIF is decomposed frame-by-frame with embeddings per frame

            The multimodal extractor detects GIFs via MIME type (image/gif) and routes
            them based on your schema declaration. Use VIDEO for animated GIFs where
            frame-level search is needed, IMAGE for static/thumbnail use cases.

        NOTE: For retriever input schemas that need to accept document
        references (e.g., "find similar documents"), use
        RetrieverInputSchemaFieldType instead, which includes all bucket types
        plus document_reference.
    BlobDetails:
      properties:
        filename:
          anyOf:
            - type: string
            - type: 'null'
          title: Filename
        size_bytes:
          anyOf:
            - type: integer
            - type: 'null'
          title: Size Bytes
        mime_type:
          anyOf:
            - type: string
            - type: 'null'
          title: Mime Type
        hash:
          anyOf:
            - type: string
            - type: 'null'
          title: Hash
      type: object
      title: BlobDetails
      description: >-
        File details for a bucket object, generated automatically by the
        system.
    SourceType:
      type: string
      enum:
        - bucket
        - collection
        - taxonomy
        - cluster
        - direct_upsert
        - none
      title: SourceType
      description: Source types for any document/point.

````
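The polling guidance in `TaskStatusEnum` (keep polling while a task is PENDING, QUEUED, IN_PROGRESS, or PROCESSING; stop once it leaves those states; back off from 1s up to 30s) can be sketched as a small client-side loop. This is a minimal illustration under stated assumptions, not an official client: `fetch_status` is a hypothetical callable standing in for whatever request returns a task's current status string, and `wait_for_task`, `backoff_delays`, and `max_attempts` are names introduced here for illustration only.

```python
import time
from typing import Callable, Iterator

# Statuses the schema's polling guidance says to keep polling on.
POLLABLE_STATUSES = {"PENDING", "QUEUED", "IN_PROGRESS", "PROCESSING"}


def backoff_delays(initial: float = 1.0, maximum: float = 30.0) -> Iterator[float]:
    """Yield delays that double from `initial` and are capped at `maximum`."""
    delay = initial
    while True:
        yield delay
        delay = min(delay * 2, maximum)


def wait_for_task(
    fetch_status: Callable[[], str],
    sleep: Callable[[float], None] = time.sleep,
    max_attempts: int = 50,
) -> str:
    """Poll until the status leaves the pollable set, then return it.

    `fetch_status` is a hypothetical callable (an assumption, not part of the
    API) that performs the HTTP request and returns the task's status string.
    """
    delays = backoff_delays()
    for _ in range(max_attempts):
        status = fetch_status()
        if status not in POLLABLE_STATUSES:
            # COMPLETED, COMPLETED_WITH_ERRORS, FAILED, and CANCELED are terminal;
            # any other non-pollable status is also handed back to the caller.
            return status
        sleep(next(delays))
    raise TimeoutError("task was still in a pollable status after max_attempts")
```

Injecting `sleep` keeps the loop testable without real waiting. When the result is `COMPLETED_WITH_ERRORS`, the caller would still need to inspect the task result's failed items, as described under "Partial Success Handling".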
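`BlobModel` defines `presigned_url` as the canonical top-level field and keeps `properties.presigned_url` only as a backward-compatible mirror that is scheduled for removal. A client that has to handle responses from both older and newer API versions could read the top-level field first and fall back to the nested one. This is a minimal sketch, assuming each blob has already been parsed from JSON into a plain dictionary; `get_presigned_url` is a hypothetical helper, not part of any Mixpeek SDK. Either field is only populated when the request sets `?return_presigned_urls=true`.

```python
from typing import Any, Mapping, Optional


def get_presigned_url(blob: Mapping[str, Any]) -> Optional[str]:
    """Return a blob's presigned URL, preferring the canonical top-level field.

    Falls back to the deprecated `properties.presigned_url` mirror, which the
    schema says will be removed in a future release. Returns None when neither
    is present (e.g. presigned URLs were not requested).
    """
    url = blob.get("presigned_url")
    if url:
        return url
    # `properties` is a free-form dict; treat a missing or null value as empty.
    properties = blob.get("properties") or {}
    return properties.get("presigned_url")
```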
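`BucketSchemaFieldType` separates metadata (JSON) types, which are stored as object metadata, from file types, which are stored as blobs and processed by extractors. The schema only gives examples of each category, so the grouping below is an assumption: `text`, `image`, `video`, and `pdf` are named as file types, while `audio` and `excel` are placed with them by inference from their names. The case-insensitive lookup is also an assumption, based on the request examples writing types in upper case (`VIDEO`, `PDF`) while the enum itself is lower case. `is_file_type` is a hypothetical helper.

```python
# Values copied from the BucketSchemaFieldType enum.
ALL_TYPES = {
    "string", "number", "integer", "float", "boolean", "object", "array",
    "date", "datetime", "text", "image", "audio", "video", "pdf", "excel",
}
# Assumed grouping: the schema names text, image, video, and pdf as file-type
# examples; audio and excel are grouped with them here by inference.
FILE_TYPES = {"text", "image", "audio", "video", "pdf", "excel"}
METADATA_TYPES = ALL_TYPES - FILE_TYPES


def is_file_type(field_type: str) -> bool:
    """True if values of this type are stored as blobs (under the assumed grouping)."""
    # Assumption: accept upper-case spellings such as "VIDEO" as well.
    normalized = field_type.lower()
    if normalized not in ALL_TYPES:
        raise ValueError(f"unknown BucketSchemaFieldType: {field_type!r}")
    return normalized in FILE_TYPES
```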