
# Update Sync Configuration

> Update a sync configuration.
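
The following is a minimal sketch of building a call to this endpoint from Python using only the standard library. It makes three assumptions:

- Authentication uses a bearer token. The spec below declares no security scheme, so adjust this to whatever your account requires.
- The bucket ID, sync ID and key shown are hypothetical.
- The helper name `build_sync_update_request` is illustrative.

Fields passed as `None` are dropped from the body, so the server keeps their current values. This matches the partial-update behavior described for `SyncUpdateRequest`.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.mixpeek.com"  # production server listed in the spec


def build_sync_update_request(bucket_id, sync_config_id, api_key, **fields):
    """Build (but do not send) a PATCH request for one sync configuration.

    Fields whose value is None are omitted, so the server preserves them.
    """
    body = {name: value for name, value in fields.items() if value is not None}
    if not body:
        raise ValueError("provide at least one field to update")
    path = "/v1/buckets/{}/syncs/{}".format(
        urllib.parse.quote(bucket_id, safe=""),
        urllib.parse.quote(sync_config_id, safe=""),
    )
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        method="PATCH",
        headers={
            # Assumption: bearer-token auth; adjust to your account's scheme.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical IDs and key; batch_size=None is dropped from the body.
request = build_sync_update_request(
    "bkt_marketing_assets",
    "sync_abc123",
    "YOUR_API_KEY",
    polling_interval_seconds=600,
    batch_size=None,
)
# urllib.request.urlopen(request) would send it; a 200 response carries the
# updated SyncConfigurationModel as JSON.
```

Building the request separately from sending it makes the body easy to inspect or log before any network call.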

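Out-of-range values are likely rejected with a 422 Validation Error, so you may want to check a payload locally before sending it. This sketch copies the bounds stated in the `SyncUpdateRequest` schema below:

- polling interval of 30 to 900 seconds
- batch size of 1 to 100
- `max_objects_per_run` of at least 1
- description of at most 500 characters
- metadata with at most 50 JSON-serializable keys
- `status` drawn from `TaskStatusEnum`

The function name and error strings are illustrative, not part of any Mixpeek SDK. The server remains the source of truth.

```python
import json
from typing import Any

# Integer bounds from the SyncUpdateRequest schema; None means no maximum.
INT_BOUNDS = {
    "polling_interval_seconds": (30, 900),
    "batch_size": (1, 100),
    "max_objects_per_run": (1, None),
}
TASK_STATUSES = {
    "PENDING", "QUEUED", "IN_PROGRESS", "PROCESSING", "COMPLETED",
    "COMPLETED_WITH_ERRORS", "FAILED", "CANCELED", "INTERRUPTED", "UNKNOWN",
    "SKIPPED", "DRAFT", "ACTIVE", "ARCHIVED", "SUSPENDED",
}


def validate_sync_update(payload: dict[str, Any]) -> list[str]:
    """Return human-readable problems found in a SyncUpdateRequest body."""
    errors = []
    if all(value is None for value in payload.values()):
        errors.append("at least one field should be provided")
    for name, (low, high) in INT_BOUNDS.items():
        value = payload.get(name)
        if value is None:
            continue
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            errors.append(f"{name} must be an integer")
        elif high is None and value < low:
            errors.append(f"{name} must be >= {low}")
        elif high is not None and not low <= value <= high:
            errors.append(f"{name} must be between {low} and {high}")
    description = payload.get("description")
    if description is not None and len(description) > 500:
        errors.append("description must be at most 500 characters")
    metadata = payload.get("metadata")
    if metadata is not None:
        if len(metadata) > 50:
            errors.append("metadata must have at most 50 keys")
        try:
            json.dumps(metadata)
        except (TypeError, ValueError):
            errors.append("metadata values must be JSON-serializable")
    status = payload.get("status")
    if status is not None and status not in TASK_STATUSES:
        errors.append(f"status {status!r} is not a TaskStatusEnum value")
    return errors
```

For example, `validate_sync_update({"polling_interval_seconds": 10})` reports the 30 to 900 second bound instead of spending a round trip.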

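According to the schema, `metadata`, `schema_mapping`, and `provider_filters` replace the stored value wholesale rather than merging with it. To add one key without losing the others, merge on the client first.

This sketch assumes you already hold the current configuration, for example the `SyncConfigurationModel` returned by a previous call. The helper name is hypothetical. It merges only `metadata` and `provider_filters`, and only at the top level. `schema_mapping` keeps its entries under a nested `mappings` object, so it would need the same treatment one level down.

```python
import copy
from typing import Any

# Object-valued fields that a PATCH replaces wholesale (per the schema).
MERGE_FIELDS = ("metadata", "provider_filters")


def merged_update(current: dict[str, Any], changes: dict[str, Any]) -> dict[str, Any]:
    """Build a PATCH body that keeps existing keys of replace-wholesale fields."""
    body: dict[str, Any] = {}
    for name, value in changes.items():
        if name in MERGE_FIELDS and isinstance(value, dict):
            merged = copy.deepcopy(current.get(name) or {})  # never mutate `current`
            merged.update(value)  # shallow, top-level merge; `changes` wins
            body[name] = merged
        else:
            body[name] = value
    return body


current = {
    "metadata": {"environment": "production", "project": "video-pipeline"},
    "provider_filters": {"media_type": "video,image"},
}
body = merged_update(current, {"metadata": {"owner": "data-team"}, "batch_size": 50})
```

Fields not listed in `changes` are left out of the body entirely, so the server preserves them as usual.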

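The `TaskStatusEnum` polling guidance in the spec says to:

- keep polling while a status is `PENDING`, `QUEUED`, `IN_PROGRESS`, or `PROCESSING`
- stop at a terminal status
- back off exponentially from 1s to 30s

The sketch below follows that guidance. `fetch_status` is a hypothetical callable you supply, for example one that reads `status` from a freshly fetched configuration.

The loop also returns on non-pollable lifecycle statuses such as `ACTIVE` or `SUSPENDED`, which a continuous sync may hold indefinitely. That stopping rule is an assumption, not part of the documented guidance.

```python
import time
from typing import Callable, Iterator

POLLABLE = {"PENDING", "QUEUED", "IN_PROGRESS", "PROCESSING"}
TERMINAL = {"COMPLETED", "COMPLETED_WITH_ERRORS", "FAILED", "CANCELED"}


def backoff_delays(start: float = 1.0, cap: float = 30.0) -> Iterator[float]:
    """Yield 1, 2, 4, ... seconds, doubling until the cap is reached."""
    delay = start
    while True:
        yield delay
        delay = min(delay * 2, cap)


def wait_for_status(
    fetch_status: Callable[[], str],
    max_polls: int = 50,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the status leaves the pollable set, then return it."""
    delays = backoff_delays()
    for _ in range(max_polls):
        status = fetch_status()
        if status not in POLLABLE:
            # Either terminal (in TERMINAL) or a lifecycle status such as ACTIVE.
            return status
        sleep(next(delays))
    raise TimeoutError(f"status still pollable after {max_polls} polls")
```

Callers can test `result in TERMINAL` to tell a finished operation apart from a long-lived lifecycle state. Injecting `sleep` keeps the loop testable without real waits.
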
## OpenAPI
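
The spec below defines two error shapes:

- `ErrorResponse` (400, 401, 403, 404, 500), which carries an `error` object.
- `HTTPValidationError` (422), which carries a `detail` list.

This sketch turns either shape into a one-line message. The `ValidationError` item schema is not shown on this page, so reading `loc` and `msg` from each item is an assumption based on the common FastAPI layout. The `type` and `message` keys of `error` follow the `ErrorResponse` example.

```python
import json
from typing import Any


def describe_error(status_code: int, body: bytes) -> str:
    """Summarize an error response from this endpoint in one line."""
    try:
        payload: Any = json.loads(body)
    except ValueError:  # also covers JSONDecodeError and bad UTF-8
        return f"HTTP {status_code}: non-JSON body"
    if status_code == 422 and isinstance(payload, dict) and isinstance(payload.get("detail"), list):
        # HTTPValidationError; assumption: items look like {"loc": [...], "msg": "..."}.
        parts = []
        for item in payload["detail"]:
            if isinstance(item, dict):
                loc = ".".join(str(part) for part in item.get("loc", []))
                parts.append(f"{loc}: {item.get('msg', 'invalid')}")
        return "HTTP 422: " + "; ".join(parts)
    if isinstance(payload, dict) and isinstance(payload.get("error"), dict):
        error = payload["error"]  # ErrorResponse.error (ErrorDetail)
        code = payload.get("status", status_code)
        return f"HTTP {code} {error.get('type', 'Error')}: {error.get('message', '')}"
    return f"HTTP {status_code}: unrecognized error body"
```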

````yaml patch /v1/buckets/{bucket_id}/syncs/{sync_config_id}
openapi: 3.1.0
info:
  title: Mixpeek API
  description: >-
    This is the Mixpeek API, providing access to various endpoints for data
    processing and retrieval.
  termsOfService: https://mixpeek.com/terms
  contact:
    name: Mixpeek Support
    url: https://mixpeek.com/contact
    email: info@mixpeek.com
  version: '0.82'
servers:
  - url: https://api.mixpeek.com
    description: Production
security: []
paths:
  /v1/buckets/{bucket_id}/syncs/{sync_config_id}:
    patch:
      tags:
        - Bucket Syncs
      summary: Update Sync Configuration
      description: Update a sync configuration.
      operationId: >-
        update_sync_configuration_v1_buckets__bucket_id__syncs__sync_config_id__patch
      parameters:
        - name: bucket_id
          in: path
          required: true
          schema:
            type: string
            title: Bucket Id
        - name: sync_config_id
          in: path
          required: true
          schema:
            type: string
            title: Sync Config Id
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/SyncUpdateRequest'
      responses:
        '200':
          description: Successful Response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/SyncConfigurationModel'
        '400':
          description: Bad Request
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '401':
          description: Unauthorized
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '403':
          description: Forbidden
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '404':
          description: Not Found
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '422':
          description: Validation Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/HTTPValidationError'
        '500':
          description: Internal Server Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
components:
  schemas:
    SyncUpdateRequest:
      properties:
        description:
          anyOf:
            - type: string
            - type: 'null'
          title: Description
          description: >-
            Optional human-readable description of the sync configuration. NOT
            REQUIRED. Used for documentation and UI display. Maximum 500
            characters.
          examples:
            - Daily video ingestion from production bucket
        metadata:
          anyOf:
            - additionalProperties: true
              type: object
            - type: 'null'
          title: Metadata
          description: >-
            Optional custom metadata to replace existing metadata. NOT REQUIRED.
            Completely replaces existing metadata (not merged). Use for tagging,
            categorization, or custom attributes. Maximum 50 keys; values must
            be JSON-serializable.
          examples:
            - environment: production
              project: video-pipeline
            - last_updated: '2025-11-01'
              owner: data-team
        status:
          anyOf:
            - $ref: '#/components/schemas/TaskStatusEnum'
            - type: 'null'
          description: >-
            Optional status to set for the sync configuration. NOT REQUIRED.
            Valid values are TaskStatusEnum members, such as 'PENDING',
            'PROCESSING', 'COMPLETED', 'FAILED', or 'SUSPENDED' (temporarily
            paused). Typically managed automatically but can be manually
            overridden. For active control, use the pause/resume endpoints
            instead.
          examples:
            - PENDING
            - PROCESSING
            - COMPLETED
        is_active:
          anyOf:
            - type: boolean
            - type: 'null'
          title: Is Active
          description: >-
            Optional flag to enable or disable the sync configuration. NOT
            REQUIRED. When False, sync will not process new files. Prefer using
            the /pause and /resume endpoints for clarity. Changes take effect
            immediately.
          examples:
            - true
            - false
        polling_interval_seconds:
          anyOf:
            - type: integer
              maximum: 900
              minimum: 30
            - type: 'null'
          title: Polling Interval Seconds
          description: >-
            Optional new polling interval in seconds. NOT REQUIRED. Must be
            between 30 and 900 seconds if provided. Only applies to 'continuous'
            and 'scheduled' sync modes. Lower values improve responsiveness but
            increase API usage.
          examples:
            - 60
            - 300
            - 600
        batch_size:
          anyOf:
            - type: integer
              maximum: 100
              minimum: 1
            - type: 'null'
          title: Batch Size
          description: >-
            Optional new batch size for file processing. NOT REQUIRED. Must be
            between 1 and 100 if provided. Larger batches improve throughput but
            use more memory. Changes apply to subsequent batches only.
          examples:
            - 10
            - 50
            - 100
        schema_mapping:
          anyOf:
            - $ref: '#/components/schemas/SchemaMapping-Input'
            - type: 'null'
          description: >-
            Optional schema mapping to replace existing mapping. NOT REQUIRED.
            Completely replaces existing schema_mapping (not merged). Defines
            how source data maps to bucket schema fields and blobs. See
            SyncCreateRequest.schema_mapping for detailed documentation.
        skip_batch_submission:
          anyOf:
            - type: boolean
            - type: 'null'
          title: Skip Batch Submission
          description: >-
            If True, sync objects to the bucket without creating or submitting
            batches for collection processing. Objects are created in the bucket
            but no tier processing is triggered. NOT REQUIRED. When omitted,
            existing value is preserved.
          examples:
            - false
            - true
        max_objects_per_run:
          anyOf:
            - type: integer
              minimum: 1
            - type: 'null'
          title: Max Objects Per Run
          description: >-
            Hard cap on objects processed per sync run. NOT REQUIRED. When
            omitted, existing value is preserved. Use to limit runaway syncs or
            control ingestion volume.
          examples:
            - 100
            - 50000
            - 100000
        reconcile:
          anyOf:
            - $ref: '#/components/schemas/ReconcileSettings'
            - type: 'null'
          description: >-
            Controls how Mixpeek reconciles objects when the source changes.
            on_delete: cascade-delete when source asset is removed. on_update:
            propagate metadata changes and re-process. on_filter_drift: remove
            objects that no longer match filters. NOT REQUIRED. When omitted,
            existing value is preserved.
          examples:
            - on_delete: true
              on_filter_drift: true
              on_update: true
            - on_delete: true
              on_filter_drift: false
              on_update: false
        provider_filters:
          anyOf:
            - additionalProperties: true
              type: object
            - type: 'null'
          title: Provider Filters
          description: >-
            Provider-specific pre-filters pushed down to the storage API call.
            NOT REQUIRED. Completely replaces existing provider_filters (not
            merged). Each provider defines its own filter schema. Examples:
            Iconik {'collection_ids': [...], 'media_type': 'video,image',
            'path_patterns': ['*/Footage/*']}; Google Drive
            {'shared_drive_id': '0AH-Xabc123'}; S3 {'prefix': 'videos/'}.
          examples:
            - collection_ids:
                - col_abc
                - col_def
              media_type: video,image
      type: object
      title: SyncUpdateRequest
      description: >-
        Request to update an existing sync configuration.


        Allows partial updates to sync settings without recreating the
        configuration.

        All fields are optional; only the fields you provide are updated.


        Use Cases:
            - Pause/resume syncs by toggling is_active
            - Adjust polling intervals based on activity patterns
            - Update batch sizes for performance tuning
            - Add metadata tags for organization

        Requirements:
            - All fields are OPTIONAL
            - At least one field should be provided for the update
            - Changes take effect on the next sync cycle
      examples:
        - description: Pause sync temporarily
          is_active: false
        - description: Adjust polling frequency
          polling_interval_seconds: 600
        - batch_size: 75
          description: Update batch size and metadata
          metadata:
            last_tuned: '2025-11-01'
            optimized: true
        - batch_size: 100
          description: Production video sync - optimized settings
          is_active: true
          metadata:
            priority: high
            project: video-ai
          polling_interval_seconds: 120
        - description: Enable sync-only mode (no batch processing)
          skip_batch_submission: true
        - description: Update Iconik collection filters
          provider_filters:
            collection_ids:
              - 8f833e78-92fd-11ef-99d4-36417797d291
            media_type: video,image
    SyncConfigurationModel:
      properties:
        sync_config_id:
          type: string
          title: Sync Config Id
          description: Unique identifier for the sync configuration.
        bucket_id:
          type: string
          title: Bucket Id
          description: Target bucket identifier (e.g. 'bkt_marketing_assets').
        connection_id:
          type: string
          title: Connection Id
          description: Storage connection identifier (e.g. 'conn_abc123').
        internal_id:
          type: string
          title: Internal Id
          description: Organization internal identifier (multi-tenancy scope).
        namespace_id:
          type: string
          title: Namespace Id
          description: Namespace identifier owning the bucket.
        source_path:
          type: string
          title: Source Path
          description: >-
            Source path in the external storage provider. Format varies by
            provider: s3/tigris='bucket/prefix', google_drive='folder_id',
            sharepoint='/sites/Name/Documents', snowflake='DB.SCHEMA.TABLE'.
        file_filters:
          anyOf:
            - $ref: '#/components/schemas/FileFilters'
            - type: 'null'
          description: Optional filter rules limiting which files are synced.
        schema_mapping:
          anyOf:
            - $ref: '#/components/schemas/SchemaMapping-Output'
            - type: 'null'
          description: >-
            Schema mapping defining how source data maps to bucket schema
            fields. Maps external storage attributes (tags, metadata, columns,
            filenames) to bucket schema fields and blob properties. When
            provided, enables structured extraction of metadata from the sync
            source. See SchemaMapping for detailed configuration options.
        sync_mode:
          $ref: '#/components/schemas/SyncMode'
          description: Sync mode controlling lifecycle (initial_only or continuous).
          default: continuous
        polling_interval_seconds:
          type: integer
          maximum: 900
          minimum: 30
          title: Polling Interval Seconds
          description: Polling interval in seconds (continuous mode).
          default: 300
        batch_size:
          type: integer
          maximum: 100
          minimum: 1
          title: Batch Size
          description: Number of files processed per sync batch.
          default: 50
        create_object_on_confirm:
          type: boolean
          title: Create Object On Confirm
          description: Whether objects should be created immediately after confirmation.
          default: true
        skip_duplicates:
          type: boolean
          title: Skip Duplicates
          description: Skip files whose hashes already exist in the bucket.
          default: true
        skip_batch_submission:
          type: boolean
          title: Skip Batch Submission
          description: >-
            Sync-only mode: download and store files in the bucket without
            running them through the collection processing pipeline. Set to True
            during initial bulk ingestion, then flip to False to trigger
            processing once all files are synced.
          default: false
        reconcile:
          $ref: '#/components/schemas/ReconcileSettings'
          description: >-
            Controls how Mixpeek reconciles objects when the source changes.
            on_delete: cascade-delete when source asset is removed. on_update:
            propagate metadata changes and re-process. on_filter_drift: remove
            objects that no longer match filters. All default to True for full
            consistency.
        status:
          $ref: '#/components/schemas/TaskStatusEnum'
          description: >-
            Current lifecycle status for the sync configuration. PENDING: Not
            yet started. ACTIVE: Currently running/polling. SUSPENDED:
            Temporarily paused. COMPLETED: Initial sync completed (for
            initial_only mode). FAILED: Sync encountered errors.
          default: PENDING
        is_active:
          type: boolean
          title: Is Active
          description: Convenience flag used for filtering active syncs.
          default: true
        total_files_discovered:
          type: integer
          minimum: 0
          title: Total Files Discovered
          description: Cumulative count of files found in source across all runs.
          default: 0
        total_files_synced:
          type: integer
          minimum: 0
          title: Total Files Synced
          description: Cumulative count of successfully synced files.
          default: 0
        total_files_failed:
          type: integer
          minimum: 0
          title: Total Files Failed
          description: Cumulative count of failed files (sent to DLQ after 3 retries).
          default: 0
        total_bytes_synced:
          type: integer
          minimum: 0
          title: Total Bytes Synced
          description: Cumulative bytes transferred across all runs.
          default: 0
        created_at:
          type: string
          format: date-time
          title: Created At
          description: When sync configuration was created.
        updated_at:
          type: string
          format: date-time
          title: Updated At
          description: Last modification timestamp.
        last_sync_at:
          anyOf:
            - type: string
              format: date-time
            - type: 'null'
          title: Last Sync At
          description: When last successful sync completed. Used for incremental syncs.
        per_shard_last_sync_at:
          additionalProperties:
            type: string
            format: date-time
          type: object
          title: Per Shard Last Sync At
          description: >-
            Per-shard last-sync timestamps keyed by shard value (e.g.
            collection_id). When a new shard is added, its absence here forces a
            full scan even if the global last_sync_at is set.
        next_sync_at:
          anyOf:
            - type: string
              format: date-time
            - type: 'null'
          title: Next Sync At
          description: Scheduled time for next sync (continuous/scheduled modes).
        created_by_user_id:
          type: string
          title: Created By User Id
          description: User identifier that created the sync configuration.
        last_error:
          anyOf:
            - type: string
              maxLength: 1000
            - type: 'null'
          title: Last Error
          description: Most recent error message if sync attempts failed.
        consecutive_failures:
          type: integer
          minimum: 0
          title: Consecutive Failures
          description: Count of consecutive failed sync attempts, used for auto-suspend.
          default: 0
        provider_filters:
          additionalProperties: true
          type: object
          title: Provider Filters
          description: >-
            Provider-specific pre-filters pushed down to the API call. The sync
            engine passes these to iter_objects() without interpretation. Each
            provider defines its own schema. Applied BEFORE file_filters.
            Examples: Iconik {'collection_ids': [...]}, Google Drive
            {'shared_drive_id': '...'}
        source_type:
          anyOf:
            - type: string
            - type: 'null'
          title: Source Type
          description: >-
            Storage provider type for API progress views (for example: s3,
            google_drive, iconik).
        metadata:
          additionalProperties: true
          type: object
          title: Metadata
          description: Arbitrary metadata supplied by the user.
        locked_by_worker_id:
          anyOf:
            - type: string
            - type: 'null'
          title: Locked By Worker Id
          description: Worker ID that currently holds the lock for this sync
        locked_at:
          anyOf:
            - type: string
              format: date-time
            - type: 'null'
          title: Locked At
          description: Timestamp when lock was acquired
        lock_expires_at:
          anyOf:
            - type: string
              format: date-time
            - type: 'null'
          title: Lock Expires At
          description: Timestamp when lock expires (for stale lock recovery)
        pending_full_sync:
          type: boolean
          title: Pending Full Sync
          description: >-
            A full sweep was requested (trigger?full_sync=true) while a run held
            the lock. The finishing run dispatches it automatically on lock
            release.
          default: false
        paused:
          type: boolean
          title: Paused
          description: Whether sync is currently paused (user-controlled)
          default: false
        pause_reason:
          anyOf:
            - type: string
            - type: 'null'
          title: Pause Reason
          description: Reason for pause
        paused_at:
          anyOf:
            - type: string
              format: date-time
            - type: 'null'
          title: Paused At
          description: Timestamp when paused
        paused_by_user_id:
          anyOf:
            - type: string
            - type: 'null'
          title: Paused By User Id
          description: User who paused the sync
        max_objects_per_run:
          type: integer
          minimum: 1
          title: Max Objects Per Run
          description: Hard cap on objects per sync run (prevents runaway syncs)
          default: 100000
        max_batch_chunk_size:
          type: integer
          maximum: 1000
          minimum: 1
          title: Max Batch Chunk Size
          description: Maximum objects per batch chunk
          default: 1000
        batch_chunk_size:
          type: integer
          maximum: 1000
          minimum: 1
          title: Batch Chunk Size
          description: Number of objects per batch chunk (for concurrent processing)
          default: 100
        current_sync_run_id:
          anyOf:
            - type: string
            - type: 'null'
          title: Current Sync Run Id
          description: UUID for current/last sync run
        sync_run_counter:
          type: integer
          minimum: 0
          title: Sync Run Counter
          description: Increments on each sync execution
          default: 0
        batch_ids:
          items:
            type: string
          type: array
          title: Batch Ids
          description: List of batch IDs created by this sync
        task_ids:
          items:
            type: string
          type: array
          title: Task Ids
          description: List of task IDs for batches
        batches_created:
          type: integer
          minimum: 0
          title: Batches Created
          description: Total number of batches created
          default: 0
        resume_enabled:
          type: boolean
          title: Resume Enabled
          description: Whether resuming partial runs is enabled
          default: true
        resume_cursor:
          anyOf:
            - type: string
            - type: 'null'
          title: Resume Cursor
          description: Last page/cursor processed (for paginated APIs like Google Drive)
        resume_last_primary_key:
          anyOf:
            - type: string
            - type: 'null'
          title: Resume Last Primary Key
          description: Last primary key processed (for database syncs with stable ordering)
        resume_objects_processed:
          type: integer
          minimum: 0
          title: Resume Objects Processed
          description: Count of objects processed in current/last run
          default: 0
        resume_checkpoint_frequency:
          type: integer
          maximum: 10000
          minimum: 100
          title: Resume Checkpoint Frequency
          description: 'How often to checkpoint (in objects). Default: every 1000 objects'
          default: 1000
        current_cursor:
          anyOf:
            - type: string
            - type: 'null'
          title: Current Cursor
          description: >-
            Convenience mirror of the current resume cursor for API progress
            views.
        sync_checkpoints:
          additionalProperties:
            additionalProperties: true
            type: object
          type: object
          title: Sync Checkpoints
          description: >-
            Per-(config, shard) high-water checkpoints keyed by shard key (e.g.
            collection_id for parallel fan-outs, 'pages_N_M' for page-range
            shards, '__default__' for unsharded runs). Each entry holds:
            pass_id (lexicographically ordered pass marker), cursor (provider
            cursor, e.g. JSON-encoded Iconik search_after), objects_processed
            (forward-only progress guard), modified_since (incremental filter
            frozen at pass start), completed_at (set when the shard has drained
            its source; the next cycle starts a fresh full pass only after the
            polling interval elapses), and updated_at. A new job resumes each
            shard from its checkpoint instead of re-walking from page 1, which
            avoids re-scanning the same pages on every run (2026-06-11 re-scan
            treadmill).
        schedule:
          anyOf:
            - additionalProperties: true
              type: object
            - type: 'null'
          title: Schedule
          description: >-
            Derived scheduling summary: mode, interval, next run, and last
            successful run.
        sync_progress:
          additionalProperties: true
          type: object
          title: Sync Progress
          description: Derived progress summary for API observability.
        locked:
          type: boolean
          title: Locked
          description: Whether a worker currently holds this sync's run lock.
          readOnly: true
      type: object
      required:
        - bucket_id
        - connection_id
        - internal_id
        - namespace_id
        - source_path
        - created_by_user_id
        - locked
      title: SyncConfigurationModel
      description: >-
        Bucket-scoped configuration for automated storage synchronization.


        Defines how files are synced from external storage providers to a
        Mixpeek bucket.

        Includes configuration, status, metrics, and robustness control fields.


        **Supported Providers:** google_drive, s3, snowflake, sharepoint, tigris


        **Built-in Robustness:**

        - Distributed locking (locked_by_worker_id, lock_expires_at)

        - Pause/resume control (paused, pause_reason, paused_at)

        - Safety limits (max_objects_per_run, batch_chunk_size)

        - Resume checkpointing (resume_cursor, resume_objects_processed)

        - Batch tracking (batch_ids, task_ids, batches_created)


        **Metrics Fields:**

        - total_files_discovered: Files found in source

        - total_files_synced: Successfully synced files

        - total_files_failed: Files that failed (check DLQ)

        - total_bytes_synced: Total data transferred

        - consecutive_failures: Failure count for auto-suspend
    ErrorResponse:
      properties:
        success:
          type: boolean
          title: Success
          description: Always false for error responses
          default: false
        status:
          type: integer
          title: Status
          description: HTTP status code for this error
        error:
          $ref: '#/components/schemas/ErrorDetail'
          description: Error details payload
      type: object
      required:
        - status
        - error
      title: ErrorResponse
      description: Error response model.
      examples:
        - error:
            details:
              id: ns_123
              resource: namespace
            message: Namespace not found
            type: NotFoundError
          status: 404
          success: false
    HTTPValidationError:
      properties:
        detail:
          items:
            $ref: '#/components/schemas/ValidationError'
          type: array
          title: Detail
      type: object
      title: HTTPValidationError
    TaskStatusEnum:
      type: string
      enum:
        - PENDING
        - QUEUED
        - IN_PROGRESS
        - PROCESSING
        - COMPLETED
        - COMPLETED_WITH_ERRORS
        - FAILED
        - CANCELED
        - INTERRUPTED
        - UNKNOWN
        - SKIPPED
        - DRAFT
        - ACTIVE
        - ARCHIVED
        - SUSPENDED
      title: TaskStatusEnum
      description: |-
        Enumeration of task statuses for tracking asynchronous operations.

        Task statuses indicate the current state of asynchronous operations like
        batch processing, object ingestion, clustering, and taxonomy execution.

        Status Categories:
            Operation Statuses: Track progress of async operations
            Lifecycle Statuses: Track entity state (buckets, collections, namespaces)

        Values:
            PENDING: Task is queued but has not started processing yet
            IN_PROGRESS: Task is currently being executed
            PROCESSING: Task is actively processing data (similar to IN_PROGRESS)
            COMPLETED: Task finished successfully with no errors
            COMPLETED_WITH_ERRORS: Task finished but some items failed (partial success)
            FAILED: Task encountered an error and could not complete
            CANCELED: Task was manually canceled by a user or system
            UNKNOWN: Task status could not be determined
            SKIPPED: Task was intentionally skipped
            DRAFT: Task is in draft state and not yet submitted

            ACTIVE: Entity is active and operational (for buckets, collections, etc.)
            ARCHIVED: Entity has been archived
            SUSPENDED: Entity has been temporarily suspended

        Terminal Statuses:
            COMPLETED, COMPLETED_WITH_ERRORS, FAILED, CANCELED are terminal statuses.
            Once a task reaches these states, it will not transition to another state.

        Partial Success Handling:
            COMPLETED_WITH_ERRORS indicates that the operation completed but some
            documents/items failed. The task result includes:
            - List of successful items
            - List of failed items with error details
            - Success rate percentage
            This allows clients to handle partial success scenarios appropriately.

        Polling Guidance:
            - Poll tasks in PENDING, QUEUED, IN_PROGRESS, or PROCESSING states
            - Stop polling when task reaches COMPLETED, COMPLETED_WITH_ERRORS, FAILED, or CANCELED
            - Use exponential backoff (1s → 30s) when polling
    SchemaMapping-Input:
      properties:
        mappings:
          additionalProperties:
            oneOf:
              - $ref: '#/components/schemas/FieldMappingEntry'
              - $ref: '#/components/schemas/BlobMappingEntry'
            discriminator:
              propertyName: target_type
              mapping:
                blob:
                  $ref: '#/components/schemas/BlobMappingEntry'
                field:
                  $ref: '#/components/schemas/FieldMappingEntry'
          type: object
          title: Mappings
          description: >-
            Dictionary mapping target field names to their source extractors.
            Keys are bucket schema field names (e.g., 'content', 'category').
            Values are mapping entries defining how to extract and store the
            data. At least one blob mapping (target_type='blob') is recommended
            for file syncs.
      type: object
      required:
        - mappings
      title: SchemaMapping
      description: >-
        Complete schema mapping configuration for a sync.


        Defines how source data (files, tags, metadata, columns) maps to the
        target bucket schema. Each key is a target field/blob name in the
        bucket.


        **Key Concepts:**

        - Keys are target bucket schema field names

        - Values define the source and extraction method

        - At least one blob mapping is typically required for file syncs

        - Field mappings extract metadata alongside the file content


        **Provider Examples:**


        **S3/Tigris Video Sync:**

        ```json

        {
            "content": {
                "target_type": "blob",
                "source": {"type": "file"},
                "blob_type": "video"
            },
            "category": {
                "target_type": "field",
                "source": {"type": "tag", "key": "category"}
            },
            "source_bucket": {
                "target_type": "field",
                "source": {"type": "constant", "value": "production-videos"}
            }
        }

        ```


        **Snowflake Customer Table Sync:**

        ```json

        {
            "customer_name": {
                "target_type": "field",
                "source": {"type": "column", "name": "NAME"}
            },
            "profile_image": {
                "target_type": "blob",
                "source": {"type": "column", "name": "AVATAR_URL"},
                "blob_type": "image"
            },
            "segment": {
                "target_type": "field",
                "source": {"type": "column", "name": "CUSTOMER_SEGMENT"},
                "transform": "lowercase"
            }
        }

        ```


        **Google Drive with Folder Categories:**

        ```json

        {
            "content": {
                "target_type": "blob",
                "source": {"type": "file"},
                "blob_type": "auto"
            },
            "department": {
                "target_type": "field",
                "source": {"type": "folder_path", "segment": 0},
                "transform": "lowercase"
            },
            "description": {
                "target_type": "field",
                "source": {"type": "drive_property", "key": "description"}
            }
        }

        ```


        Attributes:
            mappings: Dictionary mapping target field names to their source extractors
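

        A minimal client-side sanity check for a mappings dictionary, sketched
        in Python before sending it to the API. The function name check_mappings
        is hypothetical and the checks only restate rules from this page
        (target_type is 'field' or 'blob', the 'file' source belongs in blob
        mappings, file syncs usually need a blob mapping); the server remains
        the authority on validation.


        ```python

        from typing import Any, Dict, List


        def check_mappings(mappings: Dict[str, Dict[str, Any]]) -> List[str]:
            """Return a list of problems found in a mappings dict (empty if none)."""
            problems: List[str] = []
            for target, entry in mappings.items():
                kind = entry.get("target_type")
                source_type = entry.get("source", {}).get("type")
                if kind not in ("field", "blob"):
                    problems.append(f"{target}: target_type must be 'field' or 'blob'")
                elif kind == "field" and source_type == "file":
                    problems.append(f"{target}: the 'file' source is only valid for blob mappings")
            # File syncs usually need at least one blob mapping
            if not any(e.get("target_type") == "blob" for e in mappings.values()):
                problems.append("no blob mapping: file syncs usually need one")
            return problems

        mappings = {
            "content": {"target_type": "blob", "source": {"type": "file"}, "blob_type": "video"},
            "category": {"target_type": "field", "source": {"type": "tag", "key": "category"}},
        }

        print(check_mappings(mappings))  # []

        ```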
    ReconcileSettings:
      properties:
        on_delete:
          type: boolean
          title: On Delete
          description: >-
            When a source asset is deleted, cascade-delete the corresponding
            Mixpeek objects (and their collection documents). Default True.
          default: true
        on_update:
          type: boolean
          title: On Update
          description: >-
            When a source asset's metadata changes, propagate the update to the
            Mixpeek object and re-process it through connected collections.
            Default True.
          default: true
        on_filter_drift:
          type: boolean
          title: On Filter Drift
          description: >-
            During reconciliation, remove objects whose source asset no longer
            matches the configured metadata_filters. Default True.
          default: true
        re_extract_on_update:
          type: boolean
          title: Re Extract On Update
          description: >-
            When on_update is True, also re-extract (rebatch) the object through
            its connected collections. Set to False to propagate metadata
            changes without triggering re-extraction. Default True.
          default: true
        re_extract_fields:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Re Extract Fields
          description: >-
            When set, only re-extract if one of these specific fields changed.
            Field names are matched against the source metadata keys. If None
            (default), any metadata change triggers re-extraction (when
            re_extract_on_update is True). Example: ['title', 'description',
            'media_type']
      type: object
      title: ReconcileSettings
      description: Controls how Mixpeek reconciles objects when the source changes.
    FileFilters:
      properties:
        include_patterns:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Include Patterns
          description: Glob patterns to include (e.g. ['*.mp4', '*.mov']).
        exclude_patterns:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Exclude Patterns
          description: Glob patterns to exclude (e.g. ['*/drafts/*', '*_temp.*']).
        min_size_bytes:
          anyOf:
            - type: integer
              minimum: 0
            - type: 'null'
          title: Min Size Bytes
          description: Minimum file size in bytes (inclusive). Smaller files are skipped.
        max_size_bytes:
          anyOf:
            - type: integer
              minimum: 0
            - type: 'null'
          title: Max Size Bytes
          description: Maximum file size in bytes (inclusive). Larger files are skipped.
        modified_after:
          anyOf:
            - type: string
              format: date-time
            - type: 'null'
          title: Modified After
          description: Only sync files modified after this timestamp.
        modified_before:
          anyOf:
            - type: string
              format: date-time
            - type: 'null'
          title: Modified Before
          description: Only sync files modified before this timestamp.
        mime_types:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Mime Types
          description: Optional list of MIME types to include.
        metadata_filters:
          anyOf:
            - items:
                $ref: '#/components/schemas/MetadataFilter'
              type: array
            - type: 'null'
          title: Metadata Filters
          description: >-
            Filters applied to provider-specific metadata fields. All filters
            are combined with AND logic.
      type: object
      title: FileFilters
      description: >-
        Filter rules controlling which files are synced from storage providers.


        All filters are optional and combined with AND logic.

        Files must pass ALL specified filters to be synced.


        **Pattern Matching:** Uses glob patterns (*, ?, [abc], etc.)

        **Size Filtering:** Bytes-based, inclusive bounds

        **Time Filtering:** ISO 8601 datetime, based on provider's modified
        timestamp
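

        A minimal Python sketch of the AND semantics above for the glob, size,
        and time filters. It assumes fnmatch-style, case-sensitive globbing
        (where '*' also matches '/') and exclusive time bounds; mime_types and
        metadata_filters are left out, and passes_filters is a hypothetical
        name, not part of the Mixpeek API.


        ```python

        from datetime import datetime, timezone

        from fnmatch import fnmatchcase

        from typing import Optional, Sequence


        def passes_filters(
            path: str,
            size: int,
            modified: datetime,
            include: Optional[Sequence[str]] = None,
            exclude: Optional[Sequence[str]] = None,
            min_size: Optional[int] = None,
            max_size: Optional[int] = None,
            after: Optional[datetime] = None,
            before: Optional[datetime] = None,
        ) -> bool:
            """Return True only if the file passes every filter that is set."""
            # Include patterns: at least one must match
            if include and not any(fnmatchcase(path, p) for p in include):
                return False
            # Exclude patterns: none may match
            if exclude and any(fnmatchcase(path, p) for p in exclude):
                return False
            # Size bounds are inclusive
            if min_size is not None and size < min_size:
                return False
            if max_size is not None and size > max_size:
                return False
            # Time bounds use the provider's modified timestamp (exclusive here, an assumption)
            if after is not None and modified <= after:
                return False
            if before is not None and modified >= before:
                return False
            return True

        jan = datetime(2024, 1, 15, tzinfo=timezone.utc)

        print(passes_filters("campaigns/launch.mp4", 5_000_000, jan, include=["*.mp4", "*.mov"], exclude=["*/drafts/*"]))  # True

        print(passes_filters("campaigns/drafts/cut.mp4", 5_000_000, jan, include=["*.mp4"], exclude=["*/drafts/*"]))  # False

        ```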
    SchemaMapping-Output:
      properties:
        mappings:
          additionalProperties:
            oneOf:
              - $ref: '#/components/schemas/FieldMappingEntry'
              - $ref: '#/components/schemas/BlobMappingEntry'
            discriminator:
              propertyName: target_type
              mapping:
                blob:
                  $ref: '#/components/schemas/BlobMappingEntry'
                field:
                  $ref: '#/components/schemas/FieldMappingEntry'
          type: object
          title: Mappings
          description: >-
            Dictionary mapping target field names to their source extractors.
            Keys are bucket schema field names (e.g., 'content', 'category').
            Values are mapping entries defining how to extract and store the
            data. At least one blob mapping (target_type='blob') is recommended
            for file syncs.
      type: object
      required:
        - mappings
      title: SchemaMapping
      description: >-
        Complete schema mapping configuration for a sync.


        Defines how source data (files, tags, metadata, columns) maps to the
        target bucket schema. Each key is a target field/blob name in the
        bucket.


        **Key Concepts:**

        - Keys are target bucket schema field names

        - Values define the source and extraction method

        - At least one blob mapping is typically required for file syncs

        - Field mappings extract metadata alongside the file content


        **Provider Examples:**


        **S3/Tigris Video Sync:**

        ```json

        {
            "content": {
                "target_type": "blob",
                "source": {"type": "file"},
                "blob_type": "video"
            },
            "category": {
                "target_type": "field",
                "source": {"type": "tag", "key": "category"}
            },
            "source_bucket": {
                "target_type": "field",
                "source": {"type": "constant", "value": "production-videos"}
            }
        }

        ```


        **Snowflake Customer Table Sync:**

        ```json

        {
            "customer_name": {
                "target_type": "field",
                "source": {"type": "column", "name": "NAME"}
            },
            "profile_image": {
                "target_type": "blob",
                "source": {"type": "column", "name": "AVATAR_URL"},
                "blob_type": "image"
            },
            "segment": {
                "target_type": "field",
                "source": {"type": "column", "name": "CUSTOMER_SEGMENT"},
                "transform": "lowercase"
            }
        }

        ```


        **Google Drive with Folder Categories:**

        ```json

        {
            "content": {
                "target_type": "blob",
                "source": {"type": "file"},
                "blob_type": "auto"
            },
            "department": {
                "target_type": "field",
                "source": {"type": "folder_path", "segment": 0},
                "transform": "lowercase"
            },
            "description": {
                "target_type": "field",
                "source": {"type": "drive_property", "key": "description"}
            }
        }

        ```


        Attributes:
            mappings: Dictionary mapping target field names to their source extractors
    SyncMode:
      type: string
      enum:
        - initial_only
        - continuous
      title: SyncMode
      description: Supported sync modes for external storage ingestion.
    ErrorDetail:
      properties:
        message:
          type: string
          title: Message
          description: Human-readable error message
        type:
          type: string
          title: Type
          description: Stable error type identifier (machine-readable)
        code:
          anyOf:
            - type: string
            - type: 'null'
          title: Code
          description: >-
            Fine-grained error code for programmatic handling (e.g.,
            namespace_name_taken, feature_extractor_not_found). Present only
            when consumers may need to branch on a specific error condition.
        details:
          anyOf:
            - additionalProperties: true
              type: object
            - type: 'null'
          title: Details
          description: >-
            Optional structured details to help debugging (validation errors,
            IDs, etc.)
      type: object
      required:
        - message
        - type
      title: ErrorDetail
      description: Error detail model.
    ValidationError:
      properties:
        loc:
          items:
            anyOf:
              - type: string
              - type: integer
          type: array
          title: Location
        msg:
          type: string
          title: Message
        type:
          type: string
          title: Error Type
      type: object
      required:
        - loc
        - msg
        - type
      title: ValidationError
    FieldMappingEntry:
      properties:
        target_type:
          type: string
          const: field
          title: Target Type
          description: Target type. Must be 'field' for regular schema fields.
          default: field
        source:
          oneOf:
            - $ref: '#/components/schemas/S3TagSource'
            - $ref: '#/components/schemas/S3MetadataSource'
            - $ref: '#/components/schemas/FilenameRegexSource'
            - $ref: '#/components/schemas/ColumnSource'
            - $ref: '#/components/schemas/DrivePropertySource'
            - $ref: '#/components/schemas/FolderPathSource'
            - $ref: '#/components/schemas/FileSource'
            - $ref: '#/components/schemas/ConstantSource'
            - $ref: '#/components/schemas/RSSFieldSource'
          title: Source
          description: >-
            Source extractor defining where to get the value. Options: tag,
            metadata, filename_regex, column, drive_property, folder_path,
            constant, rss_field. The 'file' source is not valid for field
            mappings (use a blob mapping instead).
          discriminator:
            propertyName: type
            mapping:
              column:
                $ref: '#/components/schemas/ColumnSource'
              constant:
                $ref: '#/components/schemas/ConstantSource'
              drive_property:
                $ref: '#/components/schemas/DrivePropertySource'
              file:
                $ref: '#/components/schemas/FileSource'
              filename_regex:
                $ref: '#/components/schemas/FilenameRegexSource'
              folder_path:
                $ref: '#/components/schemas/FolderPathSource'
              metadata:
                $ref: '#/components/schemas/S3MetadataSource'
              rss_field:
                $ref: '#/components/schemas/RSSFieldSource'
              tag:
                $ref: '#/components/schemas/S3TagSource'
        transform:
          anyOf:
            - type: string
            - type: 'null'
          title: Transform
          description: >-
            Optional transformation to apply to the extracted value. Supported
            transforms: 'lowercase' - convert to lowercase, 'uppercase' -
            convert to uppercase, 'trim' - remove leading/trailing whitespace,
            'json_parse' - parse JSON string to object/array. Transforms are
            applied after extraction, before storage.
          examples:
            - lowercase
            - uppercase
            - trim
            - json_parse
        required:
          type: boolean
          title: Required
          description: >-
            If True, the sync will fail if this field cannot be extracted. If
            False (default), missing values result in the field being omitted.
            Use required=True for critical fields that must be present.
          default: false
      type: object
      required:
        - source
      title: FieldMappingEntry
      description: |-
        Maps a source value to a bucket schema field.

        Used for mapping metadata, tags, columns, or extracted values to
        regular fields in the bucket schema (strings, numbers, arrays, etc.).
        Does NOT handle file content - use BlobMappingEntry for that.

        Example: Map S3 tag "category" to bucket field "content_category"
            {
                "target_type": "field",
                "source": {"type": "tag", "key": "category"}
            }

        Example: Map folder name to "department" with lowercase transform
            {
                "target_type": "field",
                "source": {"type": "folder_path", "segment": 0},
                "transform": "lowercase"
            }

        Example: Map filename regex capture to "date" field
            {
                "target_type": "field",
                "source": {"type": "filename_regex", "pattern": "^(\\d{4}-\\d{2}-\\d{2})"},
                "required": true
            }

        Attributes:
            target_type: Must be "field" for schema field mappings
            source: The source extractor defining where to get the value
            transform: Optional transformation to apply (lowercase, uppercase, trim, json_parse)
            required: Whether missing values should fail the sync
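
        A minimal Python sketch of how the documented transforms and the
        required flag could be applied to a value that has already been
        extracted. The names apply_field_mapping and MissingRequiredField are
        hypothetical; this illustrates the semantics described above and is
        not Mixpeek's implementation.

        ```python
        import json
        from typing import Any, Callable, Dict, Optional


        class MissingRequiredField(Exception):
            """Hypothetical error: a required field could not be extracted."""


        # The four transforms listed in the transform property description
        TRANSFORMS: Dict[str, Callable[[str], Any]] = {
            "lowercase": str.lower,
            "uppercase": str.upper,
            "trim": str.strip,
            "json_parse": json.loads,
        }


        def apply_field_mapping(
            name: str, raw: Optional[str], transform: Optional[str] = None, required: bool = False
        ) -> Dict[str, Any]:
            """Return {name: value}, or {} when an optional value is missing."""
            if raw is None:
                if required:
                    # required=True: a missing value fails the sync
                    raise MissingRequiredField(name)
                # required=False (default): the field is omitted
                return {}
            # Transforms run after extraction and before storage
            value = TRANSFORMS[transform](raw) if transform else raw
            return {name: value}


        print(apply_field_mapping("department", "Marketing", "lowercase"))  # {'department': 'marketing'}
        print(apply_field_mapping("tags", '["a", "b"]', "json_parse"))  # {'tags': ['a', 'b']}
        print(apply_field_mapping("owner", None))  # {}
        ```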
    BlobMappingEntry:
      properties:
        target_type:
          type: string
          const: blob
          title: Target Type
          description: Target type. Must be 'blob' for blob mappings.
          default: blob
        source:
          oneOf:
            - $ref: '#/components/schemas/S3TagSource'
            - $ref: '#/components/schemas/S3MetadataSource'
            - $ref: '#/components/schemas/FilenameRegexSource'
            - $ref: '#/components/schemas/ColumnSource'
            - $ref: '#/components/schemas/DrivePropertySource'
            - $ref: '#/components/schemas/FolderPathSource'
            - $ref: '#/components/schemas/FileSource'
            - $ref: '#/components/schemas/ConstantSource'
            - $ref: '#/components/schemas/RSSFieldSource'
          title: Source
          description: >-
            Source extractor defining where to get the blob content or URL. Use
            'file' for the synced file itself (most common). Use 'column' for
            database URL columns pointing to external content. Use 'metadata'
            for S3 metadata containing URLs.
          discriminator:
            propertyName: type
            mapping:
              column:
                $ref: '#/components/schemas/ColumnSource'
              constant:
                $ref: '#/components/schemas/ConstantSource'
              drive_property:
                $ref: '#/components/schemas/DrivePropertySource'
              file:
                $ref: '#/components/schemas/FileSource'
              filename_regex:
                $ref: '#/components/schemas/FilenameRegexSource'
              folder_path:
                $ref: '#/components/schemas/FolderPathSource'
              metadata:
                $ref: '#/components/schemas/S3MetadataSource'
              rss_field:
                $ref: '#/components/schemas/RSSFieldSource'
              tag:
                $ref: '#/components/schemas/S3TagSource'
        blob_type:
          $ref: '#/components/schemas/BlobType'
          description: >-
            Type of blob content. Determines which extractors can process it.
            'auto' (default) infers type from mime_type - recommended for files.
            Explicit types: 'image', 'video', 'audio', 'text', 'document',
            'json', 'binary'. Use explicit types when mime_type detection is
            unreliable.
          default: auto
        blob_property:
          type: string
          minLength: 1
          title: Blob Property
          description: >-
            The blob property name in the bucket schema. It identifies the
            corresponding entry in the object's blobs array. Default: 'content'
            (the primary blob). Must match a blob property defined in the
            bucket schema.
          default: content
          examples:
            - content
            - thumbnail
            - profile_image
            - document
            - audio
        mime_type_override:
          anyOf:
            - type: string
            - type: 'null'
          title: Mime Type Override
          description: >-
            Optional mime_type to use instead of auto-detection. Useful when the
            source doesn't provide an accurate mime_type. Format: 'type/subtype'
            (e.g., 'image/jpeg', 'video/mp4'). When set, this value is used for
            blob.details.mime_type.
          examples:
            - image/jpeg
            - video/mp4
            - application/pdf
            - audio/mpeg
      type: object
      required:
        - source
      title: BlobMappingEntry
      description: |-
        Maps a source to a blob in the bucket object.

        Used for mapping files or URL references to blob fields. The blob_type
        determines how the content is processed by extractors. This is the
        primary way to map synced files into the Mixpeek extraction pipeline.

        Example: Map the synced file to the primary "content" blob
            {
                "target_type": "blob",
                "source": {"type": "file"},
                "blob_type": "auto",
                "blob_property": "content"
            }

        Example: Map a database column URL to an image blob
            {
                "target_type": "blob",
                "source": {"type": "column", "name": "AVATAR_URL"},
                "blob_type": "image",
                "blob_property": "profile_image"
            }

        Example: Map with explicit mime_type override
            {
                "target_type": "blob",
                "source": {"type": "file"},
                "blob_type": "video",
                "blob_property": "content",
                "mime_type_override": "video/mp4"
            }

        Attributes:
            target_type: Must be "blob" for blob mappings
            source: The source extractor (usually "file" for synced content)
            blob_type: Content type hint for extractors (auto, image, video, etc.)
            blob_property: Name of the blob property in the bucket schema
            mime_type_override: Optional explicit mime_type to use
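
        A minimal Python sketch of how blob_type and mime_type_override could
        combine with detection. Detection here uses Python's
        mimetypes.guess_type on the filename, and the MIME-to-blob_type table
        for 'auto' is an assumption for illustration; resolve_blob is a
        hypothetical name, not a Mixpeek function.

        ```python
        import mimetypes
        from typing import Optional, Tuple


        def resolve_blob(
            filename: str, blob_type: str = "auto", mime_type_override: Optional[str] = None
        ) -> Tuple[str, str]:
            """Return (mime_type, blob_type) for a synced file."""
            # An explicit override wins over detection from the filename
            guessed, _encoding = mimetypes.guess_type(filename)
            mime = mime_type_override or guessed or "application/octet-stream"
            if blob_type != "auto":
                # Explicit blob types are used as-is
                return mime, blob_type
            # Assumed mapping from MIME type to blob type for 'auto'
            major = mime.split("/", 1)[0]
            if major in ("image", "video", "audio", "text"):
                return mime, major
            if mime == "application/pdf":
                return mime, "document"
            if mime == "application/json":
                return mime, "json"
            return mime, "binary"


        print(resolve_blob("clip", mime_type_override="video/mp4"))  # ('video/mp4', 'video')
        print(resolve_blob("scan.bin", blob_type="image", mime_type_override="image/jpeg"))  # ('image/jpeg', 'image')
        print(resolve_blob("no_extension"))  # ('application/octet-stream', 'binary')
        ```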
    MetadataFilter:
      properties:
        field:
          type: string
          title: Field
        operator:
          type: string
          enum:
            - equals
            - not_equals
            - contains
            - not_contains
            - gt
            - lt
            - gte
            - lte
            - exists
          title: Operator
          default: contains
        value:
          anyOf:
            - type: string
            - type: integer
            - type: number
            - type: boolean
            - type: 'null'
          title: Value
      type: object
      required:
        - field
      title: MetadataFilter
      description: Filter on a provider-specific metadata field.
    S3TagSource:
      properties:
        type:
          type: string
          const: tag
          title: Type
          description: Source type identifier. Must be 'tag' for S3/Tigris object tags.
          default: tag
        key:
          type: string
          maxLength: 128
          minLength: 1
          title: Key
          description: >-
            The tag key to extract. Case-sensitive. Must match the exact tag key
            on the S3/Tigris object. Common examples: 'category', 'project',
            'owner', 'environment'. Maximum length: 128 characters.
          examples:
            - category
            - project
            - content-type
            - cost-center
      type: object
      required:
        - key
      title: S3TagSource
      description: >-
        Extract value from an S3/Tigris object tag.


        S3 object tags are key-value pairs attached to objects, commonly used
        for categorization, cost allocation, and metadata. Tags are limited to
        10 per object, with keys up to 128 characters and values up to 256
        characters.


        Provider Compatibility: S3, Tigris, MinIO, DigitalOcean Spaces, Wasabi


        Example S3 CLI to set tag:
            aws s3api put-object-tagging --bucket my-bucket --key video.mp4 \
                --tagging 'TagSet=[{Key=category,Value=marketing}]'

        Example mapping:
            {"type": "tag", "key": "category"} -> extracts "marketing" from the tag

        Attributes:
            type: Must be "tag" to identify this source type
            key: The tag key to extract (case-sensitive, max 128 chars)
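

        A minimal Python sketch of the lookup, assuming the tags arrive in the
        TagSet shape returned by boto3's get_object_tagging, that is a list of
        dicts with 'Key' and 'Value' entries. The helper tag_value is
        hypothetical, and its key check only mirrors the 128-character limit
        stated above.


        ```python

        from typing import Dict, List, Optional


        def tag_value(tag_set: List[Dict[str, str]], key: str) -> Optional[str]:
            """Return the value for an exact, case-sensitive tag key, or None."""
            if not 1 <= len(key) <= 128:
                raise ValueError("tag keys must be 1-128 characters")
            tags = {tag["Key"]: tag["Value"] for tag in tag_set}
            return tags.get(key)

        tag_set = [{"Key": "category", "Value": "marketing"}]

        print(tag_value(tag_set, "category"))  # marketing

        print(tag_value(tag_set, "Category"))  # None

        ```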
    S3MetadataSource:
      properties:
        type:
          type: string
          const: metadata
          title: Type
          description: >-
            Source type identifier. Must be 'metadata' for S3/Tigris user
            metadata.
          default: metadata
        key:
          type: string
          maxLength: 1024
          minLength: 1
          title: Key
          description: >-
            The metadata key to extract (without 'x-amz-meta-' prefix).
            Case-insensitive (S3 lowercases all metadata keys). Common examples:
            'author', 'version', 'source-system', 'original-filename'. Note: S3
            automatically lowercases keys, so 'Author' becomes 'author'.
          examples:
            - author
            - version
            - original-filename
            - source-system
      type: object
      required:
        - key
      title: S3MetadataSource
      description: |-
        Extract value from S3/Tigris object user metadata.

        S3 user metadata (x-amz-meta-*) provides custom key-value pairs stored
        with the object. Unlike tags, metadata is set at upload time and is
        immutable without re-uploading the object.

        Provider Compatibility: S3, Tigris, MinIO, DigitalOcean Spaces, Wasabi

        Example S3 CLI to set metadata:
            aws s3 cp video.mp4 s3://bucket/ --metadata '{"author":"john","version":"1.0"}'

        Example boto3 upload with metadata:
            s3.put_object(Bucket='b', Key='k', Body=data, Metadata={'author': 'john'})

        Example mapping:
            {"type": "metadata", "key": "author"} -> extracts "john" from x-amz-meta-author

        Attributes:
            type: Must be "metadata" to identify this source type
            key: The metadata key without 'x-amz-meta-' prefix (case-insensitive)
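
        A minimal Python sketch of the case-insensitive, prefix-free key
        lookup described above. It accepts either the Metadata dict from
        boto3's head_object (keys already unprefixed and lowercased) or raw
        response headers that still carry the x-amz-meta- prefix; the helper
        names are hypothetical.

        ```python
        from typing import Dict, Optional

        PREFIX = "x-amz-meta-"


        def normalize_key(key: str) -> str:
            # S3 stores user-metadata keys in lowercase; raw headers carry the prefix
            key = key.lower()
            return key[len(PREFIX):] if key.startswith(PREFIX) else key


        def extract_user_metadata(metadata: Dict[str, str], key: str) -> Optional[str]:
            """Case-insensitive lookup of a user-metadata value by its unprefixed key."""
            normalized = {normalize_key(k): v for k, v in metadata.items()}
            return normalized.get(normalize_key(key))


        # Shape of boto3's head_object(...)["Metadata"]: prefix already stripped
        meta = {"author": "john", "version": "1.0"}
        print(extract_user_metadata(meta, "Author"))  # john

        # Raw response headers still include the prefix
        headers = {"x-amz-meta-author": "john", "Content-Type": "video/mp4"}
        print(extract_user_metadata(headers, "author"))  # john
        ```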
    FilenameRegexSource:
      properties:
        type:
          type: string
          const: filename_regex
          title: Type
          description: >-
            Source type identifier. Must be 'filename_regex' for regex
            extraction.
          default: filename_regex
        pattern:
          type: string
          minLength: 1
          title: Pattern
          description: >-
            Python regex pattern with exactly one capture group. The captured
            group becomes the extracted value. Pattern is applied to the
            filename only (not full path). Use non-capturing groups (?:...) for
            grouping without capturing. Remember to escape backslashes in JSON
            (\\d instead of \d).
          examples:
            - ^(\w+)_
            - _(\d+)_
            - \.(\w+)$
            - ^(\d{4}-\d{2}-\d{2})
        default:
          anyOf:
            - type: string
            - type: 'null'
          title: Default
          description: >-
            Default value if regex doesn't match the filename. If None and regex
            doesn't match, the field is omitted from the object. Useful for
            ensuring a field always has a value.
          examples:
            - unknown
            - default
            - unclassified
      type: object
      required:
        - pattern
      title: FilenameRegexSource
      description: >-
        Extract value from filename using a regex pattern with capture groups.


        Useful when metadata is encoded in filenames following a naming
        convention.

        The regex must contain exactly one capture group to extract the value.


        Provider Compatibility: All providers (works on any filename)


        Example filenames and patterns:
            - "marketing_Q4_2024_final.mp4" with pattern "^(\w+)_Q\d+_" -> "marketing"
            - "user_12345_avatar.jpg" with pattern "user_(\d+)_" -> "12345"
            - "2024-01-15_meeting_notes.pdf" with pattern "^(\d{4}-\d{2}-\d{2})" -> "2024-01-15"
            - "IMG_20240115_143022.jpg" with pattern "IMG_(\d{8})_" -> "20240115"

        Note: Use raw strings in Python or double-escape backslashes in JSON.


        Attributes:
            type: Must be "filename_regex" to identify this source type
            pattern: Python regex with exactly one capture group
            default: Optional default value if regex doesn't match
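

        A minimal Python sketch of this extraction. It assumes re.search
        semantics on the text after the last '/' of the object path (patterns
        such as '_(\d+)_' need search rather than match) and treats a pattern
        without exactly one capture group as an error; extract_from_filename is
        a hypothetical name.


        ```python

        import re

        from typing import Optional


        def extract_from_filename(path: str, pattern: str, default: Optional[str] = None) -> Optional[str]:
            """Apply a one-capture-group regex to the filename only (not the full path)."""
            regex = re.compile(pattern)
            if regex.groups != 1:
                raise ValueError("pattern must contain exactly one capture group")
            filename = path.rsplit("/", 1)[-1]
            match = regex.search(filename)
            # No match: use the default, or None so the field is omitted
            return match.group(1) if match else default

        print(extract_from_filename("videos/marketing_Q4_2024_final.mp4", r"^(\w+)_Q\d+_"))  # marketing

        print(extract_from_filename("IMG_20240115_143022.jpg", r"IMG_(\d{8})_"))  # 20240115

        print(extract_from_filename("notes.pdf", r"^(\d{4}-\d{2}-\d{2})", default="unknown"))  # unknown

        ```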
    ColumnSource:
      properties:
        type:
          type: string
          const: column
          title: Type
          description: Source type identifier. Must be 'column' for database columns.
          default: column
        name:
          type: string
          minLength: 1
          title: Name
          description: >-
            The column name to extract from. Case handling depends on the
            database. Snowflake: case-insensitive (unquoted names are stored in
            uppercase). PostgreSQL: case-insensitive (folded to lowercase)
            unless quoted, in which case it is case-sensitive. Column must exist
            in the source table/view.
          examples:
            - category
            - CUSTOMER_TYPE
            - metadata_json
            - image_url
      type: object
      required:
        - name
      title: ColumnSource
      description: >-
        Extract value from a database column (Snowflake, PostgreSQL, etc.).


        For database sync sources, maps a column from the source table/view
        to a bucket schema field. Column handling varies by database.


        Provider Compatibility: Snowflake, PostgreSQL (future), BigQuery
        (future)


        Example Snowflake table:
            CREATE TABLE CUSTOMERS (
                ID VARCHAR, NAME VARCHAR, CATEGORY VARCHAR,
                CREATED_AT TIMESTAMP, PROFILE_IMAGE_URL VARCHAR
            );

        Example mapping:
            {"type": "column", "name": "CATEGORY"} -> extracts the CATEGORY column value

        Database-Specific Notes:
            - Snowflake: Case-insensitive (internally uppercase), use unquoted names
            - PostgreSQL: Case-sensitive if quoted, defaults to lowercase
            - BigQuery: Column names are case-insensitive

        Attributes:
            type: Must be "column" to identify this source type
            name: The column name to extract from
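

        A minimal Python sketch of the identifier case rules above, showing
        which stored identifier a column name resolves to. The helper
        normalize_column is hypothetical, and only the Snowflake and PostgreSQL
        rules are modeled.


        ```python

        def normalize_column(name: str, dialect: str) -> str:
            """Return the identifier as the database stores it."""
            if len(name) >= 2 and name.startswith('"') and name.endswith('"'):
                # Double-quoted identifiers keep their exact case
                return name[1:-1]
            if dialect == "snowflake":
                # Unquoted Snowflake identifiers are stored in uppercase
                return name.upper()
            if dialect == "postgresql":
                # Unquoted PostgreSQL identifiers are folded to lowercase
                return name.lower()
            raise ValueError(f"unsupported dialect: {dialect}")

        print(normalize_column("category", "snowflake"))  # CATEGORY

        print(normalize_column("CUSTOMER_TYPE", "postgresql"))  # customer_type

        print(normalize_column('"Category"', "postgresql"))  # Category

        ```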
    DrivePropertySource:
      properties:
        type:
          type: string
          const: drive_property
          title: Type
          description: Source type identifier. Must be 'drive_property' for Google Drive.
          default: drive_property
        key:
          type: string
          minLength: 1
          title: Key
          description: >-
            The property key to extract. Built-in: 'name', 'mimeType',
            'description', 'starred', 'trashed', 'createdTime', 'modifiedTime',
            'size', 'webViewLink', 'parents'. Custom: Any key set in the file's
            appProperties. Case-sensitive.
          examples:
            - description
            - starred
            - category
            - modifiedTime
      type: object
      required:
        - key
      title: DrivePropertySource
      description: |-
        Extract value from Google Drive file properties.

        Google Drive files have built-in properties (name, mimeType, etc.) and
        custom properties (appProperties). This source extracts from either.

        Provider Compatibility: Google Drive, Google Workspace Shared Drives

        Built-in properties:
            - name: File name
            - mimeType: MIME type
            - description: File description
            - starred: Boolean star status
            - trashed: Boolean trash status
            - createdTime: Creation timestamp
            - modifiedTime: Last modified timestamp
            - size: File size in bytes
            - webViewLink: Link for opening the file in a browser
            - parents: IDs of the parent folders

        Custom properties: Set via Drive API appProperties field

        Example mapping:
            {"type": "drive_property", "key": "description"} -> extracts file description

        Attributes:
            type: Must be "drive_property" to identify this source type
            key: The property key to extract (case-sensitive)
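
        A minimal Python sketch of the lookup against a Google Drive API v3
        file resource, represented as a dict with the needed fields already
        requested. Checking built-in keys before appProperties is an
        assumption, and extract_drive_property is a hypothetical name.

        ```python
        from typing import Any, Dict, Optional

        # Built-in keys named in this schema's key description and list above
        BUILT_IN = {
            "name", "mimeType", "description", "starred", "trashed",
            "createdTime", "modifiedTime", "size", "webViewLink", "parents",
        }


        def extract_drive_property(drive_file: Dict[str, Any], key: str) -> Optional[Any]:
            """Case-sensitive lookup: built-in field first, then appProperties (assumed order)."""
            if key in BUILT_IN:
                return drive_file.get(key)
            return drive_file.get("appProperties", {}).get(key)


        drive_file = {
            "name": "q4-recap.mp4",
            "description": "Q4 campaign recap",
            "starred": True,
            "appProperties": {"category": "marketing"},
        }
        print(extract_drive_property(drive_file, "description"))  # Q4 campaign recap
        print(extract_drive_property(drive_file, "category"))  # marketing
        print(extract_drive_property(drive_file, "Category"))  # None
        ```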
    FolderPathSource:
      properties:
        type:
          type: string
          const: folder_path
          title: Type
          description: Source type identifier. Must be 'folder_path' for path extraction.
          default: folder_path
        segment:
          anyOf:
            - type: integer
            - type: 'null'
          title: Segment
          description: >-
            Extract a specific path segment by index. 0 = first segment (root
            folder), 1 = second segment, etc. -1 = last segment (immediate
            parent), -2 = second to last, etc. If None and full_path is False,
            extracts the immediate parent folder.
          examples:
            - 0
            - 1
            - 2
            - -1
            - -2
        full_path:
          type: boolean
          title: Full Path
          description: >-
            If True, extracts the complete folder path (joined with '/'). If
            False, extracts only the segment specified or immediate parent. Does
            not include the filename.
          default: false
      type: object
      title: FolderPathSource
      description: |-
        Extract value from the folder path structure.

        Useful for deriving categories or metadata from folder organization.
        Can extract the full path, a specific segment, or the immediate parent.

        Provider Compatibility: All providers with folder/prefix structure

        Example folder structure: /Marketing/Campaigns/Q4-2024/videos/
            - segment=0 -> "Marketing"
            - segment=1 -> "Campaigns"
            - segment=2 -> "Q4-2024"
            - segment=-1 -> "videos" (last segment)
            - full_path=True -> "Marketing/Campaigns/Q4-2024/videos"
            - Neither (default) -> "videos" (immediate parent)

        Use Cases:
            - Derive category from top-level folder
            - Extract project name from folder structure
            - Preserve full path for hierarchical organization

        Attributes:
            type: Must be "folder_path" to identify this source type
            segment: Index of path segment to extract (0-based; negative values count from the end)
            full_path: Whether to extract complete path
    FileSource:
      properties:
        type:
          type: string
          const: file
          title: Type
          description: Source type identifier. Must be 'file' for the synced file itself.
          default: file
      type: object
      title: FileSource
      description: |-
        Use the synced file itself as the source (for blob mappings).

        This is the primary source for blob-type mappings where the synced file
        content becomes the blob. The mime_type is automatically detected from
        the file unless explicitly overridden in the BlobMappingEntry.

        Provider Compatibility: All providers (works on any synced file)

        Example usage:
            {"type": "file"} -> The synced file becomes the blob content

        This source type has no additional configuration; it simply indicates
        that the synced file content should be used as the blob data.

        Attributes:
            type: Must be "file" to identify this source type
    ConstantSource:
      properties:
        type:
          type: string
          const: constant
          title: Type
          description: Source type identifier. Must be 'constant' for static values.
          default: constant
        value:
          title: Value
          description: >-
            The constant value to use for all objects. Can be any
            JSON-serializable value: string, number, boolean, array, object.
            Arrays are useful for tags, objects for structured metadata.
          examples:
            - production
            - 42
            - true
            - - tag1
              - tag2
            - env: prod
      type: object
      required:
        - value
      title: ConstantSource
      description: |-
        Use a constant/static value for all synced objects.

        Useful for adding fixed metadata to all objects from a sync, such as:
        - Source system identifier
        - Environment tags
        - Default categories
        - Static labels

        Provider Compatibility: All providers

        Example mappings:
            {"type": "constant", "value": "tigris-cdn"} -> All objects get "tigris-cdn"
            {"type": "constant", "value": ["tag1", "tag2"]} -> All objects get this array
            {"type": "constant", "value": {"env": "prod"}} -> All objects get this object

        Attributes:
            type: Must be "constant" to identify this source type
            value: The constant value (any JSON-serializable type)
    RSSFieldSource:
      properties:
        type:
          type: string
          const: rss_field
          title: Type
          description: Source type identifier. Must be 'rss_field' for RSS entry fields.
          default: rss_field
        field:
          type: string
          title: Field
          description: 'RSS entry field to extract: title, author, link, categories, summary, or published'
      type: object
      required:
        - field
      title: RSSFieldSource
      description: |-
        Extract value from an RSS entry field.

        Provider Compatibility: RSS only

        Available fields: title, author, link, categories, summary, published

        Example mapping:
            {"type": "rss_field", "field": "title"} -> extracts the entry title
            {"type": "rss_field", "field": "categories"} -> extracts the list of category terms

        Attributes:
            type: Must be "rss_field" to identify this source type
            field: RSS entry field name to extract
    BlobType:
      type: string
      enum:
        - auto
        - image
        - video
        - audio
        - text
        - pdf
        - excel
      title: BlobType
      description: >-
        Type of blob content for schema mapping.


        Determines how the blob content is processed and which extractors can
        operate on it. The extraction pipeline relies on this value to route
        content correctly.


        Values:
            auto: Automatically infer blob type from mime_type (recommended for files)
            image: Image files (JPEG, PNG, WebP, BMP, TIFF, or GIF as a static image)
            video: Video files (MP4, MOV, WebM, AVI, MKV, or GIF as animated frames)
            audio: Audio files (MP3, WAV, FLAC, AAC, OGG)
            text: Text files (TXT, MD, HTML, XML)
            pdf: PDF documents
            excel: Spreadsheet files (XLSX, XLS, CSV)

        **GIF Special Handling**:
            GIF files can be processed as either "image" or "video":

            - As "image": a single static embedding of the first frame, with no decomposition
            - As "video": frame-by-frame decomposition with per-frame embeddings

            When using "auto", GIFs default to "image". To get frame-level processing
            for animated GIFs, explicitly set blob_type to "video".

        Usage Guidelines:
            - Use "auto" when syncing files with accurate mime_type headers
            - Use explicit types when mime_type is missing or unreliable
            - Use "video" for animated GIFs requiring frame-level search

````
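The `FolderPathSource` rules in the schema above can be illustrated with a short Python sketch. This is not Mixpeek's implementation. The function name `extract_folder_value` is hypothetical, and the sketch assumes the input is a file path or object key with `/` separators. Returning `None` for root-level files and out-of-range indices is also an assumption, because the schema does not document those cases.

```python
from typing import Optional


def extract_folder_value(
    file_path: str,
    segment: Optional[int] = None,
    full_path: bool = False,
) -> Optional[str]:
    """Hypothetical sketch of the FolderPathSource rules (not Mixpeek's code)."""
    # Split on '/', ignoring empty pieces from leading or trailing slashes.
    parts = [p for p in file_path.split("/") if p]
    # Assumption: the last piece is the filename, which is never included.
    folders = parts[:-1]
    if not folders:
        return None  # assumption: a root-level file has no folder value
    if full_path:
        return "/".join(folders)  # joined with '/', no leading slash
    if segment is None:
        return folders[-1]  # default: the immediate parent folder
    # 0 = root folder, -1 = immediate parent, matching Python list indexing.
    if -len(folders) <= segment < len(folders):
        return folders[segment]
    return None  # assumption: out-of-range indices yield no value


path = "/Marketing/Campaigns/Q4-2024/videos/launch.mp4"
print(extract_folder_value(path, segment=0))       # Marketing
print(extract_folder_value(path, full_path=True))  # Marketing/Campaigns/Q4-2024/videos
```

Python's negative indexing already matches the documented `segment` semantics (`-1` is the immediate parent, `-2` the folder above it). For that reason the sketch indexes the folder list directly once the bounds check passes.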
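For the `BlobType` enum, the following sketch shows one way `auto` could resolve a blob type from a mime type, including the documented GIF default and an explicit override. The mime-type table, the function name `infer_blob_type`, and the `None` result for unrecognized types are assumptions made for illustration. They do not describe Mixpeek's actual routing logic.

```python
from typing import Optional

# Assumed mime-type table for illustration; not Mixpeek's actual routing.
SPREADSHEET_MIME_TYPES = {
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",  # XLSX
    "application/vnd.ms-excel",  # XLS
    "text/csv",  # CSV: must be checked before the generic text/* rule
}
XML_MIME_TYPES = {"application/xml", "text/xml"}
BLOB_TYPES = {"auto", "image", "video", "audio", "text", "pdf", "excel"}


def infer_blob_type(mime_type: Optional[str], blob_type: str = "auto") -> Optional[str]:
    """Hypothetical resolver for BlobType; returns None when "auto" cannot decide."""
    if blob_type not in BLOB_TYPES:
        raise ValueError(f"unknown blob_type: {blob_type!r}")
    if blob_type != "auto":
        # An explicit type always wins, e.g. "video" for animated GIFs.
        return blob_type
    if not mime_type:
        return None  # assumption: caller should then set blob_type explicitly
    # Drop parameters such as "; charset=utf-8" and normalize case.
    mime = mime_type.split(";", 1)[0].strip().lower()
    if mime in SPREADSHEET_MIME_TYPES:
        return "excel"
    if mime == "application/pdf":
        return "pdf"
    if mime == "image/gif":
        return "image"  # documented default: "auto" treats GIFs as static images
    if mime in XML_MIME_TYPES:
        return "text"
    major = mime.split("/", 1)[0]
    if major in {"image", "video", "audio", "text"}:
        return major
    return None  # assumption: unrecognized types are left unresolved
```

CSV is checked before the generic `text/*` rule. The enum groups CSV under `excel` even though its mime type (`text/csv`) begins with `text/`.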
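To illustrate `RSSFieldSource`, the sketch below extracts the documented field names from an RSS 2.0 `<item>` using only the standard library. Mapping `summary` to `<description>` and `published` to `<pubDate>` is an assumption based on RSS 2.0 element names; Mixpeek's feed parser may normalize fields differently. `extract_rss_field` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import Any

# Assumed mapping from RSSFieldSource field names to RSS 2.0 <item> elements.
_ELEMENT_FOR_FIELD = {
    "title": "title",
    "author": "author",
    "link": "link",
    "summary": "description",  # assumption: summary comes from <description>
    "published": "pubDate",    # assumption: published comes from <pubDate>
}


def extract_rss_field(item: ET.Element, field: str) -> Any:
    """Hypothetical extractor for one RSSFieldSource field of an RSS item."""
    if field == "categories":
        # An item may carry several <category> elements; return every term.
        return [c.text.strip() for c in item.findall("category") if c.text]
    tag = _ELEMENT_FOR_FIELD.get(field)
    if tag is None:
        raise ValueError(f"unsupported RSS field: {field!r}")
    node = item.find(tag)
    # Missing or empty elements yield None rather than an empty string.
    return node.text.strip() if node is not None and node.text else None
```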