Syncs

Syncs pull files from your existing storage providers into Mixpeek buckets on a schedule. No migration required — connect your cloud storage, configure what to sync, and Mixpeek keeps your bucket up to date.

How Syncs Work

┌──────────────┐     poll      ┌──────────────┐    create     ┌──────────────┐
│   External   │◄──────────────│    Sync      │───objects────►│    Bucket    │
│   Storage    │───file list──►│   Worker     │               │              │
│  (S3, GCS,   │               │              │───submit─────►│  Collection  │
│   Mux, etc.) │               │              │   batches     │   Pipeline   │
└──────────────┘               └──────────────┘               └──────────────┘
                                     │
                                     ▼
                              Resume cursor,
                              metrics, DLQ

1. Poll: at each interval, Mixpeek lists files from your storage provider using the configured source_path and filters.
2. Filter: files are matched against glob patterns, size limits, MIME types, and provider-specific metadata filters.
3. Create objects: matching files are registered in the target bucket. By default, duplicates are skipped via source tracking.
4. Submit batches: objects are grouped into batches and submitted to the bucket's collection pipeline for processing.
5. Checkpoint: a resume cursor is saved so the next poll picks up where the last one left off.
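The batching and checkpoint steps can be sketched in Bash to make the resume behavior concrete. This is an illustration, not Mixpeek's worker code: it assumes the listing arrives in sorted order and that the resume cursor is simply the last key submitted, and `poll_pass` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Illustrative sketch of one poll pass (not Mixpeek's worker code).
# Assumptions: keys arrive sorted, and the resume cursor is the last
# key that was submitted in a batch.
# Usage: printf '%s\n' KEY... | poll_pass CURSOR BATCH_SIZE
poll_pass() {
  local cursor="$1" batch_size="$2" key
  local -a batch=()
  while IFS= read -r key; do
    # Checkpoint: skip keys at or before the saved cursor.
    if [[ -n "$cursor" && ! "$key" > "$cursor" ]]; then
      continue
    fi
    batch+=("$key")
    # Submit a full batch, then advance the cursor past it.
    if (( ${#batch[@]} == batch_size )); then
      echo "batch: ${batch[*]}"
      cursor="$key"
      batch=()
    fi
  done
  # Flush a final partial batch.
  if (( ${#batch[@]} > 0 )); then
    echo "batch: ${batch[*]}"
    cursor="${batch[${#batch[@]}-1]}"
  fi
  echo "cursor: $cursor"
}
```

Running `printf '%s\n' a.mp4 b.mp4 c.mp4 | poll_pass b.mp4 2` prints `batch: c.mp4` followed by `cursor: c.mp4`, because keys at or before the cursor were handled by an earlier pass.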

Sync Modes

Mode	Behavior	Use Case
`continuous`	Polls repeatedly at `polling_interval_seconds`	Ongoing ingestion — new files are picked up automatically
`initial_only`	Runs once, then stops	One-time backfill or migration
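For a one-time backfill, the only change from a continuous sync is `sync_mode`. A small Bash helper can build that request body. `backfill_payload` is hypothetical, it sends only fields documented on this page, and it does no JSON escaping, so IDs and paths must not contain quotes or backslashes.

```bash
#!/usr/bin/env bash
# Hypothetical helper: request body for a one-time backfill sync.
# No JSON escaping is done, so arguments must not contain " or \.
backfill_payload() {
  local connection_id="$1" source_path="$2"
  printf '{"connection_id":"%s","source_path":"%s","sync_mode":"initial_only"}' \
    "$connection_id" "$source_path"
}

# Example: create the sync with the endpoint used in "Create a Sync".
# curl -X POST "https://api.mixpeek.com/v1/buckets/$BUCKET_ID/syncs" \
#   -H "Authorization: Bearer $MIXPEEK_API_KEY" \
#   -H "X-Namespace: $NAMESPACE_ID" \
#   -H "Content-Type: application/json" \
#   -d "$(backfill_payload conn_abc123 /videos/archive/)"
```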

Create a Sync

Connect your storage

Create a storage connection with credentials for your provider.

Configure the sync

curl -X POST "https://api.mixpeek.com/v1/buckets/$BUCKET_ID/syncs" \
  -H "Authorization: Bearer $MIXPEEK_API_KEY" \
  -H "X-Namespace: $NAMESPACE_ID" \
  -H "Content-Type: application/json" \
  -d '{
    "connection_id": "conn_abc123",
    "source_path": "/videos/2025/",
    "sync_mode": "continuous",
    "polling_interval_seconds": 300,
    "skip_duplicates": true,
    "file_filters": {
      "include_patterns": ["*.mp4", "*.mov"],
      "max_size_bytes": 5368709120
    }
  }'

Trigger the first sync

curl -X POST "https://api.mixpeek.com/v1/buckets/$BUCKET_ID/syncs/$SYNC_ID/trigger" \
  -H "Authorization: Bearer $MIXPEEK_API_KEY" \
  -H "X-Namespace: $NAMESPACE_ID"

After the initial run, continuous syncs poll automatically at the configured interval.

Configuration Reference

Core Settings

Field	Type	Default	Description
`connection_id`	string	required	Storage connection to pull from
`source_path`	string	required	Path in external storage (format varies by provider)
`sync_mode`	string	`continuous`	`continuous` or `initial_only`
`polling_interval_seconds`	int	`300`	Seconds between polls (30–900)
`batch_size`	int	`50`	Files per batch (1–100)
`skip_duplicates`	bool	`true`	Skip files already in the bucket
`reconcile`	object	—	Reconcile on source change: `on_delete` (cascade-delete objects when the source asset is removed, default `true`), `on_update` (propagate metadata changes and re-extract, default `true`), `on_filter_drift` (drop objects that no longer match filters, default `true`)
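The numeric ranges in this table can be checked locally before sending a request, which catches typos early. A Bash sketch, where `in_range` and `validate_sync_settings` are hypothetical helpers that mirror only the documented ranges; the API remains the authority on what it accepts.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check mirroring the documented ranges:
#   polling_interval_seconds: 30-900, batch_size: 1-100
# Returns non-zero and prints a message to stderr on the first violation.
in_range() {
  local value="$1" min="$2" max="$3"
  # Plain decimal integers only (no sign, no leading zeros).
  local re='^(0|[1-9][0-9]*)$'
  [[ "$value" =~ $re ]] && (( value >= min && value <= max ))
}

validate_sync_settings() {
  local interval="$1" batch_size="$2"
  if ! in_range "$interval" 30 900; then
    echo "polling_interval_seconds must be 30-900, got: $interval" >&2
    return 1
  fi
  if ! in_range "$batch_size" 1 100; then
    echo "batch_size must be 1-100, got: $batch_size" >&2
    return 1
  fi
}
```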

File Filters

Narrow which files get synced. All filters combine with AND logic.

{
  "file_filters": {
    "include_patterns": ["*.mp4", "*.mov", "*.webm"],
    "exclude_patterns": ["*/drafts/*", "*_temp.*"],
    "min_size_bytes": 1024,
    "max_size_bytes": 5368709120,
    "modified_after": "2025-01-01T00:00:00Z",
    "mime_types": ["video/mp4", "video/quicktime"]
  }
}
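To dry-run patterns against a file listing before creating a sync, the glob and size filters can be approximated in Bash. This is not Mixpeek's matcher: it assumes Bash pattern matching (where `*` also matches `/`) behaves like the service's globs, treats the include list as "match at least one", and does not model `mime_types` or `modified_after`. `passes_filters` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Approximation of file_filters for local dry runs (not Mixpeek's matcher).
# Assumptions: bash pattern matching stands in for Mixpeek's globs (here "*"
# also matches "/"), a file must match at least one include pattern, and
# mime_types / modified_after are not modeled.
INCLUDE_PATTERNS=("*.mp4" "*.mov" "*.webm")
EXCLUDE_PATTERNS=("*/drafts/*" "*_temp.*")
MIN_SIZE_BYTES=1024
MAX_SIZE_BYTES=5368709120

# passes_filters PATH SIZE_BYTES: exit status 0 if the file would be synced.
passes_filters() {
  local path="$1" size="$2" pattern included=0
  for pattern in "${INCLUDE_PATTERNS[@]}"; do
    # Unquoted right-hand side: treated as a pattern, not a literal string.
    if [[ $path == $pattern ]]; then included=1; break; fi
  done
  (( included )) || return 1
  for pattern in "${EXCLUDE_PATTERNS[@]}"; do
    if [[ $path == $pattern ]]; then return 1; fi
  done
  # AND logic: the size bounds must also hold.
  (( size >= MIN_SIZE_BYTES && size <= MAX_SIZE_BYTES ))
}
```

For example, `passes_filters videos/2025/drafts/cut.mp4 2048` fails because of the `*/drafts/*` exclusion, while `passes_filters videos/2025/launch.mp4 2048` succeeds.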

Metadata Filters

Filter on provider-specific metadata fields. Useful for syncing only assets that match certain tags, statuses, or custom fields in your storage system.

{
  "file_filters": {
    "metadata_filters": [
      { "field": "status", "operator": "equals", "value": "approved" },
      { "field": "tags", "operator": "contains", "value": "hero" }
    ]
  }
}

Supported operators: equals, not_equals, contains, not_contains, gt, lt, gte, lte, exists.
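To make the operators concrete, here is a Bash sketch that evaluates one condition. The semantics are assumptions, not Mixpeek's documented behavior: values are plain strings, `contains`/`not_contains` are substring checks, `gt`/`lt`/`gte`/`lte` compare integers, and `exists` means non-empty. `meta_match` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Illustrative evaluation of one metadata filter condition.
# Assumptions (not Mixpeek's documented semantics): values are plain strings,
# contains/not_contains are substring checks, gt/lt/gte/lte compare integers,
# and exists means "non-empty".
# Usage: meta_match ACTUAL OPERATOR [EXPECTED]
meta_match() {
  local actual="$1" op="$2" expected="${3:-}"
  local int_re='^(0|-?[1-9][0-9]*)$'
  case "$op" in
    equals)       [[ "$actual" == "$expected" ]] ;;
    not_equals)   [[ "$actual" != "$expected" ]] ;;
    contains)     [[ "$actual" == *"$expected"* ]] ;;
    not_contains) [[ "$actual" != *"$expected"* ]] ;;
    exists)       [[ -n "$actual" ]] ;;
    gt|lt|gte|lte)
      # Numeric operators only accept plain integers on both sides.
      [[ "$actual" =~ $int_re && "$expected" =~ $int_re ]] || return 1
      case "$op" in
        gt)  (( actual > expected )) ;;
        lt)  (( actual < expected )) ;;
        gte) (( actual >= expected )) ;;
        lte) (( actual <= expected )) ;;
      esac ;;
    *) echo "unsupported operator: $op" >&2; return 2 ;;
  esac
}
```

The two example conditions can be checked together with `meta_match approved equals approved && meta_match "hero,summer" contains hero`, on the assumption that list entries combine with AND like the other file filters.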

Schema Mapping

Map provider metadata to bucket schema fields during sync, so structured data arrives alongside your files.

{
  "schema_mapping": {
    "mappings": {
      "product_name": { "target_type": "field", "source": { "type": "metadata", "key": "title" } },
      "category": { "target_type": "field", "source": { "type": "tag", "key": "category" } }
    }
  }
}

Lifecycle Management

Pause and Resume

Temporarily stop a sync without losing progress:

# Pause the sync; its progress is kept
curl -X POST "https://api.mixpeek.com/v1/buckets/$BUCKET_ID/syncs/$SYNC_ID/pause" \
  -H "Authorization: Bearer $MIXPEEK_API_KEY" \
  -H "X-Namespace: $NAMESPACE_ID"

# Resume the sync from where it left off
curl -X POST "https://api.mixpeek.com/v1/buckets/$BUCKET_ID/syncs/$SYNC_ID/resume" \
  -H "Authorization: Bearer $MIXPEEK_API_KEY" \
  -H "X-Namespace: $NAMESPACE_ID"

Manual Trigger

Force a sync to run immediately, outside the polling schedule:

curl -X POST "https://api.mixpeek.com/v1/buckets/$BUCKET_ID/syncs/$SYNC_ID/trigger" \
  -H "Authorization: Bearer $MIXPEEK_API_KEY" \
  -H "X-Namespace: $NAMESPACE_ID"
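Since pause, resume, and trigger share the same URL shape and headers, a small Bash wrapper can cut repetition in scripts. `sync_action` is a hypothetical helper; it assumes `BUCKET_ID`, `SYNC_ID`, `MIXPEEK_API_KEY`, and `NAMESPACE_ID` are already set, and uses curl's `--fail` so HTTP errors produce a non-zero exit status.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the pause, resume, and trigger endpoints above.
# Expects BUCKET_ID, SYNC_ID, MIXPEEK_API_KEY, and NAMESPACE_ID to be set.
sync_action() {
  local action="$1"
  # Reject anything other than the three documented lifecycle actions.
  case "$action" in
    pause|resume|trigger) ;;
    *) echo "usage: sync_action pause|resume|trigger" >&2; return 2 ;;
  esac
  # --fail: non-zero exit on HTTP errors; -sS: quiet, but still show errors.
  curl -sS --fail -X POST \
    "https://api.mixpeek.com/v1/buckets/$BUCKET_ID/syncs/$SYNC_ID/$action" \
    -H "Authorization: Bearer $MIXPEEK_API_KEY" \
    -H "X-Namespace: $NAMESPACE_ID"
}
```

For example, `sync_action pause` before maintenance on the source storage, then `sync_action resume` afterwards.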

Monitoring

Check sync status and metrics:

curl "https://api.mixpeek.com/v1/buckets/$BUCKET_ID/syncs/$SYNC_ID" \
  -H "Authorization: Bearer $MIXPEEK_API_KEY" \
  -H "X-Namespace: $NAMESPACE_ID"

Response includes:

Metric	Description
`total_files_discovered`	Cumulative files found in source
`total_files_synced`	Successfully synced files
`total_files_failed`	Files that failed after retries (sent to DLQ)
`total_bytes_synced`	Total data transferred
`last_sync_at`	When the last sync completed
`next_sync_at`	When the next poll is scheduled
`consecutive_failures`	Sequential failure count (auto-suspends after threshold)
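For cron or CI alerting, a Bash sketch can fail when `consecutive_failures` is non-zero. It assumes the metric appears in the status response as a plain integer and uses `grep` instead of `jq` to avoid an extra dependency; `check_sync_health` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical health check on a sync status response.
# Assumes "consecutive_failures" appears in the JSON as a plain integer.
# Exit codes: 0 healthy, 1 failing, 2 metric not found.
check_sync_health() {
  local status_json="$1" failures
  # Pull the first "consecutive_failures": N pair, then keep only N.
  failures=$(printf '%s\n' "$status_json" \
    | grep -o '"consecutive_failures"[[:space:]]*:[[:space:]]*[0-9][0-9]*' \
    | head -n 1 | grep -o '[0-9][0-9]*$')
  if [ -z "$failures" ]; then
    echo "consecutive_failures not found in response" >&2
    return 2
  fi
  if [ "$failures" -gt 0 ]; then
    echo "sync has $failures consecutive failure(s)" >&2
    return 1
  fi
  echo "ok"
}

# Example:
# check_sync_health "$(curl -sS "https://api.mixpeek.com/v1/buckets/$BUCKET_ID/syncs/$SYNC_ID" \
#   -H "Authorization: Bearer $MIXPEEK_API_KEY" -H "X-Namespace: $NAMESPACE_ID")"
```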

Robustness

Syncs are designed for unattended, long-running operation:

Distributed locking prevents concurrent runs of the same sync
Resume cursors checkpoint progress so interrupted syncs pick up where they left off
Dead letter queue retries failed files up to 3 times before marking them as failed
Auto-suspend pauses syncs after consecutive failures to prevent runaway errors
Idempotent ingestion uses source tracking so retries never create duplicate objects
Reconciliation (the reconcile object) cascades source deletes (on_delete), propagates metadata updates (on_update), and drops objects that no longer match your filters (on_filter_drift); all three default to true
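The retry-then-dead-letter behavior follows a common pattern, sketched here in Bash for illustration only; it is not Mixpeek's implementation. `process_with_dlq` and `MAX_RETRIES` are hypothetical, and the sketch reads "up to 3 times" as three retries after the first attempt, which may differ from how Mixpeek counts.

```bash
#!/usr/bin/env bash
# Illustrative retry-then-dead-letter pattern (not Mixpeek's implementation).
# Assumption: "up to 3 times" means 3 retries after the first attempt.
MAX_RETRIES=3

# process_with_dlq FILE DLQ_PATH COMMAND [ARGS...]
# Runs COMMAND [ARGS...] FILE; if it still fails after MAX_RETRIES retries,
# appends FILE to the dead-letter file at DLQ_PATH and returns 1.
process_with_dlq() {
  local file="$1" dlq="$2" retries=0
  shift 2
  until "$@" "$file"; do
    if (( retries >= MAX_RETRIES )); then
      printf '%s\n' "$file" >> "$dlq"
      return 1
    fi
    retries=$((retries + 1))
  done
}
```

A production worker would normally also add backoff between attempts, for example an increasing `sleep`.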

If a sync gets stuck (e.g., a worker crashed mid-run), use the force unlock endpoint to release the distributed lock.

For endpoint details, see the Sync API reference.
