Syncs pull files from your existing storage providers into Mixpeek buckets on a schedule. No migration required — connect your cloud storage, configure what to sync, and Mixpeek keeps your bucket up to date.

How Syncs Work

┌──────────────┐     poll      ┌──────────────┐    create     ┌──────────────┐
│   External   │◄──────────────│    Sync      │───objects────►│    Bucket    │
│   Storage    │───file list──►│   Worker     │               │              │
│  (S3, GCS,   │               │              │───submit─────►│  Collection  │
│   Mux, etc.) │               │              │   batches     │   Pipeline   │
└──────────────┘               └──────┬───────┘               └──────────────┘
                                      │
                                      ▼
                              Resume cursor,
                              metrics, DLQ

  1. Poll — At each interval, Mixpeek lists files from your storage provider using the configured source_path and filters.
  2. Filter — Files are matched against glob patterns, size limits, MIME types, and provider-specific metadata filters.
  3. Create objects — Matching files are registered in the target bucket. Duplicates are skipped by default via source tracking.
  4. Submit batches — Objects are grouped into batches and submitted to the bucket’s collection pipeline for processing.
  5. Checkpoint — A resume cursor is saved so the next poll picks up where the last one left off.
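
The five steps above can be sketched as a single in-memory function. This is only an illustration of the flow, not Mixpeek's implementation: the function name `run_sync_once`, the tuple-based listing, and the use of a modification timestamp as the resume cursor are all assumptions made for the example.

```python
from fnmatch import fnmatch


def run_sync_once(listing, cursor, include_patterns, seen_sources, batch_size=50):
    """Simulate one poll: filter, register new objects, batch, checkpoint.

    `listing` is a list of (path, modified_at) tuples from the provider.
    `cursor` is the modified_at value saved by the previous run, or None.
    Both shapes are hypothetical; the real cursor format is not documented here.
    """
    # 1. Poll: only consider files newer than the saved cursor.
    fresh = [(p, m) for p, m in listing if cursor is None or m > cursor]

    # 2. Filter: keep files that match at least one include pattern.
    matched = [(p, m) for p, m in fresh
               if any(fnmatch(p, pat) for pat in include_patterns)]

    # 3. Create objects: skip sources that were already registered.
    created = []
    for path, _ in matched:
        if path not in seen_sources:
            seen_sources.add(path)
            created.append(path)

    # 4. Submit batches: group new objects into fixed-size chunks.
    batches = [created[i:i + batch_size]
               for i in range(0, len(created), batch_size)]

    # 5. Checkpoint: advance the cursor to the newest file seen in this poll.
    new_cursor = max((m for _, m in fresh), default=cursor)
    return batches, new_cursor
```

Calling the function a second time with the returned cursor yields no batches, which is the behavior the checkpoint step is meant to provide.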

Sync Modes

| Mode | Behavior | Use Case |
| --- | --- | --- |
| continuous | Polls repeatedly at polling_interval_seconds | Ongoing ingestion: new files are picked up automatically |
| initial_only | Runs once, then stops | One-time backfill or migration |
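
As a rough sketch of how the two modes differ in scheduling, the helper below computes when the next poll would run. The name `next_poll_at` is hypothetical, and measuring the interval from the end of the previous run is an assumption; the page does not say whether the interval is counted from the start or the end of a run.

```python
def next_poll_at(sync_mode, last_run_finished_at, polling_interval_seconds=300):
    """Return the epoch time of the next poll, or None if no poll is due.

    Assumes the interval is counted from when the previous run finished.
    """
    if sync_mode == "initial_only":
        return None  # runs once, then stops
    if sync_mode == "continuous":
        return last_run_finished_at + polling_interval_seconds
    raise ValueError(f"unknown sync_mode: {sync_mode!r}")
```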

Create a Sync

1. Connect your storage

Create a storage connection with credentials for your provider.

2. Configure the sync

curl -X POST "https://api.mixpeek.com/v1/buckets/$BUCKET_ID/syncs" \
  -H "Authorization: Bearer $MIXPEEK_API_KEY" \
  -H "X-Namespace: $NAMESPACE_ID" \
  -H "Content-Type: application/json" \
  -d '{
    "connection_id": "conn_abc123",
    "source_path": "/videos/2025/",
    "sync_mode": "continuous",
    "polling_interval_seconds": 300,
    "skip_duplicates": true,
    "file_filters": {
      "include_patterns": ["*.mp4", "*.mov"],
      "max_size_bytes": 5368709120
    }
  }'

3. Trigger the first sync

curl -X POST "https://api.mixpeek.com/v1/buckets/$BUCKET_ID/syncs/$SYNC_ID/trigger" \
  -H "Authorization: Bearer $MIXPEEK_API_KEY" \
  -H "X-Namespace: $NAMESPACE_ID"
After the initial run, continuous syncs poll automatically at the configured interval.
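
The create call above can also be assembled from Python using only the standard library. This sketch builds the request without sending it; the URL, headers, and body fields are copied from the curl example, while the helper name `build_create_sync_request` and the placeholder IDs (`bkt_example`, `sk_example`, `ns_example`) are made up for illustration.

```python
import json
import urllib.request

API_BASE = "https://api.mixpeek.com/v1"


def build_create_sync_request(bucket_id, api_key, namespace_id, config):
    """Build (but do not send) the POST request that creates a sync."""
    return urllib.request.Request(
        url=f"{API_BASE}/buckets/{bucket_id}/syncs",
        data=json.dumps(config).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "X-Namespace": namespace_id,
            "Content-Type": "application/json",
        },
    )


sync_config = {
    "connection_id": "conn_abc123",
    "source_path": "/videos/2025/",
    "sync_mode": "continuous",
    "polling_interval_seconds": 300,
    "skip_duplicates": True,
    "file_filters": {
        "include_patterns": ["*.mp4", "*.mov"],
        "max_size_bytes": 5368709120,
    },
}

# Placeholder IDs; substitute your own bucket, key, and namespace.
request = build_create_sync_request("bkt_example", "sk_example", "ns_example", sync_config)
# To actually send it: urllib.request.urlopen(request)
```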

Configuration Reference

Core Settings

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| connection_id | string | required | Storage connection to pull from |
| source_path | string | required | Path in external storage (format varies by provider) |
| sync_mode | string | continuous | continuous or initial_only |
| polling_interval_seconds | int | 300 | Poll frequency (30–900 seconds) |
| batch_size | int | 50 | Files per batch (1–100) |
| skip_duplicates | bool | true | Skip files already in the bucket |
| reconcile_on_sync | bool | false | Delete previously-synced objects that no longer match filters |
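
A client-side check against these defaults and ranges can catch mistakes before a request is sent. The sketch below mirrors the table; the class name `SyncSettings` and its `validate` method are hypothetical helpers, not part of any Mixpeek SDK, and the server may apply additional rules.

```python
from dataclasses import dataclass


@dataclass
class SyncSettings:
    connection_id: str
    source_path: str
    sync_mode: str = "continuous"
    polling_interval_seconds: int = 300
    batch_size: int = 50
    skip_duplicates: bool = True
    reconcile_on_sync: bool = False

    def validate(self):
        """Return a list of problems; an empty list means the settings look valid."""
        problems = []
        if not self.connection_id:
            problems.append("connection_id is required")
        if not self.source_path:
            problems.append("source_path is required")
        if self.sync_mode not in ("continuous", "initial_only"):
            problems.append("sync_mode must be continuous or initial_only")
        if not 30 <= self.polling_interval_seconds <= 900:
            problems.append("polling_interval_seconds must be between 30 and 900")
        if not 1 <= self.batch_size <= 100:
            problems.append("batch_size must be between 1 and 100")
        return problems
```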

File Filters

Narrow which files get synced. All filters combine with AND logic.
{
  "file_filters": {
    "include_patterns": ["*.mp4", "*.mov", "*.webm"],
    "exclude_patterns": ["*/drafts/*", "*_temp.*"],
    "min_size_bytes": 1024,
    "max_size_bytes": 5368709120,
    "modified_after": "2025-01-01T00:00:00Z",
    "mime_types": ["video/mp4", "video/quicktime"]
  }
}
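
To make the AND logic concrete, here is a minimal local sketch of how such filters could be evaluated. It is not Mixpeek's matcher: the function name, the shape of the `file` dictionary, and the use of Python's `fnmatch` for globs (where `*` also matches `/`) are assumptions, and the service's glob semantics may differ.

```python
from datetime import datetime, timezone
from fnmatch import fnmatch


def passes_file_filters(file, filters):
    """Return True only if `file` satisfies every filter that is set (AND logic)."""
    path, size = file["path"], file["size_bytes"]

    include = filters.get("include_patterns")
    # Assumed: within include_patterns, matching any one pattern is enough.
    if include and not any(fnmatch(path, pat) for pat in include):
        return False
    if any(fnmatch(path, pat) for pat in filters.get("exclude_patterns", [])):
        return False
    if size < filters.get("min_size_bytes", 0):
        return False
    if size > filters.get("max_size_bytes", float("inf")):
        return False
    if "modified_after" in filters:
        # fromisoformat() only accepts a trailing "Z" on Python 3.11+.
        cutoff = datetime.fromisoformat(filters["modified_after"].replace("Z", "+00:00"))
        if file["modified_at"] <= cutoff:
            return False
    mime_types = filters.get("mime_types")
    if mime_types and file["mime_type"] not in mime_types:
        return False
    return True


filters = {
    "include_patterns": ["*.mp4", "*.mov", "*.webm"],
    "exclude_patterns": ["*/drafts/*", "*_temp.*"],
    "min_size_bytes": 1024,
    "max_size_bytes": 5368709120,
    "modified_after": "2025-01-01T00:00:00Z",
    "mime_types": ["video/mp4", "video/quicktime"],
}

# A made-up file record used only to exercise the function.
candidate = {
    "path": "videos/2025/launch.mp4",
    "size_bytes": 2048,
    "modified_at": datetime(2025, 3, 1, tzinfo=timezone.utc),
    "mime_type": "video/mp4",
}
```

With these inputs, `candidate` passes; changing any single attribute so that one filter fails (for example, moving the file under a `drafts/` folder) is enough to exclude it.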

Metadata Filters

Filter on provider-specific metadata fields. Useful for syncing only assets that match certain tags, statuses, or custom fields in your storage system.
{
  "file_filters": {
    "metadata_filters": [
      { "field": "status", "operator": "equals", "value": "approved" },
      { "field": "tags", "operator": "contains", "value": "hero" }
    ]
  }
}
Supported operators: equals, not_equals, contains, not_contains, gt, lt, gte, lte, exists.
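
The operators can be read as simple predicates over a file's metadata. The sketch below is one plausible interpretation, not the service's definition: it assumes that all metadata filters must match, that a missing field fails every operator except `exists`, and that `exists` takes a boolean `value`. All names are hypothetical.

```python
import operator

_MISSING = object()

# Operator name -> predicate(actual, expected). Semantics are assumed.
OPERATORS = {
    "equals": operator.eq,
    "not_equals": operator.ne,
    "contains": lambda actual, expected: expected in actual,
    "not_contains": lambda actual, expected: expected not in actual,
    "gt": operator.gt,
    "lt": operator.lt,
    "gte": operator.ge,
    "lte": operator.le,
}


def matches_metadata(metadata, metadata_filters):
    """Return True if `metadata` satisfies every filter in the list."""
    for rule in metadata_filters:
        actual = metadata.get(rule["field"], _MISSING)
        if rule["operator"] == "exists":
            # Assumed: value=True means the field must be present.
            if (actual is not _MISSING) != rule.get("value", True):
                return False
            continue
        if actual is _MISSING:
            return False
        if not OPERATORS[rule["operator"]](actual, rule["value"]):
            return False
    return True


rules = [
    {"field": "status", "operator": "equals", "value": "approved"},
    {"field": "tags", "operator": "contains", "value": "hero"},
]
```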

Schema Mapping

Map provider metadata to bucket schema fields during sync, so structured data arrives alongside your files.
{
  "schema_mapping": {
    "field_mappings": {
      "product_name": { "source": "metadata.title" },
      "category": { "source": "metadata.tags[0]" }
    }
  }
}
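
To illustrate what a source path such as `metadata.tags[0]` selects, the sketch below resolves dotted paths with optional list indexes against a nested record. The helper names and the choice to return `None` for missing values are assumptions for this example; the actual mapping engine may behave differently.

```python
import re

# Matches either a key segment (e.g. "metadata") or a list index (e.g. "[0]").
_TOKEN = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def resolve_source(record, source):
    """Follow a path like 'metadata.tags[0]' through nested dicts and lists."""
    current = record
    for key, index in _TOKEN.findall(source):
        try:
            current = current[int(index)] if index else current[key]
        except (KeyError, IndexError, TypeError):
            return None  # assumed behavior when the path does not exist
    return current


def apply_schema_mapping(record, schema_mapping):
    """Build a dict of bucket fields from the provider record."""
    return {
        target: resolve_source(record, rule["source"])
        for target, rule in schema_mapping["field_mappings"].items()
    }


schema_mapping = {
    "field_mappings": {
        "product_name": {"source": "metadata.title"},
        "category": {"source": "metadata.tags[0]"},
    }
}

# A made-up provider record used only to exercise the mapping.
provider_record = {"metadata": {"title": "Trail Shoe", "tags": ["footwear", "sale"]}}
mapped = apply_schema_mapping(provider_record, schema_mapping)
```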

Lifecycle Management

Pause and Resume

Temporarily stop a sync without losing progress:
curl -X POST "https://api.mixpeek.com/v1/buckets/$BUCKET_ID/syncs/$SYNC_ID/pause" \
  -H "Authorization: Bearer $MIXPEEK_API_KEY" \
  -H "X-Namespace: $NAMESPACE_ID"

Manual Trigger

Force a sync to run immediately, outside the polling schedule:
curl -X POST "https://api.mixpeek.com/v1/buckets/$BUCKET_ID/syncs/$SYNC_ID/trigger" \
  -H "Authorization: Bearer $MIXPEEK_API_KEY" \
  -H "X-Namespace: $NAMESPACE_ID"

Monitoring

Check sync status and metrics:
curl "https://api.mixpeek.com/v1/buckets/$BUCKET_ID/syncs/$SYNC_ID" \
  -H "Authorization: Bearer $MIXPEEK_API_KEY" \
  -H "X-Namespace: $NAMESPACE_ID"
Response includes:
| Metric | Description |
| --- | --- |
| total_files_discovered | Cumulative files found in source |
| total_files_synced | Successfully synced files |
| total_files_failed | Files that failed after retries (sent to DLQ) |
| total_bytes_synced | Total data transferred |
| last_sync_at | When the last sync completed |
| next_sync_at | When the next poll is scheduled |
| consecutive_failures | Sequential failure count (auto-suspends after threshold) |
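
These metrics lend themselves to a simple health check. The sketch below derives a failure rate from an already-parsed status response; the function name, the 5% alert threshold, and the assumption that the metrics appear as top-level keys are illustrative choices, not documented behavior.

```python
def summarize_sync(status, max_failure_rate=0.05):
    """Derive a few health indicators from a parsed sync status response.

    The 5% default threshold is an arbitrary example value.
    """
    synced = status.get("total_files_synced", 0)
    failed = status.get("total_files_failed", 0)
    attempted = synced + failed
    failure_rate = failed / attempted if attempted else 0.0
    return {
        "failure_rate": failure_rate,
        "needs_attention": (
            failure_rate > max_failure_rate
            or status.get("consecutive_failures", 0) > 0
        ),
    }
```

A monitoring job could call this after each status request and alert when `needs_attention` is true.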

Robustness

Syncs are designed for unattended, long-running operation:
  • Distributed locking prevents concurrent runs of the same sync
  • Resume cursors checkpoint progress so interrupted syncs pick up where they left off
  • Dead letter queue retries failed files up to 3 times before marking them as failed
  • Auto-suspend pauses syncs after consecutive failures to prevent runaway errors
  • Idempotent ingestion uses source tracking to never duplicate objects on retries
  • Reconciliation optionally deletes objects that no longer match your filters (enable with reconcile_on_sync)
If a sync gets stuck (e.g., a worker crashed mid-run), use the force unlock endpoint to release the distributed lock.
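
The retry and idempotency guarantees listed above can be sketched in a few lines. This is a simplified local model, not the worker's code: `ingest_with_retries` and the `ingest` callback are hypothetical, and "retries up to 3 times" is read here as one initial attempt plus up to three retries.

```python
MAX_RETRIES = 3  # from the page; interpreted as retries after the first attempt


def ingest_with_retries(source_ids, ingest, seen_sources, max_retries=MAX_RETRIES):
    """Ingest each source at most once, retrying failures before giving up.

    Returns (synced, failed); `failed` plays the role of the dead letter queue.
    """
    synced, failed = [], []
    for source_id in source_ids:
        # Idempotency: source tracking skips anything already ingested.
        if source_id in seen_sources:
            continue
        for attempt in range(max_retries + 1):
            try:
                ingest(source_id)
            except Exception:
                if attempt == max_retries:
                    failed.append(source_id)  # retries exhausted
            else:
                seen_sources.add(source_id)
                synced.append(source_id)
                break
    return synced, failed
```

Because a source is only added to `seen_sources` after a successful ingest, rerunning the function after a crash retries unfinished files without duplicating completed ones.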
Sync API reference →