Connect an S3 bucket, sync video files automatically, and make them searchable end-to-end
This guide walks the full path from an S3 bucket of videos to a working search index: create a storage connection, point a sync at your S3 prefix, and let Mixpeek ingest and process new files automatically.
This is the automated/continuous path. To upload a single file by URL instead, see Video Understanding. For the AWS IAM policy and role setup, see AWS S3.
A connection stores your S3 credentials once and can be reused across buckets. Credentials are validated before the connection is saved (test_before_save defaults to true).
The response includes a connection_id (e.g. conn_abc123). Verify connectivity at any time with:
curl -sS -X POST "$MP_API_URL/v1/organizations/connections/conn_abc123/test" \
  -H "Authorization: Bearer $MP_API_KEY"
Connections are org-level and not namespace-scoped — no X-Namespace header is needed for connection calls. Buckets, syncs, and collections below are namespace-scoped, so they require X-Namespace.
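The scoping rule above can be captured in two small curl wrappers. This is a minimal sketch, not an official client: the function names mp_org and mp_ns and the MP_NAMESPACE variable are hypothetical, while MP_API_URL, MP_API_KEY, and the X-Namespace header come from this guide.

```shell
#!/usr/bin/env bash
# Sketch only: mp_org, mp_ns and MP_NAMESPACE are illustrative names,
# not part of the Mixpeek API or CLI.

# Org-level calls (connections): bearer token only, no X-Namespace.
# usage: mp_org METHOD PATH [extra curl args...]
mp_org() {
  local method="$1" path="$2"
  shift 2
  curl -sS -X "$method" "$MP_API_URL$path" \
    -H "Authorization: Bearer $MP_API_KEY" "$@"
}

# Namespace-scoped calls (buckets, syncs, collections): add X-Namespace.
# usage: mp_ns METHOD PATH [extra curl args...]
mp_ns() {
  local method="$1" path="$2"
  shift 2
  mp_org "$method" "$path" -H "X-Namespace: $MP_NAMESPACE" "$@"
}
```

With these defined, the connectivity check becomes `mp_org POST /v1/organizations/connections/conn_abc123/test`, and any bucket or sync call goes through mp_ns so the namespace header is never forgotten.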
A sync watches an S3 path and ingests matching files. With sync_mode: "continuous", Mixpeek polls for new files and ingests them automatically; sync_mode: "initial_only" runs a single backfill instead.
Trigger a run: POST /v1/buckets/{bucket_id}/syncs/{sync_config_id}/trigger
Pause / resume: POST .../pause · POST .../resume
Job status: GET .../jobs/{sync_job_id}
Failed files (DLQ): POST .../dlq
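The sync management calls above can be wrapped in a few shell functions. This is a hedged sketch: the function names and the MP_NAMESPACE variable are hypothetical, and it assumes the abbreviated ... paths share the /v1/buckets/{bucket_id}/syncs/{sync_config_id} prefix of the trigger endpoint. Response bodies are printed as-is because their fields are not described here.

```shell
#!/usr/bin/env bash
# Sketch only: sync_url, sync_call and MP_NAMESPACE are illustrative names.
# Assumption: pause, resume and jobs live under the same prefix as /trigger.

# Build $MP_API_URL/v1/buckets/{bucket_id}/syncs/{sync_config_id}/{action}
sync_url() {
  printf '%s/v1/buckets/%s/syncs/%s/%s' "$MP_API_URL" "$1" "$2" "$3"
}

# usage: sync_call METHOD BUCKET_ID SYNC_CONFIG_ID ACTION
# Syncs are namespace-scoped, so X-Namespace is always sent.
sync_call() {
  curl -sS -X "$1" "$(sync_url "$2" "$3" "$4")" \
    -H "Authorization: Bearer $MP_API_KEY" \
    -H "X-Namespace: $MP_NAMESPACE"
}

sync_trigger() { sync_call POST "$1" "$2" trigger; }    # run the sync now
sync_pause()   { sync_call POST "$1" "$2" pause; }      # pause the sync
sync_resume()  { sync_call POST "$1" "$2" resume; }     # resume the sync
sync_job()     { sync_call GET  "$1" "$2" "jobs/$3"; }  # fetch one job's status
```

For example, `sync_trigger "$BUCKET_ID" "$SYNC_CONFIG_ID"` starts a run, and `sync_job "$BUCKET_ID" "$SYNC_CONFIG_ID" "$SYNC_JOB_ID"` prints the status of a job, where the three variables hold IDs returned by earlier API responses.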
The sync ingests files as objects and, unless skip_batch_submission is set, submits them for processing automatically, so your multimodal_extractor collection populates without any extra step. Vectors become searchable within 10–30 seconds of each batch completing.
Because the sync is continuous, new videos dropped into the S3 prefix are ingested and indexed automatically. To re-run clustering or enrichment on a schedule as new content lands, see Triggers.