Buckets are the ingestion layer of the warehouse: raw files land here before the Engine decomposes them into documents and features. Each bucket enforces a JSON schema that describes the blobs you expect to ingest (text, image, audio, video, json, binary).
Create a Bucket
bucket_schema is required: every bucket must include a bucket_schema with at least one property in bucket_schema.properties. Omitting it returns a validation error.
Minimal Working Example
The simplest valid bucket creation — a single text property:
curl -sS -X POST "$MP_API_URL/v1/buckets" \
  -H "Authorization: Bearer $MP_API_KEY" \
  -H "X-Namespace: $MP_NAMESPACE" \
  -H "Content-Type: application/json" \
  -d '{
    "bucket_name": "my-bucket",
    "bucket_schema": {
      "properties": {
        "content": { "type": "text", "required": true }
      }
    }
  }'
Full Example
curl -sS -X POST "$MP_API_URL/v1/buckets" \
  -H "Authorization: Bearer $MP_API_KEY" \
  -H "X-Namespace: $MP_NAMESPACE" \
  -H "Content-Type: application/json" \
  -d '{
    "bucket_name": "product-catalog",
    "description": "E-commerce product data",
    "bucket_schema": {
      "properties": {
        "product_text": { "type": "text", "required": true },
        "hero_image": { "type": "image" },
        "spec_sheet": { "type": "json" }
      }
    }
  }'
Response fields:
- bucket_id
- schema with validation metadata
- object_count
- created_at
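For orientation, the response resembles the fragment below. Field names come from the list above; the values and exact shape are illustrative, not guaranteed by the API:

```json
{
  "bucket_id": "bkt_abc123",
  "bucket_name": "product-catalog",
  "schema": { "properties": { "product_text": { "type": "text", "required": true } } },
  "object_count": 0,
  "created_at": "2025-01-01T00:00:00Z"
}
```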
Bucket Schema
- Uses a lightweight JSON schema subset (type, required, enum, description).
- Validates each object’s blobs before storing metadata.
- Helps collections map input fields to feature extractor targets.
Example schema fragment:
{
  "properties": {
    "transcript": {
      "type": "text",
      "description": "Full podcast transcript",
      "required": true
    },
    "audio_file": {
      "type": "audio",
      "required": true
    }
  }
}
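To make the validation rules concrete, here is a minimal client-side sketch of the schema subset described above (required, enum; blob modality is checked server-side). The function name and the blob representation (a plain dict of property name to value) are illustrative, not part of the API:

```python
def validate_blobs(schema: dict, blobs: dict) -> list:
    """Return a list of validation errors; an empty list means valid.

    Mirrors the lightweight bucket_schema subset: `required` properties
    must be present, unknown properties are rejected, and `enum` values
    are checked. Blob modality (text/image/audio/...) is left to the server.
    """
    errors = []
    props = schema.get("properties", {})
    # Required properties must be present in the submitted blobs.
    for name, spec in props.items():
        if spec.get("required") and name not in blobs:
            errors.append("missing required property: %s" % name)
    # Submitted blobs must be declared in the schema; enum values must match.
    for name, value in blobs.items():
        spec = props.get(name)
        if spec is None:
            errors.append("unknown property: %s" % name)
            continue
        if "enum" in spec and value not in spec["enum"]:
            errors.append("%s: value not in enum" % name)
    return errors
```

Running this against the podcast schema above, a payload missing `audio_file` would fail with a single "missing required property" error.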
Manage Buckets
- Get bucket – GET /v1/buckets/{bucket_id}
- List buckets – POST /v1/buckets/list (supports filters, sort, pagination)
- Delete bucket – DELETE /v1/buckets/{bucket_id} (removes objects and blobs)
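The list endpoint accepts filters, sort, and pagination in its request body. The sketch below builds such a body; every field name (`filters`, `sort`, `pagination`, and their sub-keys) is a hypothetical illustration of the pattern, so check the API reference for the actual shape:

```python
def build_list_request(name_contains=None, sort_by="created_at",
                       descending=True, page=1, page_size=20):
    """Assemble a hypothetical POST /v1/buckets/list body.

    Field names are illustrative; only the concepts (filters, sort,
    pagination) come from the docs.
    """
    body = {
        "sort": {"field": sort_by, "descending": descending},
        "pagination": {"page": page, "page_size": page_size},
    }
    if name_contains is not None:
        # Substring match on bucket_name, scoped to the caller's namespace.
        body["filters"] = {"bucket_name": {"contains": name_contains}}
    return body
```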
Buckets are strictly namespace-scoped: the same bucket name can exist in different namespaces without conflict.
Bucket vs Collection
| Aspect | Bucket | Collection |
|---|---|---|
| Purpose | Raw input registry | Processed documents + features |
| Schema | Blob validation | Output schema (deterministic) |
| Storage | MongoDB (metadata) + S3 (blobs) | MongoDB (metadata) + MVS (vectors/payloads) |
| Processing | None | Runs feature extractors via Engine |
Best Practices
- One bucket per data domain (products, support tickets, surveillance footage).
- Keep schemas coarse; collections can slice the data differently downstream.
- Use key_prefix in objects to group files (e.g., /2025/01/).
- Leverage metadata for later filtering (set tags at ingestion time).
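A small sketch of the key_prefix convention from the list above, deriving a `/YYYY/MM/` prefix from an ingestion date (the helper name is illustrative):

```python
from datetime import date

def key_prefix_for(d: date) -> str:
    """Build a /YYYY/MM/ key_prefix so objects group by ingestion month."""
    return "/%04d/%02d/" % (d.year, d.month)
```

Objects ingested in January 2025 would then share the prefix /2025/01/, which keeps month-by-month listing and cleanup cheap.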
Buckets give you a reliable staging area for multimodal data: a clean separation point before you branch into multiple collection-specific processing pipelines.