# Ingest Data

> Get your files into Mixpeek — namespaces, buckets, objects, uploads, and batching

## Set Up a Namespace

Every project starts with a namespace — the isolation boundary for all your resources. Use one per environment (dev, staging, prod) or per tenant.

```bash
curl -X POST "https://api.mixpeek.com/v1/namespaces" \
  -H "Authorization: Bearer $MIXPEEK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "namespace_name": "production",
    "feature_extractors": [
      { "feature_extractor_name": "multimodal_extractor", "version": "v1" }
    ]
  }'
```

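Every subsequent namespace-scoped request needs two headers: `Authorization: Bearer sk_live_...` and `X-Namespace: ns_...`. Organization-level endpoints, such as the connections call under Connect External Storage, are shown with only the `Authorization` header.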
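
To avoid repeating both headers on every call, a script can wrap `curl` in a small function. The following is a minimal sketch, not part of any Mixpeek tooling: the function name `mixpeek_api` and the `MIXPEEK_CURL` override (useful for dry runs) are illustrative assumptions, and it expects `MIXPEEK_API_KEY` and `NAMESPACE_ID` to be set in the environment.

```bash
# Hypothetical helper: send a namespace-scoped request with both headers.
# usage: mixpeek_api METHOD PATH [JSON_BODY]
mixpeek_api() {
  _method=$1
  _path=$2
  _body=${3:-}

  # Build the curl argument list; both headers are always attached.
  set -- -sS -X "$_method" "https://api.mixpeek.com/v1$_path" \
    -H "Authorization: Bearer $MIXPEEK_API_KEY" \
    -H "X-Namespace: $NAMESPACE_ID"

  # Only send a JSON body (and its Content-Type) when one was given.
  if [ -n "$_body" ]; then
    set -- "$@" -H "Content-Type: application/json" -d "$_body"
  fi

  # MIXPEEK_CURL is an illustrative override, e.g. MIXPEEK_CURL=echo for a dry run.
  "${MIXPEEK_CURL:-curl}" "$@"
}

# Usage (commented out so sourcing this file sends nothing):
# mixpeek_api POST "/buckets" '{ "bucket_name": "product-catalog" }'
```

Setting `MIXPEEK_CURL=echo` prints the arguments instead of sending the request, which makes it easy to check the headers before running against a real namespace.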

[Namespace API →](/api-reference/namespaces/create-namespace)

## Create a Bucket

Buckets are schema-validated containers for raw files. The bucket schema declares each property and the blob type it accepts (`text`, `image`, `audio`, `video`, `json`, or `binary`).

```bash
curl -X POST "https://api.mixpeek.com/v1/buckets" \
  -H "Authorization: Bearer $MIXPEEK_API_KEY" \
  -H "X-Namespace: $NAMESPACE_ID" \
  -H "Content-Type: application/json" \
  -d '{
    "bucket_name": "product-catalog",
    "bucket_schema": {
      "properties": {
        "product_text": { "type": "text", "required": true },
        "hero_image": { "type": "image" }
      }
    }
  }'
```

[Bucket API →](/api-reference/buckets/create-bucket)

### Storage class

Pass an optional `storage_class` when creating or updating a bucket to pick a cost tier for its objects. The value is provider-agnostic and is mapped to the equivalent tier in your object store:

| `storage_class`      | GCS      | S3 / MinIO   | Best for                     |
| -------------------- | -------- | ------------ | ---------------------------- |
| `standard` (default) | STANDARD | STANDARD     | Hot, frequently-read buckets |
| `nearline`           | NEARLINE | STANDARD\_IA | Warm / occasional access     |
| `coldline`           | COLDLINE | GLACIER\_IR  | Cold / rare access           |
| `archive`            | ARCHIVE  | GLACIER      | Long-term retention          |

<Note>
  **Applied on write for sync-based ingestion; broader rollout in progress.** For buckets fed by a storage **sync** (S3, GCS, Drive, RSS, and other sources; this is the primary media path), the tier is set on each object at write time. Tiering for **direct uploads** (`POST /objects`) and **presigned client uploads**, and retroactive re-tiering of **existing** objects, is not applied yet; that work is a separate backend follow-up, in progress. Keep hot, retriever-source buckets on `standard`, and reserve cheaper tiers for large write-once, read-occasionally media.
</Note>
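
As an illustration of picking a tier at creation time, the sketch below builds a create-bucket body and rejects any value that is not in the table above. It rests on assumptions: the helper names are hypothetical, and placing `storage_class` at the top level of the request body is a guess that should be checked against the Bucket API reference.

```bash
# The four tiers listed in the storage class table.
is_valid_storage_class() {
  case "$1" in
    standard|nearline|coldline|archive) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: print a JSON create-bucket body.
# usage: bucket_body NAME [STORAGE_CLASS]   (defaults to "standard")
# Assumption: "storage_class" sits at the top level of the body.
# NAME is inserted verbatim, so it must not contain quotes or backslashes.
bucket_body() {
  _tier=${2:-standard}
  if ! is_valid_storage_class "$_tier"; then
    echo "unknown storage_class: $_tier" >&2
    return 1
  fi
  printf '{ "bucket_name": "%s", "storage_class": "%s", "bucket_schema": { "properties": { "hero_image": { "type": "image" } } } }\n' \
    "$1" "$_tier"
}

# Usage (commented out so sourcing this file sends nothing):
# curl -X POST "https://api.mixpeek.com/v1/buckets" \
#   -H "Authorization: Bearer $MIXPEEK_API_KEY" \
#   -H "X-Namespace: $NAMESPACE_ID" \
#   -H "Content-Type: application/json" \
#   -d "$(bucket_body media-archive coldline)"
```

Validating the tier locally catches typos such as `glacier`, which is an S3 class name rather than one of the provider-agnostic values.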

## Connect External Storage

Sync files directly from your existing cloud storage instead of uploading manually. Mixpeek reads from your provider — no migration needed. This is a **two-step** flow: create a reusable **connection** (holds the credentials, lives at the organization level), then attach a **sync** to a bucket that references it.

**Step 1 — Create the connection** (once per provider account):

```bash
curl -X POST "https://api.mixpeek.com/v1/organizations/connections" \
  -H "Authorization: Bearer $MIXPEEK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Production S3",
    "provider_type": "s3",
    "provider_config": {
      "provider_type": "s3",
      "bucket_name": "my-source-bucket",
      "region": "us-east-1",
      "credentials": {
        "access_key_id": "AKIA...",
        "secret_access_key": "..."
      }
    }
  }'
```

The response includes a `connection_id` (`conn_...`). Credentials are encrypted at rest and reusable across buckets.
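
In a script, the `connection_id` from Step 1 has to be carried into Step 2. One way to do that is sketched below, assuming the response is JSON with a top-level string field named `connection_id` as described above. The helper names `json_string_field` and `sync_body` are illustrative, and the `sed` extraction is a rough stand-in for a real JSON parser such as `jq`.

```bash
# Hypothetical helper: print the value of a string field from JSON on stdin.
# Regex-based, so it only suits simple values without escaped quotes.
# usage: ... | json_string_field FIELD_NAME
json_string_field() {
  sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Build the flat sync body used in Step 2 for a connection and source path.
# usage: sync_body CONNECTION_ID SOURCE_PATH
sync_body() {
  printf '{ "connection_id": "%s", "source_path": "%s", "sync_mode": "continuous", "polling_interval_seconds": 3600 }\n' \
    "$1" "$2"
}

# Usage (commented out so sourcing this file sends nothing):
# CONNECTION_ID=$(curl -sS -X POST "https://api.mixpeek.com/v1/organizations/connections" \
#   -H "Authorization: Bearer $MIXPEEK_API_KEY" \
#   -H "Content-Type: application/json" \
#   -d @connection.json | json_string_field connection_id)
# curl -X POST "https://api.mixpeek.com/v1/buckets/$BUCKET_ID/syncs" \
#   -H "Authorization: Bearer $MIXPEEK_API_KEY" \
#   -H "X-Namespace: $NAMESPACE_ID" \
#   -H "Content-Type: application/json" \
#   -d "$(sync_body "$CONNECTION_ID" /videos/)"
```

Here `connection.json` stands for a local file holding the Step 1 request body, which keeps credentials out of shell history.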

**Step 2 — Attach a sync** to your bucket (flat body — no wrapper objects):

```bash
curl -X POST "https://api.mixpeek.com/v1/buckets/$BUCKET_ID/syncs" \
  -H "Authorization: Bearer $MIXPEEK_API_KEY" \
  -H "X-Namespace: $NAMESPACE_ID" \
  -H "Content-Type: application/json" \
  -d '{
    "connection_id": "conn_abc123",
    "source_path": "/videos/",
    "sync_mode": "continuous",
    "polling_interval_seconds": 3600
  }'
```

Then trigger the first sync, with `$SYNC_ID` set to the ID of the sync you just created:

```bash
curl -X POST "https://api.mixpeek.com/v1/buckets/$BUCKET_ID/syncs/$SYNC_ID/trigger" \
  -H "Authorization: Bearer $MIXPEEK_API_KEY" \
  -H "X-Namespace: $NAMESPACE_ID"
```

With `sync_mode` set to `continuous`, new and changed files are picked up automatically at the configured polling interval after the initial sync. Only files that are new or modified since the last sync are processed, so unchanged files aren't reprocessed. Use `continuous` rather than `initial_only` when the source keeps receiving files.

| Provider             | Auth Method                   | S3-Compatible |
| -------------------- | ----------------------------- | ------------- |
| AWS S3               | IAM User / Role               | Native        |
| Google Cloud Storage | Service Account Key           | No            |
| Azure Blob Storage   | Access Key / Managed Identity | No            |
| Cloudflare R2        | R2 API Token                  | Yes           |
| Backblaze B2         | Application Key               | Yes           |
| Wasabi               | Access Key                    | Yes           |
| Tigris               | Access Key                    | Yes           |
| Box                  | OAuth                         | No            |
| Mux                  | API Token                     | No            |
| Supabase             | Service Key                   | Yes           |

See [Object Storage providers](/integrations/object-storage/overview) for provider-specific setup guides.

[Sync API →](/api-reference/bucket-syncs/create-sync-configuration)

## Register Objects

Objects are raw multimodal assets within a bucket. There are two ways to register them:

**URL references** — point to files in your existing storage:

```bash
curl -X POST "https://api.mixpeek.com/v1/buckets/$BUCKET_ID/objects" \
  -H "Authorization: Bearer $MIXPEEK_API_KEY" \
  -H "X-Namespace: $NAMESPACE_ID" \
  -H "Content-Type: application/json" \
  -d '{
    "key_prefix": "/products",
    "blobs": [
      { "property": "hero_image", "type": "image", "data": "https://example.com/photo.jpg" },
      { "property": "product_text", "type": "text", "data": "Wireless headphones" }
    ]
  }'
```

**Direct uploads** — upload to Mixpeek-managed storage via presigned URLs:

```bash
curl -X POST "https://api.mixpeek.com/v1/buckets/$BUCKET_ID/uploads" \
  -H "Authorization: Bearer $MIXPEEK_API_KEY" \
  -H "X-Namespace: $NAMESPACE_ID" \
  -H "Content-Type: application/json" \
  -d '{ "filename": "photo.jpg", "content_type": "image/jpeg" }'
```

Then PUT the file to the returned `presigned_url` and confirm with `POST /uploads/{id}/confirm`.
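
The PUT-then-confirm step could look roughly like the sketch below. Several details are assumptions rather than documented behavior: the function name `put_and_confirm`, the `MIXPEEK_CURL` dry-run override, the expansion of `POST /uploads/{id}/confirm` to a path under `/v1/buckets/$BUCKET_ID`, and sending the same `Content-Type` on the PUT as was declared when the upload was created.

```bash
# Hypothetical helper: upload a file to a presigned URL, then confirm it.
# usage: put_and_confirm FILE CONTENT_TYPE PRESIGNED_URL UPLOAD_ID
put_and_confirm() {
  _curl=${MIXPEEK_CURL:-curl}   # illustrative override, e.g. echo for a dry run

  # Step 1: send the raw bytes to the presigned URL. No API key is attached
  # here; the presigned URL itself authorizes the request.
  "$_curl" -sS -f -X PUT "$3" \
    -H "Content-Type: $2" \
    --data-binary "@$1" || return 1

  # Step 2: tell Mixpeek the upload finished.
  # Assumption: the confirm route lives under the bucket's uploads path.
  "$_curl" -sS -f -X POST \
    "https://api.mixpeek.com/v1/buckets/$BUCKET_ID/uploads/$4/confirm" \
    -H "Authorization: Bearer $MIXPEEK_API_KEY" \
    -H "X-Namespace: $NAMESPACE_ID"
}

# Usage (commented out so sourcing this file sends nothing):
# put_and_confirm photo.jpg image/jpeg "$PRESIGNED_URL" "$UPLOAD_ID"
```

The `-f` flag makes `curl` exit non-zero on an HTTP error, so a failed PUT stops the function before the confirm call is sent.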

For bulk imports, use [batch uploads](/api-reference/bucket-uploads/batch-create-uploads) or connect your object storage via [sync configurations](/integrations/object-storage/overview).

[Object API →](/api-reference/bucket-objects/create-object) · [Upload API →](/api-reference/bucket-uploads/create-upload)

## Process with Batches

Batches group objects for extraction. Create a batch, then submit it:

```bash
# Create batch
curl -X POST "https://api.mixpeek.com/v1/buckets/$BUCKET_ID/batches" \
  -H "Authorization: Bearer $MIXPEEK_API_KEY" \
  -H "X-Namespace: $NAMESPACE_ID" \
  -H "Content-Type: application/json" \
  -d '{ "object_ids": ["obj_abc", "obj_def"] }'

# Submit for processing
curl -X POST "https://api.mixpeek.com/v1/buckets/$BUCKET_ID/batches/$BATCH_ID/submit" \
  -H "Authorization: Bearer $MIXPEEK_API_KEY" \
  -H "X-Namespace: $NAMESPACE_ID"
```

### Batch Lifecycle

```
DRAFT → QUEUED → PROCESSING → COMPLETED
                     ↘        ↘
                    FAILED    COMPLETED_WITH_ERRORS
```

Poll `GET /v1/buckets/{bucket_id}/batches/{batch_id}` until the status is terminal: `COMPLETED`, `COMPLETED_WITH_ERRORS`, `FAILED`, or `CANCELED` (the last is not shown in the diagram above). A poller that waits only for `COMPLETED` hangs on partial success. Alternatively, use [webhooks](/platform/operations#webhooks) to get notified on `batch.completed`.
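
The polling rule above can be sketched as a small loop that stops on any terminal status, not just `COMPLETED`. The function names are illustrative, and the commented-out fetcher assumes the batch response exposes its state in a top-level `status` field, which should be checked against the Batch API reference.

```bash
# Succeeds (exit 0) for any of the four terminal statuses listed above.
is_terminal_status() {
  case "$1" in
    COMPLETED|COMPLETED_WITH_ERRORS|FAILED|CANCELED) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical poller. GET_STATUS_CMD is any command that prints the
# current batch status on stdout.
# usage: wait_for_batch GET_STATUS_CMD [INTERVAL_SECONDS] [MAX_ATTEMPTS]
# Exit codes: 0 terminal status reached (printed), 1 timed out, 2 fetch failed.
wait_for_batch() {
  _get_status=$1
  _interval=${2:-10}
  _max=${3:-60}
  _n=0
  while [ "$_n" -lt "$_max" ]; do
    _status=$("$_get_status") || return 2
    if is_terminal_status "$_status"; then
      printf '%s\n' "$_status"
      return 0
    fi
    _n=$((_n + 1))
    sleep "$_interval"
  done
  return 1
}

# Example fetcher (commented out; assumes a top-level "status" field and jq):
# fetch_batch_status() {
#   curl -sS "https://api.mixpeek.com/v1/buckets/$BUCKET_ID/batches/$BATCH_ID" \
#     -H "Authorization: Bearer $MIXPEEK_API_KEY" \
#     -H "X-Namespace: $NAMESPACE_ID" | jq -r '.status'
# }
# wait_for_batch fetch_batch_status 10 60
```

Bounding the loop with `MAX_ATTEMPTS` keeps a stuck batch from blocking a pipeline forever; the caller can then decide whether to retry, alert, or fall back to webhooks.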

[Batch API →](/api-reference/bucket-batches/create-batch)