Export Collection

curl --request POST \
  --url https://api.mixpeek.com/v1/collections/{collection_identifier}/export \
  --header 'Authorization: Bearer YOUR_MIXPEEK_API_KEY' \
  --header 'Content-Type: application/json' \
  --data '{
    "format": "parquet",
    "include_vectors": false,
    "select_fields": ["document_id", "metadata.title", "metadata.category"],
    "filters": {
      "AND": [
        {"field": "name", "operator": "eq", "value": "John"},
        {"field": "age", "operator": "gte", "value": 30}
      ],
      "OR": [
        {"field": "status", "operator": "eq", "value": "active"},
        {"field": "role", "operator": "eq", "value": "admin"}
      ],
      "NOT": [
        {"field": "department", "operator": "eq", "value": "HR"},
        {"field": "location", "operator": "eq", "value": "remote"}
      ],
      "case_sensitive": true
    },
    "sample_size": 500000
  }'

{
  "download_url": "<string>",
  "s3_path": "<string>",
  "format": "parquet",
  "document_count": 1,
  "file_size_bytes": 1,
  "exported_at": "2023-11-07T05:31:56Z",
  "vectors_download_url": "<string>",
  "vectors_s3_path": "<string>"
}
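
The same request can be sent from Python using only the standard library. This is a minimal sketch: `build_export_request` and `export_collection` are hypothetical helper names, the key, namespace, and collection values are placeholders, and the 300-second timeout is an assumption (this page does not say how long an export can take). Building the request separately from sending it makes the request easy to inspect without a network call.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.mixpeek.com/v1"


def build_export_request(collection_identifier, body, api_key, namespace=None):
    """Build (but do not send) a POST request to the export endpoint."""
    # Percent-encode the collection ID or name so it is safe in the URL path.
    path_part = urllib.parse.quote(collection_identifier, safe="")
    url = f"{API_BASE}/collections/{path_part}/export"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if namespace is not None:
        headers["X-Namespace"] = namespace
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def export_collection(collection_identifier, body, api_key, namespace=None, timeout=300):
    """Send the export request and return the decoded JSON response."""
    req = build_export_request(collection_identifier, body, api_key, namespace)
    # urlopen raises urllib.error.HTTPError for non-2xx responses.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


# Example (performs a real network call):
# result = export_collection("my-collection", {"format": "parquet"},
#                            "YOUR_MIXPEEK_API_KEY", namespace="my-namespace")
# print(result["download_url"])
```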

Headers

Authorization

string

Bearer token authentication using your API key. Format: 'Bearer sk_xxxxxxxxxxxxx'. You can create API keys in the Mixpeek dashboard under Organization Settings.

Example:

"Bearer YOUR_MIXPEEK_API_KEY"

X-Namespace

string

Namespace identifier for scoping this request. All resources (collections, buckets, taxonomies, etc.) are scoped to a namespace. You can provide either the namespace name or its ID: an ID has the form ns_xxxxxxxxxxxxx, and a name is a custom string such as 'my-namespace'. If this header is omitted, the ?namespace= query parameter is used instead.

Examples:

"ns_abc123def456"

"production"

"my-namespace"
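
When setting a custom header is inconvenient, the namespace can go in the ?namespace= query parameter that the header description mentions as the fallback. A minimal sketch; `export_url_with_namespace_query` is a hypothetical helper, and the Authorization header is still required on the request.

```python
import urllib.parse

API_BASE = "https://api.mixpeek.com/v1"


def export_url_with_namespace_query(collection_identifier, namespace):
    """Build the export URL with the namespace passed as ?namespace=...

    Use this only when the X-Namespace header is not sent.
    """
    path_part = urllib.parse.quote(collection_identifier, safe="")
    # urlencode escapes the value (e.g. spaces become "+").
    query = urllib.parse.urlencode({"namespace": namespace})
    return f"{API_BASE}/collections/{path_part}/export?{query}"
```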

Path Parameters

collection_identifier

string

required

The ID or name of the collection to export

Body

application/json

Request model for exporting collection data.

Export Formats:

JSON: Line-delimited JSON (JSONL) format, one document per line. Good for streaming and large files.
CSV: Comma-separated values. Best for tabular data analysis in spreadsheets.
PARQUET: Columnar format optimized for analytics. Best for large datasets and data pipelines.
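
Once downloaded, JSON (JSONL) and CSV exports can be read incrementally with the standard library, which keeps memory flat for large files. A sketch with hypothetical helper names; both accept any iterable of text lines, such as an open file. Parquet needs a third-party library, for example `pyarrow.parquet.read_table` from pyarrow.

```python
import csv
import json


def read_jsonl(lines):
    """Yield one document (dict) per non-empty line of a JSONL export."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines such as a trailing newline
            yield json.loads(line)


def read_csv(lines):
    """Yield one dict per row of a CSV export, keyed by the header row."""
    # When passing a file, open it with newline="" as the csv module recommends.
    yield from csv.DictReader(lines)


# Example:
# with open("export_file", encoding="utf-8") as fp:
#     for doc in read_jsonl(fp):
#         print(doc)
```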

Vector Export: Vectors are stored separately from document metadata due to their large size. When include_vectors is true, vectors are exported to a separate file named {collection_name}_vectors.{format}.

Field Selection: Use select_fields to export only specific fields, reducing file size for large collections. Supports dot notation for nested fields (e.g., "metadata.title").
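
Dot notation addresses nested keys: "metadata.title" refers to the title key inside the metadata object. Before starting a large export, it can help to check a select_fields list against one sample document. A sketch with hypothetical helpers; it assumes the sample's structure matches the exported documents, and it does not predict how the service treats a path that is missing on some documents.

```python
def resolve_field(document, path):
    """Return the value at a dot-notation path such as "metadata.title".

    Raises KeyError (missing key) or TypeError (non-object along the path).
    """
    value = document
    for segment in path.split("."):
        value = value[segment]
    return value


def missing_fields(document, select_fields):
    """List the select_fields paths that do not resolve in a sample document."""
    missing = []
    for path in select_fields:
        try:
            resolve_field(document, path)
        except (KeyError, TypeError):
            missing.append(path)
    return missing
```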

Filtering: Apply filters to export a subset of documents. Uses the same LogicalOperator format as the documents list endpoint.

format

enum<string>

default: parquet

Export format: json (line-delimited), csv, or parquet (default).

Available options: json, csv, parquet

include_vectors

boolean

default: false

Whether to include vectors in the export. Vectors are exported to a separate file due to their large size. Enabling this significantly increases export time and the total size of the export.

select_fields

string[] | null

Specific fields to include in the export. If not provided, all fields are exported. Supports dot notation for nested fields (e.g., 'metadata.title', 'metadata.author').

Example:

[
  "document_id",
  "metadata.title",
  "metadata.category"
]

filters

LogicalOperator · object

Filter conditions that restrict the export to matching documents. Uses the same LogicalOperator format (AND/OR/NOT) as the documents list endpoint.
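
The filters object groups conditions of the form {"field", "operator", "value"} under AND, OR, and NOT keys, with an optional case_sensitive flag, as in the curl example. A sketch of hypothetical builder helpers; it assumes empty groups may simply be omitted, uses only the eq and gte operators that appear on this page, and does not describe how the service combines several groups in one object.

```python
def condition(field, operator, value):
    """One filter condition, e.g. condition("age", "gte", 30)."""
    return {"field": field, "operator": operator, "value": value}


def logical_filter(and_=None, or_=None, not_=None, case_sensitive=None):
    """Assemble a LogicalOperator object, leaving out empty groups."""
    result = {}
    for key, group in (("AND", and_), ("OR", or_), ("NOT", not_)):
        if group:
            result[key] = list(group)
    if case_sensitive is not None:
        result["case_sensitive"] = case_sensitive
    return result
```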

sample_size

integer | null

Maximum number of documents to export. If not provided, exports all documents. Useful for testing exports or creating sample datasets.

Required range: 1 <= x <= 1000000
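
The documented body constraints (three formats, sample_size between 1 and 1,000,000) can be checked on the client before sending, which turns a rejected request into an immediate local error. A sketch; `export_body` is a hypothetical helper, and the parameter is named `export_format` only to avoid shadowing Python's built-in `format`.

```python
VALID_FORMATS = ("json", "csv", "parquet")
MAX_SAMPLE_SIZE = 1_000_000


def export_body(export_format="parquet", include_vectors=False,
                select_fields=None, filters=None, sample_size=None):
    """Build an export request body, enforcing the constraints listed above."""
    if export_format not in VALID_FORMATS:
        raise ValueError(f"format must be one of {VALID_FORMATS}, got {export_format!r}")
    if sample_size is not None:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(sample_size, bool) or not isinstance(sample_size, int):
            raise TypeError("sample_size must be an integer")
        if not 1 <= sample_size <= MAX_SAMPLE_SIZE:
            raise ValueError(f"sample_size must be between 1 and {MAX_SAMPLE_SIZE}")
    body = {"format": export_format, "include_vectors": include_vectors}
    # Unset optional fields are left out so the documented defaults apply
    # (all fields, no filtering, all documents).
    if select_fields is not None:
        body["select_fields"] = list(select_fields)
    if filters is not None:
        body["filters"] = filters
    if sample_size is not None:
        body["sample_size"] = sample_size
    return body
```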

Response

Successful Response

Response model for collection export.

Contains the presigned URL for downloading the exported file. The URL is valid for a limited time (typically 1 hour).

download_url

string

required

Presigned URL for downloading the exported file. Valid for 1 hour.
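
Because the download URL is short-lived, a client that stores a response for later use may want to check that the URL is still usable before fetching. A sketch with hypothetical helpers; it assumes the one-hour window starts at exported_at, which this page does not state, and adds a five-minute safety margin. Presigned URLs carry their signature in the query string, so the download sends no Authorization header.

```python
import shutil
import urllib.request
from datetime import datetime, timedelta, timezone

# Assumption: the one-hour validity window is counted from exported_at.
URL_LIFETIME = timedelta(hours=1)


def parse_timestamp(value):
    """Parse an ISO 8601 timestamp such as "2023-11-07T05:31:56Z" as UTC-aware."""
    # datetime.fromisoformat accepts a trailing "Z" only from Python 3.11 on.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    # Assumption: a timestamp without an offset is UTC.
    return parsed if parsed.tzinfo is not None else parsed.replace(tzinfo=timezone.utc)


def url_probably_valid(exported_at, now=None, margin=timedelta(minutes=5)):
    """True if the presigned URL should still be valid, keeping a safety margin."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now + margin < parse_timestamp(exported_at) + URL_LIFETIME


def download(url, destination):
    """Stream the file at a presigned URL to disk without holding it in memory."""
    with urllib.request.urlopen(url) as resp, open(destination, "wb") as out:
        shutil.copyfileobj(resp, out)
```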

s3_path

string

required

Full S3 path where the export is stored (for internal reference).

format

enum<string>

required

The format of the exported file.

Available options: json, csv, parquet

document_count

integer

required

Number of documents included in the export.

Required range: x >= 0

file_size_bytes

integer

required

Size of the exported file in bytes.

Required range: x >= 0

exported_at

string<date-time>

required

Timestamp when the export was completed.

vectors_download_url

string | null

Presigned URL for downloading the vectors file. Populated when include_vectors is true; otherwise null. Vectors are exported separately due to their large size.

vectors_s3_path

string | null

Full S3 path for the vectors file. Populated when include_vectors is true; otherwise null.
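
A typed wrapper makes the response fields above explicit and gives one place to check whether a vectors file was produced. A sketch; `ExportResult` and `parse_export_response` are hypothetical names. Unknown keys are ignored so the code tolerates fields added later, while a missing required field raises TypeError.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ExportResult:
    """Typed view of the export response fields documented above."""
    download_url: str
    s3_path: str
    format: str
    document_count: int
    file_size_bytes: int
    exported_at: str
    vectors_download_url: Optional[str] = None
    vectors_s3_path: Optional[str] = None

    @property
    def has_vectors(self) -> bool:
        # The vectors fields are null unless include_vectors was true.
        return self.vectors_download_url is not None


def parse_export_response(payload):
    """Build an ExportResult from JSON text/bytes or an already-decoded dict."""
    data = json.loads(payload) if isinstance(payload, (str, bytes)) else payload
    known = {f.name for f in fields(ExportResult)}
    return ExportResult(**{k: v for k, v in data.items() if k in known})
```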
