Skip to main content
POST
/
v1
/
collections
/
{collection_identifier}
/
export
Export Collection
curl --request POST \
  --url https://api.mixpeek.com/v1/collections/{collection_identifier}/export \
  --header 'Content-Type: application/json' \
  --data '
{
  "format": "parquet",
  "include_vectors": false,
  "select_fields": [
    "document_id",
    "metadata.title",
    "metadata.category"
  ],
  "filters": {
    "AND": [
      {
        "field": "name",
        "operator": "eq",
        "value": "John"
      },
      {
        "field": "age",
        "operator": "gte",
        "value": 30
      }
    ],
    "OR": [
      {
        "field": "status",
        "operator": "eq",
        "value": "active"
      },
      {
        "field": "role",
        "operator": "eq",
        "value": "admin"
      }
    ],
    "NOT": [
      {
        "field": "department",
        "operator": "eq",
        "value": "HR"
      },
      {
        "field": "location",
        "operator": "eq",
        "value": "remote"
      }
    ],
    "case_sensitive": true
  },
  "sample_size": 500000
}
'
{
  "download_url": "<string>",
  "s3_path": "<string>",
  "format": "json",
  "document_count": 1,
  "file_size_bytes": 1,
  "exported_at": "2023-11-07T05:31:56Z",
  "vectors_download_url": "<string>",
  "vectors_s3_path": "<string>"
}

Documentation Index

Fetch the complete documentation index at: https://docs.mixpeek.com/docs/llms.txt

Use this file to discover all available pages before exploring further.

Headers

Authorization
string

REQUIRED: Bearer token authentication using your API key. Format: 'Bearer sk_xxxxxxxxxxxxx'. You can create API keys in the Mixpeek dashboard under Organization Settings.

Examples:

"Bearer YOUR_API_KEY"

"Bearer YOUR_STRIPE_API_KEY"

authorization
string
X-Namespace
string

Namespace identifier for scoping this request. All resources (collections, buckets, taxonomies, etc.) are scoped to a namespace. You can provide either the namespace name or namespace ID. Format: ns_xxxxxxxxxxxxx (ID) or a custom name like 'my-namespace'. Falls back to ?namespace= query parameter if the header is omitted.

Examples:

"ns_abc123def456"

"production"

"my-namespace"

Path Parameters

collection_identifier
string
required

The ID or name of the collection to export

Body

application/json

Request model for exporting collection data.

Export Formats:

  • JSON: Line-delimited JSON (JSONL) format, one document per line. Good for streaming and large files.
  • CSV: Comma-separated values. Best for tabular data analysis in spreadsheets.
  • PARQUET: Columnar format optimized for analytics. Best for large datasets and data pipelines.

Vector Export: Vectors are stored separately from document metadata due to their large size. When include_vectors=True, vectors are exported to a separate file with the naming convention: {collection_name}_vectors.{format}

Field Selection: Use select_fields to export only specific fields, reducing file size for large collections. Supports dot notation for nested fields (e.g., "metadata.title").

Filtering: Apply filters to export a subset of documents. Uses the same LogicalOperator format as the documents list endpoint.

format
enum<string>
default:parquet

Export format: json (line-delimited), csv, or parquet (default).

Available options:
json,
csv,
parquet
include_vectors
boolean
default:false

Whether to include vectors in the export. Vectors are exported to a separate file due to their large size. This significantly increases export time and file size.

select_fields
string[] | null

Specific fields to include in the export. If not provided, all fields are exported. Supports dot notation for nested fields (e.g., 'metadata.title', 'metadata.author').

Example:
[
  "document_id",
  "metadata.title",
  "metadata.category"
]
filters
LogicalOperator · object

Filter conditions to export only matching documents. Uses LogicalOperator format (AND/OR/NOT) same as document listing.

sample_size
integer | null

Maximum number of documents to export. If not provided, exports all documents. Useful for testing exports or creating sample datasets.

Required range: 1 <= x <= 1000000

Response

Successful Response

Response model for collection export.

Contains the presigned URL for downloading the exported file. The URL is valid for a limited time (typically 1 hour).

download_url
string
required

Presigned URL for downloading the exported file. Valid for 1 hour.

s3_path
string
required

Full S3 path where the export is stored (for internal reference).

format
enum<string>
required

The format of the exported file.

Available options:
json,
csv,
parquet
document_count
integer
required

Number of documents included in the export.

Required range: x >= 0
file_size_bytes
integer
required

Size of the exported file in bytes.

Required range: x >= 0
exported_at
string<date-time>
required

Timestamp when the export was completed.

vectors_download_url
string | null

Presigned URL for downloading the vectors file (if include_vectors=True). Vectors are exported separately due to their large size.

vectors_s3_path
string | null

Full S3 path for the vectors file (if include_vectors=True).