GET /v1/documents/{document_id}
Get a document by ID (namespace-scoped).
curl --request GET \
  --url https://api.mixpeek.com/v1/documents/{document_id} \
  --header 'Authorization: Bearer <token>'
{
  "document_id": "<string>",
  "collection_id": "<string>",
  "document_blobs": [
    {
      "field": "<string>",
      "url": "<string>",
      "role": "source",
      "type": "other",
      "filename": "segment_0.mp4",
      "size_bytes": 1048576,
      "content_type": "video/mp4",
      "checksum": "sha256:a1b2c3d4e5f6...",
      "created_at": "2023-11-07T05:31:56Z",
      "source_blob_id": "blob_abc123",
      "presigned_url": "<string>"
    }
  ],
  "_internal": {
    "collection_id": "col_articles",
    "created_at": "2025-10-31T10:00:00Z",
    "document_id": "doc_f8966ff29c",
    "internal_id": "org_abc123",
    "lineage": {
      "path": "bkt_content/col_articles",
      "root_bucket_id": "bkt_content",
      "root_object_id": "obj_article_001",
      "source_object_id": "obj_article_001",
      "source_type": "bucket"
    },
    "metadata": {
      "ingestion_status": "COMPLETED"
    },
    "modality": "text",
    "namespace_id": "ns_xyz789",
    "updated_at": "2025-10-31T10:00:00Z"
  }
}

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
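As a minimal sketch of the header format described above, the following Python snippet builds (but does not send) the request using only the standard library. The helper name `build_request` and the token value are placeholders introduced for illustration and are not part of the API.

```python
import urllib.request

# Base URL taken from the curl example in this reference.
BASE_URL = "https://api.mixpeek.com/v1/documents"


def build_request(document_id: str, token: str) -> urllib.request.Request:
    """Build (without sending) a GET request carrying the Bearer header.

    `build_request` is a hypothetical helper, not an official client function.
    """
    return urllib.request.Request(
        url=f"{BASE_URL}/{document_id}",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


# "my-token" is a placeholder; substitute your real auth token.
req = build_request("doc_abc123", "my-token")
print(req.get_method(), req.full_url)
print(req.get_header("Authorization"))
```

Passing the resulting object to `urllib.request.urlopen` would perform the actual call; that step is left out here so the sketch runs without network access.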

Path Parameters

document_id
string
required

The ID of the document to retrieve.

Query Parameters

return_presigned_urls
boolean
default:false

Generate fresh presigned download URLs for all blobs stored in S3.

return_vectors
boolean | null
default:false

Include embedding arrays in the response.

return_vector_names
boolean
default:false

Include a '_vectors' field listing the available vector names.

expand
string | null

Comma-separated fields containing document IDs to resolve inline. Referenced documents are fetched and attached under an '_expanded' key. Supports dot-notation for nested fields (e.g., 'items.product_id'). Max 50 unique references per request.
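The query parameters above can be assembled into a request URL along these lines. This is a sketch under stated assumptions: `build_document_url` is a hypothetical helper, booleans are serialized as lowercase `true` (matching the `return_vectors=true` form used in this reference), and parameters left at their default are simply omitted.

```python
from typing import Optional, Sequence
from urllib.parse import quote, urlencode

# Base URL taken from the curl example in this reference.
BASE_URL = "https://api.mixpeek.com/v1/documents"


def build_document_url(
    document_id: str,
    return_presigned_urls: bool = False,
    return_vectors: bool = False,
    return_vector_names: bool = False,
    expand: Optional[Sequence[str]] = None,
) -> str:
    """Return the GET URL for one document (hypothetical helper)."""
    params = {}
    # Only send flags that differ from the documented default (false).
    if return_presigned_urls:
        params["return_presigned_urls"] = "true"
    if return_vectors:
        params["return_vectors"] = "true"
    if return_vector_names:
        params["return_vector_names"] = "true"
    if expand:
        # `expand` is a single comma-separated string of field names.
        params["expand"] = ",".join(expand)

    url = f"{BASE_URL}/{quote(document_id, safe='')}"
    if params:
        # safe="," keeps the comma separators readable instead of %2C.
        url += "?" + urlencode(params, safe=",")
    return url


print(build_document_url("doc_abc123"))
print(
    build_document_url(
        "doc_abc123",
        return_presigned_urls=True,
        expand=["items.product_id", "author_id"],
    )
)
```

The field names `items.product_id` and `author_id` are illustrative only; use fields in your own documents that hold document IDs.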

Response

Successful Response

Response model for a single document.

This is the standard response format when fetching documents via API endpoints. Contains all document data plus optional presigned URLs for S3 blobs.

The document payload structure follows native Qdrant format:
- System fields are stored in _internal (lineage, metadata, blobs, etc.)
- User fields are at root level (brand_name, thumbnail_url, etc.)
- Only document_id and collection_id are Mixpeek IDs at root level
- No duplication between root and _internal

Query Parameters Affecting Response:
- return_presigned_urls=true: Adds presigned_url to each document_blobs entry
- return_vectors=true: Includes embedding arrays in response

Use Cases:
- Display document details in UI
- Download source files or generated artifacts
- Understand document provenance and processing
- Access enrichment fields (flat) for filtering/display
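To illustrate the root-level versus `_internal` layout described above, here is a minimal sketch that separates a decoded response into IDs, user fields, and system fields. The helper `split_document` is hypothetical, and treating `document_blobs`, `_vectors`, and `_expanded` as non-user keys is an assumption made for this example rather than documented behavior.

```python
from typing import Any, Dict, Tuple

# The two Mixpeek IDs that live at root level.
ID_FIELDS = ("document_id", "collection_id")
# Assumption: these root keys are response extras, not user-defined fields.
NON_USER_KEYS = ("_internal", "document_blobs", "_vectors", "_expanded")


def split_document(
    doc: Dict[str, Any],
) -> Tuple[Dict[str, Any], Dict[str, Any], Dict[str, Any]]:
    """Split a response into (ids, user_fields, internal)."""
    ids = {key: doc[key] for key in ID_FIELDS if key in doc}
    internal = doc.get("_internal") or {}
    user_fields = {
        key: value
        for key, value in doc.items()
        if key not in ID_FIELDS and key not in NON_USER_KEYS
    }
    return ids, user_fields, internal


# Sample payload; brand_name stands in for any user-defined field.
sample = {
    "document_id": "doc_abc123",
    "collection_id": "col_articles",
    "brand_name": "Acme",
    "document_blobs": [],
    "_internal": {"modality": "text", "namespace_id": "ns_xyz789"},
}

ids, user_fields, internal = split_document(sample)
print(ids)
print(user_fields)
print(internal["modality"])
```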

document_id
string
required

REQUIRED. Unique identifier for the document. Format: 'doc_' prefix + alphanumeric characters. Use for: API queries, references, filtering.

Examples:

"doc_f8966ff29c18e20c6b45e053"

"doc_abc123"

collection_id
string
required

REQUIRED. ID of the collection this document belongs to. Format: 'col_' prefix + alphanumeric characters. Use for: Collection-scoped queries, filtering.

Examples:

"col_articles"

"col_video_frames"

document_blobs
BlobURLRef · object[]

Document blobs with presigned URLs when requested
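A short sketch of how a client might collect download links from `document_blobs`. It assumes that `presigned_url` should be preferred when present and that `url` is an acceptable fallback; the helper name `blob_download_links` is hypothetical.

```python
from typing import Any, Dict, List, Tuple


def blob_download_links(doc: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (filename, link) pairs for every blob that has a usable link."""
    links = []
    for blob in doc.get("document_blobs") or []:
        # Assumption: prefer the presigned link, fall back to the plain url.
        link = blob.get("presigned_url") or blob.get("url")
        if link:
            links.append((blob.get("filename", ""), link))
    return links


# Sample payload with placeholder URLs (example.com is illustrative only).
sample = {
    "document_id": "doc_abc123",
    "document_blobs": [
        {
            "filename": "segment_0.mp4",
            "url": "https://example.com/raw/segment_0.mp4",
            "presigned_url": "https://example.com/signed/segment_0.mp4",
        },
        {"filename": "thumb.jpg", "url": "https://example.com/raw/thumb.jpg"},
        {"filename": "no_link.bin"},
    ],
}

for filename, link in blob_download_links(sample):
    print(filename, link)
```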

_internal
InternalPayloadModel · object

System-managed internal fields. Contains all Mixpeek-managed metadata including lineage, processing info, timestamps, and blob references. User-defined fields appear at root level alongside document_id and collection_id.

Example:
{
  "collection_id": "col_articles",
  "created_at": "2025-10-31T10:00:00Z",
  "document_id": "doc_f8966ff29c",
  "internal_id": "org_abc123",
  "lineage": {
    "path": "bkt_content/col_articles",
    "root_bucket_id": "bkt_content",
    "root_object_id": "obj_article_001",
    "source_object_id": "obj_article_001",
    "source_type": "bucket"
  },
  "metadata": { "ingestion_status": "COMPLETED" },
  "modality": "text",
  "namespace_id": "ns_xyz789",
  "updated_at": "2025-10-31T10:00:00Z"
}
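The `_internal` example above can be read defensively, as in this sketch. The helper names are hypothetical, and the only `ingestion_status` value assumed is `COMPLETED`, since that is the only one shown in this reference.

```python
from typing import Any, Dict, Optional


def ingestion_status(doc: Dict[str, Any]) -> Optional[str]:
    """Return _internal.metadata.ingestion_status, or None if absent."""
    internal = doc.get("_internal") or {}
    metadata = internal.get("metadata") or {}
    return metadata.get("ingestion_status")


def source_object_id(doc: Dict[str, Any]) -> Optional[str]:
    """Return _internal.lineage.source_object_id, or None if absent."""
    internal = doc.get("_internal") or {}
    lineage = internal.get("lineage") or {}
    return lineage.get("source_object_id")


# Payload trimmed from the example in this reference.
sample = {
    "document_id": "doc_f8966ff29c",
    "collection_id": "col_articles",
    "_internal": {
        "lineage": {
            "path": "bkt_content/col_articles",
            "root_bucket_id": "bkt_content",
            "source_object_id": "obj_article_001",
            "source_type": "bucket",
        },
        "metadata": {"ingestion_status": "COMPLETED"},
        "modality": "text",
    },
}

print(ingestion_status(sample))
print(source_object_id(sample))
```

Returning `None` for missing keys keeps the helpers usable on partial payloads instead of raising `KeyError`.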