

Decomposition

Raw objects are split into documents, each with extracted features like embeddings, transcripts, and metadata.

Decomposition is the core transformation in Mixpeek. A raw file (video, PDF, image, audio) goes in as one object. It comes out as many documents, each with its own features. This is what makes sub-file search possible — you search within a video at the segment level, not for the video as a whole.

Three Primitives

| Primitive | What It Is | Role |
| --- | --- | --- |
| Object | Raw file or record in a bucket (video, PDF, JSON row, image). | The input boundary. You upload objects. |
| Document | One row of output in a collection, produced by decomposition. | The query boundary. You search documents. |
| Feature | A named output attached to a document (embedding, transcript, OCR text, label, score). | The composition boundary. Retrievers reference features by URI. |
The pipeline is always:
Object (bucket) → Decomposition → Document (collection) → Features (MVS + MongoDB)
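To make the pipeline concrete, here is a minimal sketch of the shapes involved. The field names are illustrative (borrowed from examples later on this page), not a verbatim API response:

```python
# Illustrative shapes only: field names mirror the examples on this page
# (root_object_id, feature URIs) and are not a verbatim API response.

# One object in...
video_object = {
    "object_id": "obj_video_123",
    "bucket_id": "bkt_videos",
    "payload": {"video_url": "s3://bucket/keynote.mp4"},
}

# ...many documents out, each carrying its own features keyed by URI.
documents = [
    {
        "document_id": f"doc_{i}",
        "root_object_id": video_object["object_id"],
        "start_sec": i * 10,
        "end_sec": (i + 1) * 10,
        "features": {
            "mixpeek://multimodal_extractor@v1/multimodal_embedding": [0.12, -0.07],  # truncated vector
            "mixpeek://multimodal_extractor@v1/transcript": "segment transcript text",
        },
    }
    for i in range(3)
]
```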

What Decomposition Decides

The feature extractor controls how an object is decomposed into documents. The strategy depends on the content type:
| Content Type | Decomposition Strategy | Result |
| --- | --- | --- |
| Video | Time intervals, scene boundaries, or silence gaps | Each segment = 1 document with visual embedding + transcript + scene description |
| Audio | Silence boundaries or fixed intervals | Each segment = 1 document with transcript + transcript embedding |
| PDF / Document | Page, paragraph, or sentence boundaries | Each chunk = 1 document with text content + text embedding |
| Image | No split (1:1) | 1 image = 1 document with visual embedding + OCR + description |
| Structured data | Row-level (1:1) | 1 row = 1 document with field-level features |
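Each strategy reduces to boundary logic over the raw content. As a rough sketch (not Mixpeek's implementation), fixed-interval video segmentation looks like this:

```python
def time_segments(duration_sec: float, interval_sec: float = 10.0):
    """Yield (start, end) boundaries for fixed-interval segmentation.

    Each boundary pair becomes one document after feature extraction.
    Sketch only; the real extractor also supports scene and silence boundaries.
    """
    start = 0.0
    while start < duration_sec:
        end = min(start + interval_sec, duration_sec)
        yield (start, end)
        start = end

# A 30-minute video at 10-second intervals decomposes into 180 documents.
assert len(list(time_segments(30 * 60))) == 180
```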

Why It Matters

Without decomposition, a 30-minute video is one record. Searching for “the moment the CEO mentions revenue” means scanning the entire video. There’s no way to return a specific timestamp. With decomposition, that video becomes ~180 ten-second segments, each with its own transcript embedding, visual embedding, and scene description. A search returns the exact segment at 14:30 where the CEO says “revenue grew 22%.” The same applies to documents: a 200-page PDF becomes 200 searchable chunks instead of one monolithic record.
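The timestamp falls out of the segmentation arithmetic: at 10-second intervals, 14:30 is 870 seconds in, which is segment 87. A minimal sketch, assuming fixed intervals:

```python
def segment_start_timestamp(segment_index: int, interval_sec: int = 10) -> str:
    """Map a segment index back to its start time, assuming fixed intervals."""
    start = segment_index * interval_sec
    return f"{start // 60}:{start % 60:02d}"

# With 10-second segments, the hit at 14:30 is segment 87 (870 seconds in).
assert segment_start_timestamp(87) == "14:30"
```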

Feature URIs

Every feature produced by decomposition gets a URI that uniquely identifies it:
mixpeek://multimodal_extractor@v1/multimodal_embedding
mixpeek://face_identity_extractor@v1/face_embedding
mixpeek://my_custom_plugin@1.0.0/domain_embedding
Retrievers, taxonomies, and clusters reference features by URI. This is the composition boundary — you can build a retriever that searches multimodal_embedding in one stage and face_embedding in another, even though they were produced by different extractors.
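For example, a two-stage retriever might reference both URIs. The stage schema below is an assumption for illustration only (consult the retriever docs for the real shape); the URIs are the ones shown above:

```python
# Hypothetical retriever definition: the "stages" shape and field names are
# assumptions for illustration, not the documented retriever schema.
retriever = {
    "retriever_name": "people-in-scenes",
    "stages": [
        {
            # Stage 1: semantic search over the general multimodal embedding.
            "feature_uri": "mixpeek://multimodal_extractor@v1/multimodal_embedding",
            "query": "CEO discussing revenue on stage",
        },
        {
            # Stage 2: narrow the candidates by face identity.
            "feature_uri": "mixpeek://face_identity_extractor@v1/face_embedding",
            "query": "person:ceo_jane_doe",
        },
    ],
}
```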

Configuring Decomposition

Decomposition is configured via the feature_extractor field on a collection:
{
  "collection_name": "video-library",
  "source": { "type": "bucket", "bucket_id": "bkt_videos" },
  "feature_extractor": {
    "feature_extractor_name": "multimodal_extractor",
    "version": "v1",
    "input_mappings": {
      "video": "payload.video_url"
    },
    "settings": {
      "video_segmentation": {
        "type": "time",
        "interval_sec": 10
      },
      "run_transcription": true,
      "run_scene_description": true
    }
  }
}
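Creating the collection is then one API call. A minimal sketch in Python; the endpoint path and auth header are assumptions, so verify them against the API reference:

```python
import requests

# Same payload as the JSON config above.
collection_config = {
    "collection_name": "video-library",
    "source": {"type": "bucket", "bucket_id": "bkt_videos"},
    "feature_extractor": {
        "feature_extractor_name": "multimodal_extractor",
        "version": "v1",
        "input_mappings": {"video": "payload.video_url"},
        "settings": {
            "video_segmentation": {"type": "time", "interval_sec": 10},
            "run_transcription": True,
            "run_scene_description": True,
        },
    },
}

resp = requests.post(
    "https://api.mixpeek.com/v1/collections",  # assumed route; check the API reference
    headers={"Authorization": "Bearer <API_KEY>"},  # assumed auth scheme
    json=collection_config,
)
resp.raise_for_status()
print(resp.json())
```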
The settings object controls the decomposition strategy. Each extractor has its own settings — see the extractor-specific pages for details:

- From Video: time, scene, and silence segmentation strategies
- From Images: visual embeddings, OCR, and structured extraction
- From Audio: silence-boundary segmentation and transcription
- From Documents: page, paragraph, and sentence chunking

Multi-Tier Decomposition

When a single extraction pass isn't enough (for example: transcribe the audio, then embed the transcript, then classify each chunk), you chain collections. Each tier reads the output of the previous one, forming a DAG:
Tier 1: raw video → segments with transcripts
Tier 2: tier-1 docs → text chunks with embeddings
Tier 3: tier-2 docs → classifications per chunk
The engine resolves tiers automatically and executes them in dependency order. See Multi-Tier Feature Extraction for the full guide.
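Concretely, a tier is a collection whose source points at another collection instead of a bucket. A hedged sketch of a tier-2 config, where the source type, extractor name, and input path are assumptions that mirror the bucket-sourced example above:

```python
# Tier 2 reads tier-1 documents (video segments with transcripts) and chunks
# the transcript text. The "collection" source type, extractor name, and
# input path are assumptions for illustration; verify against the
# Multi-Tier Feature Extraction guide.
tier2_config = {
    "collection_name": "transcript-chunks",
    "source": {"type": "collection", "collection_id": "col_segments"},
    "feature_extractor": {
        "feature_extractor_name": "text_extractor",  # hypothetical extractor name
        "version": "v1",
        "input_mappings": {"text": "features.transcript"},  # assumed path to the tier-1 transcript
        "settings": {"chunking": {"type": "paragraph"}},
    },
}
```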

Lineage

Every document tracks its lineage back to the original object:
{
  "root_object_id": "obj_video_123",
  "root_bucket_id": "bkt_marketing",
  "source_collection_id": "col_segments",
  "lineage_path": "bkt_marketing/col_segments/col_chunks"
}
This lets you trace any search result back through tiers to the original file. Use the Lineage API to visualize the decomposition tree.
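Since lineage_path is slash-delimited, you can also walk it client-side. A minimal sketch, assuming the path format shown above:

```python
def lineage_chain(doc: dict) -> list[str]:
    """Return the decomposition chain for a document, root bucket first.

    Assumes lineage_path is slash-delimited, as in the example above.
    """
    return doc["lineage_path"].split("/")

doc = {
    "root_object_id": "obj_video_123",
    "root_bucket_id": "bkt_marketing",
    "source_collection_id": "col_segments",
    "lineage_path": "bkt_marketing/col_segments/col_chunks",
}
assert lineage_chain(doc) == ["bkt_marketing", "col_segments", "col_chunks"]
```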