
    What Is Object Decomposition?

    Object Decomposition - The process of breaking complex unstructured files into their semantic components (features) for independent indexing and retrieval.

    Object decomposition is the first operation in a multimodal data warehouse. A single file, such as a video, document, or audio recording, is analyzed by one or more feature extractors to produce multiple independent semantic representations. Each representation (a feature) has its own embedding space and can be queried independently through retrieval pipelines.

    How It Works

    When a file is ingested into Mixpeek, it passes through a collection's configured feature extractors. A video might produce scene embeddings (CLIP), face identities (ArcFace), logo detections (SigLIP), speech transcripts (Whisper), and audio fingerprints. Each feature is stored as a separate vector with metadata linking it back to the source object, timestamp, and spatial location.

    Examples by Modality

    • Video → scenes, faces, logos, speech segments, audio fingerprints, OCR text overlays
    • Document → text chunks, tables, layout regions, entities, page embeddings
    • Image → visual embeddings, detected objects, text (OCR), faces, colors
    • Audio → speech transcription, speaker diarization, audio fingerprints, sentiment
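The modality-to-feature mapping above can be expressed as a lookup table. This is a toy sketch of the decomposition plan only; a real pipeline would run the actual extractor models, and the feature-type labels here are illustrative.

```python
# Illustrative mapping of modality to the feature types listed above.
MODALITY_FEATURES = {
    "video": ["scenes", "faces", "logos", "speech_segments",
              "audio_fingerprints", "ocr_text"],
    "document": ["text_chunks", "tables", "layout_regions",
                 "entities", "page_embeddings"],
    "image": ["visual_embeddings", "detected_objects", "ocr_text",
              "faces", "colors"],
    "audio": ["speech_transcription", "speaker_diarization",
              "audio_fingerprints", "sentiment"],
}

def planned_features(modality: str) -> list[str]:
    """Return the feature types a file of this modality would decompose into."""
    return MODALITY_FEATURES.get(modality, [])

print(planned_features("video"))
```

The point of the table is that one input file fans out into many independent features, each of which is indexed and queried on its own.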

    Best Practices

    • Choose extractors based on downstream query needs; don't extract features you won't search
    • Use the multimodal extractor for general-purpose decomposition
    • Use specialized extractors (face-identity, document) for high-precision vertical use cases
    • Monitor extraction throughput and credit consumption per modality
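The first best practice, extracting only what you will search, can be sketched as a selection step that works backward from planned query types. The extractor names and capability labels below are assumptions for illustration, not Mixpeek's actual extractor catalog.

```python
# Hypothetical catalog mapping extractors to the query types they enable.
EXTRACTOR_CAPABILITIES = {
    "multimodal": {"semantic_search", "visual_similarity"},
    "face-identity": {"face_search"},
    "document": {"table_extraction", "layout_search"},
    "transcription": {"speech_search"},
}

def select_extractors(needed: set[str]) -> list[str]:
    """Keep only the extractors whose capabilities overlap the needed query types."""
    return sorted(
        name for name, caps in EXTRACTOR_CAPABILITIES.items()
        if caps & needed
    )

print(select_extractors({"face_search", "speech_search"}))
```

Working backward from queries this way avoids paying extraction and storage costs for features that no retrieval pipeline will ever touch.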

    Related Pages

    • Feature Extractors: /docs/processing/feature-extractors
    • Collections: /docs/ingestion/collections
    • What Is a Multimodal Data Warehouse?: /blog/multimodal-data-warehouse