
    What is Object Decomposition

    Object Decomposition - The process of breaking complex unstructured files into their semantic components (features) for independent indexing and retrieval.

    Object decomposition is the first operation in a multimodal data warehouse. A single file, such as a video, document, or audio recording, is analyzed by one or more feature extractors to produce multiple independent semantic representations. Each representation (a feature) has its own embedding space and can be queried independently through retrieval pipelines.
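To make "queried independently" concrete, here is a minimal sketch of one index per feature type, each with its own vector space. The `FeatureIndex` class, its methods, and the toy vectors are hypothetical illustrations and not part of any Mixpeek API. A production system would use a vector database rather than a brute-force scan.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class FeatureIndex:
    """Hypothetical in-memory index: one separate vector space per feature type."""

    def __init__(self) -> None:
        # feature_type -> list of (object_id, vector)
        self._spaces: Dict[str, List[Tuple[str, List[float]]]] = defaultdict(list)

    def add(self, feature_type: str, object_id: str, vector: List[float]) -> None:
        self._spaces[feature_type].append((object_id, vector))

    def search(self, feature_type: str, query: List[float], top_k: int = 3):
        # Only vectors from the requested feature space are compared.
        candidates = self._spaces.get(feature_type, [])
        scored = [(cosine(query, vec), oid) for oid, vec in candidates]
        scored.sort(reverse=True)
        return [(oid, score) for score, oid in scored[:top_k]]


index = FeatureIndex()
# Scene and face vectors deliberately have different dimensions:
# they live in different embedding spaces and are never compared to each other.
index.add("scene", "video_001", [1.0, 0.0])
index.add("scene", "video_002", [0.0, 1.0])
index.add("face", "video_001", [0.2, 0.9, 0.1])

results = index.search("scene", [0.9, 0.1], top_k=1)
```

A query against the "scene" space never touches the "face" vectors, which is why each feature can use whatever model and dimensionality suits it.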

    How It Works

    When a file is ingested into Mixpeek, it passes through a collection's configured feature extractors. A video, for example, might produce scene embeddings (CLIP), face identities (ArcFace), logo detections (SigLIP), speech transcripts (Whisper), and audio fingerprints. Each feature is stored as a separate vector with metadata linking it back to the source object and, where applicable, to a timestamp and spatial location.
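The flow described above can be sketched as a list of extractor functions that each return feature records pointing back to the source object. Everything here is an assumption made for illustration: the `Feature` fields, the `decompose` function, and the stub extractors (which return hard-coded vectors instead of running CLIP or ArcFace) are hypothetical and do not reflect Mixpeek's actual schema or SDK.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Feature:
    """Hypothetical feature record: one vector plus provenance metadata."""
    source_object_id: str
    feature_type: str
    vector: List[float]
    start_s: Optional[float] = None   # timestamp range, if the feature is temporal
    end_s: Optional[float] = None
    bbox: Optional[Tuple[float, float, float, float]] = None  # x, y, w, h if spatial


Extractor = Callable[[str], List[Feature]]


def scene_extractor(object_id: str) -> List[Feature]:
    # Stub standing in for a scene-embedding model; vectors are made up.
    return [
        Feature(object_id, "scene", [0.1, 0.9], start_s=0.0, end_s=4.0),
        Feature(object_id, "scene", [0.8, 0.2], start_s=4.0, end_s=9.5),
    ]


def face_extractor(object_id: str) -> List[Feature]:
    # Stub standing in for a face-identity model; includes a bounding box.
    return [
        Feature(object_id, "face", [0.3, 0.3, 0.4],
                start_s=1.2, end_s=1.2, bbox=(0.4, 0.1, 0.2, 0.3)),
    ]


def decompose(object_id: str, extractors: List[Extractor]) -> List[Feature]:
    """Run every configured extractor and collect the resulting features."""
    features: List[Feature] = []
    for extract in extractors:
        features.extend(extract(object_id))
    return features


features = decompose("video_001", [scene_extractor, face_extractor])
for f in features:
    print(f.feature_type, f.source_object_id, f.start_s, f.bbox)
```

One input object yields several records, and each record carries enough metadata to jump back to the exact moment or region in the source file.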

    Examples by Modality

    • Video → scenes, faces, logos, speech segments, audio fingerprints, OCR text overlays
    • Document → text chunks, tables, layout regions, entities, page embeddings
    • Image → visual embeddings, detected objects, text (OCR), faces, colors
    • Audio → speech transcription, speaker diarization, audio fingerprints, sentiment

    Best Practices

    • Choose extractors based on downstream query needs; don't extract features you won't search
    • Use the multimodal extractor for general-purpose decomposition
    • Use specialized extractors (face-identity, document) for high-precision vertical use cases
    • Monitor extraction throughput and credit consumption per modality
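As a rough illustration of the first practice, the sketch below selects only the feature types that downstream queries need, using the modality list from the examples section. The `FEATURES_BY_MODALITY` table and the `plan_extraction` helper are hypothetical names introduced here, not Mixpeek configuration.

```python
from typing import Iterable, List, Tuple

# Feature types per modality, copied from the "Examples by Modality" list.
FEATURES_BY_MODALITY = {
    "video": {"scenes", "faces", "logos", "speech segments",
              "audio fingerprints", "OCR text overlays"},
    "document": {"text chunks", "tables", "layout regions",
                 "entities", "page embeddings"},
    "image": {"visual embeddings", "detected objects", "text (OCR)",
              "faces", "colors"},
    "audio": {"speech transcription", "speaker diarization",
              "audio fingerprints", "sentiment"},
}


def plan_extraction(modality: str, query_needs: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (features to extract, requested features this modality cannot provide)."""
    available = FEATURES_BY_MODALITY.get(modality)
    if available is None:
        raise ValueError(f"unknown modality: {modality}")
    needs = set(query_needs)
    return sorted(needs & available), sorted(needs - available)


selected, unsupported = plan_extraction(
    "video", ["faces", "speech segments", "tables"]
)
print(selected)     # features worth extracting for this collection
print(unsupported)  # needs that require a different modality
```

Anything in the modality's feature set that is not in `selected` is skipped, which keeps extraction cost tied to what will actually be searched.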

    Related Pages

    • Feature Extractors: /docs/processing/feature-extractors
    • Collections: /docs/ingestion/collections
    • What Is a Multimodal Data Warehouse?: /blog/multimodal-data-warehouse