Object Decomposition - The process of breaking complex unstructured files into their semantic components (features) for independent indexing and retrieval.
Object decomposition is the first operation in a multimodal data warehouse. A single file, such as a video, document, or audio recording, is analyzed by one or more feature extractors to produce multiple independent semantic representations. Each representation (a feature) has its own embedding space and can be queried independently through retrieval pipelines.
How It Works
When a file is ingested into Mixpeek, it passes through a collection's configured feature extractors. A video might produce scene embeddings (CLIP), face identities (ArcFace), logo detections (SigLIP), speech transcripts (Whisper), and audio fingerprints. Each feature is stored as a separate vector with metadata that links it back to the source object and records its timestamp and spatial location.
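The fan-out from one file into many independently stored features can be sketched as follows. This is a minimal illustration: `FeatureRecord`, `decompose`, and the toy extractors are hypothetical names, not the actual Mixpeek API or schema.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, List

# Hypothetical record for one extracted feature (not the Mixpeek schema).
@dataclass
class FeatureRecord:
    source_object_id: str            # links the feature back to the ingested file
    feature_type: str                # e.g. "scene_embedding", "transcript_segment"
    embedding: List[float]           # vector in this feature's own embedding space
    timestamp_s: Optional[float] = None   # temporal position within the source
    bbox: Optional[Tuple] = None          # spatial location, if applicable

def decompose(object_id: str, extractors) -> List[FeatureRecord]:
    """Run every configured extractor; each contributes independent features."""
    records = []
    for extract in extractors:
        records.extend(extract(object_id))
    return records

# Toy extractors standing in for CLIP, Whisper, etc.
def scene_extractor(object_id):
    return [FeatureRecord(object_id, "scene_embedding", [0.1, 0.2], timestamp_s=0.0)]

def transcript_extractor(object_id):
    return [FeatureRecord(object_id, "transcript_segment", [0.3, 0.4], timestamp_s=1.5)]

features = decompose("video_123", [scene_extractor, transcript_extractor])
```

Every record carries the source object's ID, so any feature retrieved later can be traced back to the file (and position) it came from.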
Examples by Modality
Video → scenes, faces, logos, speech segments, audio fingerprints, OCR text overlays
Document → text chunks, tables, layout regions, entities, page embeddings
Image → visual embeddings, detected objects, text (OCR), faces, colors
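Because each feature type lives in its own embedding space, a retrieval pipeline ranks candidates within one space at a time and never mixes them. A minimal sketch of that independence, using made-up in-memory records and plain cosine similarity rather than any real index:

```python
import math

# Illustrative feature store: each dict is one decomposed feature.
FEATURES = [
    {"type": "faces",  "source": "video_123", "vec": [1.0, 0.0]},
    {"type": "faces",  "source": "video_456", "vec": [0.6, 0.8]},
    {"type": "scenes", "source": "video_123", "vec": [0.0, 1.0]},
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def query(feature_type, qvec):
    """Rank only features of one type: embedding spaces are never mixed."""
    pool = [f for f in FEATURES if f["type"] == feature_type]
    return sorted(pool, key=lambda f: cosine(f["vec"], qvec), reverse=True)

best = query("faces", [1.0, 0.0])[0]
# best["source"] -> "video_123"
```

A face query here only ever scores face vectors; the scene embedding from the same video is untouched, which is what makes each decomposed feature independently queryable.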