The Problem With the Status Quo

Most teams building AI applications over unstructured data end up with a Frankenstack: a vector database for embeddings, an object store for raw files, a separate search engine for text, custom ETL for each modality, and bespoke inference pipelines per use case. Every new modality means a new system, a new integration, and a new failure mode. This is the exact problem that structured data solved decades ago. The journey went: flat files → databases → data warehouses. Unstructured data is making the same journey: raw storage → vector databases → multimodal data warehouses.

Four Persistent Artifacts

A multimodal data warehouse manages four persistent artifacts that vector databases don’t:
  1. Raw Objects. The original files (videos, images, documents, audio). Stored in object storage, referenced by URI, never modified after ingest.
  2. Derived Features. The semantic decomposition of each object. A video becomes scene embeddings, face identities, logo detections, speech transcripts, audio fingerprints. Each feature has its own embedding space and is independently queryable.
  3. Indexes. Hot, warm, and cold storage tiers. Actively queried features live in a fast vector engine (~10ms). Infrequently queried features live in object-storage-native vector indexes (~100ms, 90% cheaper). Archives hold everything else.
  4. Retrieval Pipelines. Multi-stage compositions of filter, sort, reduce, enrich, and apply stages. The query language for unstructured data. No vector database offers composable retrieval.
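The four artifacts above can be sketched as a minimal data model. This is an illustrative sketch, not Mixpeek's actual schema: the class names, fields, and tier labels are assumptions chosen to mirror the list above (immutable raw objects referenced by URI, features with their own embedding spaces and lineage, and a storage tier per feature).

```python
from dataclasses import dataclass
from enum import Enum

class Tier(Enum):
    HOT = "hot"    # fast vector engine, ~10 ms
    WARM = "warm"  # object-storage-native index, ~100 ms
    COLD = "cold"  # archive

@dataclass(frozen=True)
class RawObject:
    # Immutable after ingest; the file itself lives in object storage.
    uri: str
    media_type: str

@dataclass
class Feature:
    # One semantic atom derived from a raw object, e.g. a scene
    # embedding or a speech-transcript span.
    source_uri: str          # lineage back to the raw object
    space: str               # embedding space this feature lives in
    embedding: list[float]
    tier: Tier = Tier.HOT

video = RawObject(uri="s3://bucket/clip.mp4", media_type="video/mp4")
scene = Feature(source_uri=video.uri, space="scene", embedding=[0.1, 0.2])
face = Feature(source_uri=video.uri, space="face", embedding=[0.3],
               tier=Tier.WARM)

# Multiple embedding spaces per object, each independently queryable:
spaces = {f.space for f in (scene, face) if f.source_uri == video.uri}
```

Note that each feature carries its `source_uri`, which is what makes lineage tracking possible: any retrieved feature traces back to the raw object it was derived from.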

What Vector Databases Do Well

Vector databases are excellent at one thing: approximate nearest-neighbor search over pre-computed embeddings. If your use case is:
  • Single embedding space (e.g., text-only RAG)
  • Simple similarity search with metadata filters
  • Small-to-medium dataset (millions of vectors, not billions)
  • Always-hot storage acceptable
Then a vector database is the right tool. Use it.
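The "single query, rank, return" shape that vector databases handle well can be shown with a brute-force sketch. The `search` function, the record layout, and the `where` filter syntax here are illustrative stand-ins for a real vector database's API, not any particular product's interface:

```python
import math

def cosine(a, b):
    # Cosine similarity between two dense vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def search(index, query, top_k=2, where=None):
    # Metadata pre-filter, then rank by similarity -- the whole
    # retrieval story a vector database offers.
    hits = [r for r in index
            if where is None
            or all(r["meta"].get(k) == v for k, v in where.items())]
    hits.sort(key=lambda r: cosine(r["vec"], query), reverse=True)
    return hits[:top_k]

index = [
    {"id": "a", "vec": [1.0, 0.0], "meta": {"lang": "en"}},
    {"id": "b", "vec": [0.9, 0.1], "meta": {"lang": "en"}},
    {"id": "c", "vec": [0.0, 1.0], "meta": {"lang": "fr"}},
]
results = search(index, query=[1.0, 0.0], where={"lang": "en"})
```

A production system replaces the exhaustive scan with an approximate index (HNSW, IVF), but the query shape is the same: one embedding space, one filter, one ranked list.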

Where Vector Databases Fall Short

Vector databases don’t handle:
| Capability | Vector DB | Multimodal Warehouse |
| --- | --- | --- |
| Feature extraction from raw files | ✗ (bring your own embeddings) | ✓ (built-in extractors) |
| Multiple embedding spaces per object | ✗ (one index per space) | ✓ (features as first-class citizens) |
| Tiered storage (hot/warm/cold) | ✗ (everything always hot) | ✓ (lifecycle-managed tiers) |
| Multi-stage retrieval | ✗ (single query, rank, return) | ✓ (filter, sort, reduce, enrich, apply) |
| Cross-collection enrichment | ✗ (no joins) | ✓ (semantic joins via enrich stages) |
| Object lineage tracking | ✗ (no provenance) | ✓ (feature URI traces back to source) |
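Tiered storage in the table above is driven by lifecycle rules. A minimal sketch of such a rule, assuming idle-time thresholds (the 7-day and 90-day cutoffs here are invented for illustration, not Mixpeek defaults):

```python
from datetime import datetime, timedelta

# Hypothetical lifecycle thresholds: demote features that have not
# been queried recently to a cheaper tier.
HOT_TTL = timedelta(days=7)
WARM_TTL = timedelta(days=90)

def target_tier(last_queried: datetime, now: datetime) -> str:
    idle = now - last_queried
    if idle <= HOT_TTL:
        return "hot"    # fast vector engine, ~10 ms
    if idle <= WARM_TTL:
        return "warm"   # object-storage-native index, ~100 ms, ~90% cheaper
    return "cold"       # archive

now = datetime(2025, 6, 1)
active = target_tier(now - timedelta(days=1), now)
stale = target_tier(now - timedelta(days=30), now)
dormant = target_tier(now - timedelta(days=365), now)
```

The point is that tier placement is a function of access patterns, evaluated continuously, rather than a one-time decision at ingest.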

The Warehouse Abstraction

The multimodal data warehouse is defined by three operations:
  1. Decompose. Break any file into its semantic atoms. Videos become scenes, faces, logos, speech, embeddings. Documents become tables, entities, layouts. One API call per object.
  2. Store. Place features across cost-appropriate tiers: hot for active queries, warm for infrequent access, cold for compliance and archive. Lifecycle rules automate movement.
  3. Reassemble. Chain retrieval stages into pipelines: filter by face similarity, narrow by logo presence, sort by sentiment, reduce to top-k, enrich with cross-collection context. This is the query language for unstructured data.
This is what makes Mixpeek a warehouse, not a database.
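The Reassemble step is composable: each stage is a function from a document list to a document list, and a pipeline is their composition. The combinator names below echo Mixpeek's stage vocabulary (filter, sort, reduce, enrich), but the functions themselves are an illustrative sketch, not the product's API:

```python
from functools import reduce as fold

def filter_stage(pred):
    # Keep only documents matching a predicate (e.g. face similarity).
    return lambda docs: [d for d in docs if pred(d)]

def sort_stage(key, descending=True):
    return lambda docs: sorted(docs, key=key, reverse=descending)

def reduce_stage(top_k):
    return lambda docs: docs[:top_k]

def enrich_stage(lookup):
    # Semantic join: attach cross-collection context to each hit.
    return lambda docs: [{**d, "context": lookup(d["id"])} for d in docs]

def pipeline(*stages):
    # Compose stages left to right into one retrieval function.
    return lambda docs: fold(lambda acc, s: s(acc), stages, docs)

run = pipeline(
    filter_stage(lambda d: d["face_sim"] > 0.8),
    sort_stage(key=lambda d: d["sentiment"]),
    reduce_stage(2),
    enrich_stage(lambda _id: {"brand": "acme"}),
)

docs = [
    {"id": 1, "face_sim": 0.90, "sentiment": 0.2},
    {"id": 2, "face_sim": 0.50, "sentiment": 0.9},
    {"id": 3, "face_sim": 0.95, "sentiment": 0.7},
]
hits = run(docs)
```

Because every stage has the same shape, pipelines can be extended, reordered, or reused across queries, which is exactly what a single query/rank/return call cannot do.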

Next Steps

  • Quickstart: Build your first warehouse pipeline in 10 minutes
  • Architecture: How the warehouse engine works under the hood
  • Multi-Stage Retrieval: The query language for unstructured data
  • Feature Extractors: Decompose any file into queryable features