A multi-stage retrieval pipeline is the query language for unstructured data. Just as SQL composes WHERE, ORDER BY, LIMIT, and JOIN clauses for structured data, multi-stage retrieval composes filter, sort, reduce, enrich, and apply stages for multimodal content. Each stage takes the previous stage's output as its input, progressively narrowing and enriching the result set.
A pipeline is defined as an ordered list of stages. The first stage (usually a filter) searches across an embedding space to produce an initial candidate set. Subsequent stages narrow, reorder, sample, or enrich those candidates. For example, with the result count after each stage in parentheses: face search (847 candidates) → logo filter (23) → sentiment sort (23 reordered) → top-k reduce (5) → brand context enrich (5 enriched results).
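The composition described above can be sketched in a few lines of plain Python. This is a minimal illustration, not Mixpeek's API: the helper names (`run_pipeline`, `filter_stage`, `sort_stage`, `reduce_stage`, `enrich_stage`), the field names, and the toy data are all assumptions made for the example. It also starts from an in-memory candidate list rather than an embedding search, which a real first stage would perform.

```python
from typing import Any, Callable, Dict, List

# Hypothetical types for this sketch: a candidate is a dict, and a stage
# is any function that maps a candidate list to a new candidate list.
Candidate = Dict[str, Any]
Stage = Callable[[List[Candidate]], List[Candidate]]


def run_pipeline(candidates: List[Candidate], stages: List[Stage]) -> List[Candidate]:
    """Feed each stage's output into the next stage, in order."""
    for stage in stages:
        candidates = stage(candidates)
    return candidates


def filter_stage(key: str, threshold: float) -> Stage:
    """Keep only candidates whose score under `key` is at least `threshold`."""
    return lambda cands: [c for c in cands if c[key] >= threshold]


def sort_stage(key: str) -> Stage:
    """Reorder candidates by `key`, highest first; the count is unchanged."""
    return lambda cands: sorted(cands, key=lambda c: c[key], reverse=True)


def reduce_stage(k: int) -> Stage:
    """Keep the first k candidates (top-k after a sort)."""
    return lambda cands: cands[:k]


def enrich_stage(lookup: Dict[str, str]) -> Stage:
    """Attach extra context to each candidate without changing the count."""
    return lambda cands: [
        {**c, "brand_context": lookup.get(c["id"], "unknown")} for c in cands
    ]


# Toy candidates standing in for the output of an initial search stage.
candidates = [
    {"id": "clip-1", "logo_score": 0.9, "sentiment": 0.2},
    {"id": "clip-2", "logo_score": 0.4, "sentiment": 0.9},
    {"id": "clip-3", "logo_score": 0.8, "sentiment": 0.7},
    {"id": "clip-4", "logo_score": 0.7, "sentiment": 0.5},
]

pipeline = [
    filter_stage("logo_score", 0.5),           # narrow
    sort_stage("sentiment"),                   # reorder
    reduce_stage(2),                           # top-k
    enrich_stage({"clip-3": "launch event"}),  # enrich
]

results = run_pipeline(candidates, pipeline)
for r in results:
    print(r["id"], r["brand_context"])
```

Because every stage shares the same list-in, list-out shape, stages can be added, removed, or reordered without touching the others, which is the property that makes the SQL comparison apt.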