A retrieval approach using sparse vectors where most dimensions are zero, typically based on term frequencies or learned sparse representations. Sparse retrieval complements dense methods in hybrid multimodal search systems.
Sparse retrieval represents documents and queries as high-dimensional vectors where each dimension corresponds to a vocabulary term. Traditional methods like BM25 use term frequency statistics, while learned sparse methods like SPLADE use neural networks to assign importance weights to terms. Retrieval uses inverted indices for efficient lookup of documents matching query terms.
Classical sparse vectors have dimensionality equal to vocabulary size (typically 30K-100K) with only a few hundred non-zero entries per document. SPLADE and other learned sparse models expand documents with related terms and learn term weights end-to-end. Sparse retrieval excels at exact matching, entity lookup, and queries with rare technical terms that dense models may not handle well.
Connect a bucket and Mixpeek runs the whole multimodal search pipeline for you: extraction, indexing, and search over your own objects. No models to wire up, nothing to host.
Start with ManagedKeep your embeddings on your own cloud and run dense, sparse, and BM25 search directly on object storage. First 1M vectors free.
Start with MVS