A retrieval approach that represents queries and documents as sparse vectors, in which most dimensions are zero and the non-zero weights are typically derived from term frequencies or learned by a neural model. Sparse retrieval complements dense methods in hybrid multimodal search systems.
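To illustrate the hybrid setting, the sketch below merges a sparse and a dense result list with reciprocal rank fusion (RRF), which is one common fusion strategy among several. It assumes each retriever returns a ranked list of document IDs. The function name, the document IDs, and the constant k = 60 (a widely used default) are illustrative choices, not part of any particular system.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of doc IDs with reciprocal rank fusion.

    Each document receives sum(1 / (k + rank)) over the lists it appears in,
    where rank is 1-based. k=60 is a commonly used default (an assumption here).
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, highest first; break ties by doc ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical ranked outputs from a sparse (e.g. BM25) and a dense retriever.
sparse_hits = ["doc_7", "doc_2", "doc_9"]
dense_hits = ["doc_2", "doc_4", "doc_7"]
fused = reciprocal_rank_fusion([sparse_hits, dense_hits])
print([doc_id for doc_id, _ in fused])  # ['doc_2', 'doc_7', 'doc_4', 'doc_9']
```

RRF uses only ranks, not raw scores. This avoids having to reconcile BM25 scores with dense similarity scores, which live on different scales.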
Sparse retrieval represents documents and queries as high-dimensional vectors in which each dimension corresponds to a vocabulary term. Traditional methods such as BM25 weight terms using term-frequency and inverse-document-frequency statistics with document-length normalization, while learned sparse methods such as SPLADE use neural networks to assign importance weights to terms. At query time, an inverted index maps each term to the documents that contain it, so only documents sharing at least one term with the query need to be scored.
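The classical pipeline can be sketched as follows, assuming whitespace-tokenized text: an inverted index maps each term to its postings, and BM25 scores only the documents reached through the query terms. The parameters k1 = 1.2 and b = 0.75 are common defaults, the IDF is the Lucene-style variant that stays positive even for very common terms, and the function names and toy documents are hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_inverted_index(docs: dict[str, list[str]]) -> dict[str, dict[str, int]]:
    """Map each term to a postings dict {doc_id: term_frequency}."""
    index: dict[str, dict[str, int]] = defaultdict(dict)
    for doc_id, tokens in docs.items():
        for term, tf in Counter(tokens).items():
            index[term][doc_id] = tf
    return index


def bm25_search(query: list[str], docs: dict[str, list[str]],
                index: dict[str, dict[str, int]],
                k1: float = 1.2, b: float = 0.75) -> list[tuple[str, float]]:
    """Score only documents that share at least one term with the query."""
    n_docs = len(docs)
    doc_len = {doc_id: len(tokens) for doc_id, tokens in docs.items()}
    avg_len = sum(doc_len.values()) / n_docs
    scores: dict[str, float] = defaultdict(float)
    for term in set(query):
        postings = index.get(term)
        if not postings:
            continue  # a term absent from the collection contributes nothing
        df = len(postings)
        # Lucene-style IDF, which stays positive even for very common terms.
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        for doc_id, tf in postings.items():
            # Length normalization: longer-than-average docs are penalized.
            norm = k1 * (1 - b + b * doc_len[doc_id] / avg_len)
            scores[doc_id] += idf * tf * (k1 + 1) / (tf + norm)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


docs = {
    "d1": "sparse retrieval uses an inverted index".split(),
    "d2": "dense retrieval uses embeddings".split(),
    "d3": "bm25 is a sparse ranking function".split(),
}
index = build_inverted_index(docs)
results = bm25_search("sparse inverted index".split(), docs, index)
print([doc_id for doc_id, _ in results])  # ['d1', 'd3']
```

Document d2 never enters the score table because it shares no term with the query; skipping such documents is the efficiency the inverted index provides.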
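Sparse vectors have dimensionality equal to the vocabulary size (roughly 30K WordPiece tokens for BERT-based learned models such as SPLADE, and often far more for word-level vocabularies in classical systems), yet a passage-length document typically has at most a few hundred non-zero entries. SPLADE and other learned sparse models expand documents with related terms that do not appear in the original text and learn term weights end-to-end. Sparse retrieval excels at exact matching, entity lookup, and queries containing rare technical terms, which dense models may represent poorly.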
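The expansion and weighting step can be sketched with the max-pooled form used in later SPLADE versions, w_j = max_i log(1 + ReLU(w_ij)), where w_ij is the masked-language-model logit that input token i assigns to vocabulary term j. This sketch replaces a real model with a hand-made four-term vocabulary, invented logits, and a hand-set query vector; the helper names are hypothetical. Relevance is then a dot product over the terms the two sparse vectors share.

```python
import math


def splade_pool(token_logits: list[list[float]], vocab: list[str]) -> dict[str, float]:
    """SPLADE-style max pooling: w_j = max_i log(1 + relu(logit_ij)).

    token_logits[i][j] stands in for the masked-LM logit that input token i
    assigns to vocabulary term j (hypothetical values, not a real model).
    Only strictly positive weights are kept, giving a sparse {term: weight} map.
    """
    weights: dict[str, float] = {}
    for j, term in enumerate(vocab):
        w = max(math.log1p(max(0.0, row[j])) for row in token_logits)
        if w > 0.0:
            weights[term] = w
    return weights


def sparse_dot(q: dict[str, float], d: dict[str, float]) -> float:
    """Relevance score: dot product over terms present in both vectors."""
    if len(q) > len(d):
        q, d = d, q  # iterate over the smaller vector
    return sum(w * d[t] for t, w in q.items() if t in d)


vocab = ["car", "vehicle", "engine", "banana"]
# Logits for two input tokens ("car", "engine") of a toy document.
# Negative logits are zeroed by the ReLU.
doc_logits = [
    [3.0, 1.5, -0.5, -2.0],
    [0.2, 0.8, 2.5, -1.0],
]
doc_vec = splade_pool(doc_logits, vocab)
query_vec = {"vehicle": 1.0}  # hand-set query weights for illustration
print(sorted(doc_vec))                          # ['car', 'engine', 'vehicle']
print(round(sparse_dot(query_vec, doc_vec), 3))  # 0.916
```

The toy document never contains the word "vehicle", yet it receives a positive weight for that term, so the query still matches. "banana" gets only negative logits and stays at zero. In a trained model, a sparsity regularizer (SPLADE uses a FLOPS penalty) keeps most of the vocabulary at zero.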