Dense Retrieval - Retrieval using learned dense vector representations
A retrieval paradigm that encodes queries and documents into dense embedding vectors and uses vector similarity for ranking. Dense retrieval powers semantic search in multimodal systems where keyword matching falls short.
How It Works
Dense retrieval uses dual encoder models to independently map queries and documents into a shared embedding space. At query time, the query is encoded into a vector, and the most similar document vectors are retrieved using approximate nearest neighbor search. This captures semantic meaning rather than relying on exact keyword overlap.
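A minimal sketch of this query-time flow. The `toy_encode` function below is a hypothetical stand-in (a character-trigram hash) for a real transformer dual encoder; the exhaustive dot-product search stands in for ANN search, which would replace it at scale:

```python
import numpy as np
import zlib

def toy_encode(text, dim=64):
    # Hypothetical stand-in for a dual-encoder model: hashes character
    # trigrams into a fixed-size vector. A real system would run a
    # transformer encoder (e.g. E5 or BGE) here instead.
    vec = np.zeros(dim)
    for i in range(len(text) - 2):
        vec[zlib.crc32(text[i:i + 3].encode()) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

docs = ["the cat sat on the mat",
        "stock prices fell sharply",
        "a kitten sat on a rug"]
doc_matrix = np.stack([toy_encode(d) for d in docs])  # pre-computed offline

query_vec = toy_encode("cat sitting on a mat")
scores = doc_matrix @ query_vec      # dot product = cosine (unit vectors)
top_k = np.argsort(-scores)[:2]      # exhaustive here; ANN index at scale
```

Because both sides are L2-normalized, the dot product equals cosine similarity, and document vectors can be indexed once and reused for every query.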
Technical Details
Models like DPR, E5, and BGE use transformer-based dual encoders trained with contrastive learning on query-document pairs. Document embeddings are pre-computed and indexed in vector databases, so query-time cost reduces to a single encoder forward pass plus the ANN lookup. Typical embedding dimensions range from 384 to 1024.
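The contrastive objective can be sketched as a softmax cross-entropy over one positive and several negative documents, as in DPR-style training. This is a simplified numpy version operating on pre-computed unit vectors; the temperature value is an illustrative choice, not a prescribed one:

```python
import numpy as np

def contrastive_loss(q, pos, negs, tau=0.05):
    # Softmax cross-entropy over one positive and k negative documents.
    # q, pos, and each element of negs are unit-norm embedding vectors.
    sims = np.array([q @ pos] + [q @ n for n in negs]) / tau
    sims -= sims.max()                       # numerical stability
    probs = np.exp(sims) / np.exp(sims).sum()
    return -np.log(probs[0])                 # index 0 is the positive

# Loss is near zero when the positive is closest to the query...
easy = contrastive_loss(np.array([1.0, 0.0]),
                        np.array([1.0, 0.0]),
                        [np.array([0.0, 1.0])])
# ...and large when a negative is closer than the positive.
hard = contrastive_loss(np.array([1.0, 0.0]),
                        np.array([0.0, 1.0]),
                        [np.array([1.0, 0.0])])
```

Training pushes query embeddings toward their paired documents and away from negatives, which is what shapes the shared embedding space.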
Best Practices
Fine-tune the retrieval model on in-domain query-document pairs for best performance
Use hard negative mining during training to improve discrimination between similar documents
Combine dense retrieval with sparse retrieval (BM25) in a hybrid approach for optimal recall
Pre-compute and cache document embeddings to minimize latency at query time
Common Pitfalls
Relying solely on dense retrieval for queries that require exact keyword matching (e.g. part numbers or error codes)
Not normalizing embeddings before cosine similarity computation
Using a model trained on general data for a highly specialized domain without adaptation
Underestimating the storage cost of dense embeddings at scale
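The normalization pitfall is easy to demonstrate: without L2 normalization, a raw dot product rewards vector magnitude rather than direction, so a large off-topic vector can outrank a small on-topic one. A toy illustration:

```python
import numpy as np

q = np.array([1.0, 0.0])
a = np.array([0.6, 0.0])     # same direction as q, small magnitude
b = np.array([2.0, 2.0])     # 45 degrees off, large magnitude

# Raw dot product is fooled by magnitude: b scores higher than a.
assert q @ b > q @ a

def normalize(v):
    return v / np.linalg.norm(v)

# After L2 normalization the dot product equals cosine similarity,
# and the directionally closer vector a wins.
assert normalize(q) @ normalize(a) > normalize(q) @ normalize(b)
```

Many embedding models emit unnormalized vectors, so normalization should happen once at indexing time and once per query.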
Advanced Tips
Implement late interaction models like ColBERT for token-level matching with efficient retrieval
Use knowledge distillation from cross-encoder rerankers to improve bi-encoder quality
Apply in-batch negatives during training to increase effective negative sample size
Leverage multi-vector representations for complex queries that span multiple concepts
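The in-batch negatives tip can be sketched as a batched version of the contrastive loss: the query-document similarity matrix for a batch of positive pairs makes row i's diagonal entry the positive and the other batch-1 entries free negatives. Random unit vectors stand in for encoder outputs here:

```python
import numpy as np

rng = np.random.default_rng(42)
batch, dim = 4, 8

# Stand-in encoder outputs for a batch of (query, positive doc) pairs;
# in practice these come from the two towers of the bi-encoder.
q = rng.normal(size=(batch, dim))
d = rng.normal(size=(batch, dim))
q /= np.linalg.norm(q, axis=1, keepdims=True)
d /= np.linalg.norm(d, axis=1, keepdims=True)

logits = q @ d.T / 0.05          # (batch, batch); row i's positive is column i
# Each row becomes a softmax classification over the batch documents,
# so every query sees batch-1 negatives at no extra encoding cost.
row_max = logits.max(axis=1)
log_z = np.log(np.exp(logits - row_max[:, None]).sum(axis=1)) + row_max
loss = -np.mean(np.diag(logits) - log_z)
```

Scaling the batch size therefore directly scales the effective number of negatives, which is one reason dense retrievers are often trained with large batches.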