A bi-encoder (also called a dual encoder or two-tower model) is a neural architecture that independently encodes queries and documents into dense vector embeddings. The relevance of a document to a query is computed as the similarity (typically cosine or dot product) between their embeddings, which enables efficient retrieval from large corpora: document embeddings can be precomputed and searched with approximate nearest neighbor techniques.
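As a minimal illustration of this scoring scheme, the sketch below encodes a query and two candidate documents independently and ranks the documents by cosine similarity. It assumes the `sentence-transformers` library is installed; the checkpoint name is one arbitrary example model, not a recommendation.

```python
# Minimal bi-encoder scoring sketch (assumes sentence-transformers is installed;
# the model name is an example checkpoint, not the only choice).
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

query = "how do bi-encoders work"
docs = [
    "A bi-encoder encodes queries and documents independently.",
    "Cross-encoders jointly encode each query-document pair.",
]

# Encode query and documents independently; with L2-normalized embeddings,
# the dot product equals cosine similarity.
q_emb = model.encode([query], normalize_embeddings=True)
d_emb = model.encode(docs, normalize_embeddings=True)

scores = q_emb @ d_emb.T  # shape (1, num_docs): one similarity per document
print(scores)
```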
The two towers (which in practice often share weights) transform queries and documents into fixed-dimensional dense vectors. During indexing, every document is encoded once and its embedding is stored in a vector index. At query time, only the query is encoded online, and the nearest document embeddings are retrieved via approximate nearest neighbor (ANN) search. Because documents are encoded independently of the query, the entire corpus can be indexed ahead of time, making query-time search sub-linear in corpus size.
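The following sketch shows this offline-index / online-query split with FAISS. It assumes `faiss-cpu` is installed, and random normalized vectors stand in for real document and query embeddings; dimensions and HNSW parameters are illustrative.

```python
# Offline indexing vs. online querying (assumes faiss-cpu and numpy installed;
# random vectors are placeholders for real encoder outputs).
import faiss
import numpy as np

dim, num_docs = 384, 10_000
rng = np.random.default_rng(0)

# Offline: encode every document once and add the embeddings to an ANN index.
doc_embs = rng.standard_normal((num_docs, dim)).astype("float32")
faiss.normalize_L2(doc_embs)  # normalize so inner product = cosine similarity

index = faiss.IndexHNSWFlat(dim, 32, faiss.METRIC_INNER_PRODUCT)
index.add(doc_embs)

# Online: encode only the query, then run the (sub-linear) ANN search.
query_emb = rng.standard_normal((1, dim)).astype("float32")
faiss.normalize_L2(query_emb)

scores, ids = index.search(query_emb, 5)  # top-5 nearest documents
print(ids[0], scores[0])
```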
Bi-encoders typically use BERT, RoBERTa, or specialized embedding models (E5, GTE, BGE) as the backbone encoder. The embedding is usually taken from the [CLS] token or produced by mean-pooling all token representations. Training uses contrastive losses (InfoNCE, triplet loss) with in-batch negatives and mined hard negatives. The resulting embeddings typically have 384–1024 dimensions and are L2-normalized so that dot product equals cosine similarity. Vector indexes (HNSW, IVF-PQ via FAISS or Qdrant) enable sub-millisecond search over millions of documents.
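A sketch of these two training ingredients, mean pooling and InfoNCE with in-batch negatives, is given below in PyTorch. The random tensors stand in for backbone encoder outputs, and the batch shapes and temperature are illustrative assumptions.

```python
# Mean pooling + InfoNCE with in-batch negatives (PyTorch sketch; random
# tensors are placeholders for real backbone outputs).
import torch
import torch.nn.functional as F

def mean_pool(hidden_states, attention_mask):
    # Average token vectors, ignoring padding positions.
    mask = attention_mask.unsqueeze(-1).float()
    return (hidden_states * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

def info_nce_loss(q_emb, d_emb, temperature=0.05):
    # Row i of the similarity matrix scores query i against every document in
    # the batch; the matching document (the diagonal) is the positive and all
    # other documents act as in-batch negatives.
    q = F.normalize(q_emb, dim=-1)
    d = F.normalize(d_emb, dim=-1)
    logits = (q @ d.T) / temperature
    labels = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, labels)

# Toy tensors standing in for encoder outputs: (batch=8, seq_len=16, hidden=384).
q_hidden = torch.randn(8, 16, 384)
d_hidden = torch.randn(8, 16, 384)
mask = torch.ones(8, 16)

loss = info_nce_loss(mean_pool(q_hidden, mask), mean_pool(d_hidden, mask))
print(loss.item())
```

In real training pipelines, mined hard negatives are usually appended as extra rows of the document matrix so each query is scored against its positive, the in-batch negatives, and the hard negatives in a single cross-entropy.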