
    What is a Bi-Encoder?

    Bi-Encoder - Dual-tower model encoding queries and documents independently

    A bi-encoder (also called dual encoder or two-tower model) is a neural architecture that independently encodes queries and documents into dense vector embeddings using separate encoder networks. The relevance between a query and document is computed as the similarity (typically cosine or dot product) between their respective embeddings, enabling efficient retrieval from large corpora through precomputed document embeddings and approximate nearest neighbor search.
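
    A minimal sketch of this scoring scheme, assuming the sentence-transformers package (the model name is an illustrative choice, not a recommendation): the query and the documents go through separate encode calls, and relevance is just the cosine similarity between the resulting vectors.

    ```python
    # Minimal bi-encoder sketch: queries and documents are encoded independently,
    # and relevance is the cosine similarity between their embeddings.
    # Assumes the sentence-transformers package; the model name is an example.
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")  # example backbone

    documents = [
        "Bi-encoders embed queries and documents with separate forward passes.",
        "Cross-encoders jointly attend over the query-document pair.",
    ]
    query = "How does a bi-encoder score relevance?"

    # Documents can be encoded once, offline; only the query is encoded at search time.
    doc_embeddings = model.encode(documents, normalize_embeddings=True)
    query_embedding = model.encode(query, normalize_embeddings=True)

    # With unit-normalized vectors, cosine similarity equals the dot product.
    scores = util.cos_sim(query_embedding, doc_embeddings)
    print(scores)  # shape (1, 2): one relevance score per document
    ```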

    How It Works

    A bi-encoder consists of two encoder networks (often sharing weights) that independently transform queries and documents into fixed-dimensional dense vectors. During indexing, all documents are encoded once and their embeddings are stored in a vector index. At query time, only the query is encoded online, and the nearest document embeddings are found using approximate nearest neighbor search. Because documents are encoded independently of the query, the entire corpus can be pre-indexed, making retrieval sub-linear in corpus size.
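
    A rough sketch of the offline-index / online-query split using FAISS (mentioned below under Technical Details); random vectors stand in for real bi-encoder embeddings, and the HNSW parameters are illustrative.

    ```python
    # Sketch of offline indexing plus online ANN query with FAISS.
    # Random vectors stand in for embeddings produced by a bi-encoder.
    import numpy as np
    import faiss

    dim, num_docs = 768, 100_000
    rng = np.random.default_rng(0)

    # Offline: encode the whole corpus once and build an HNSW index over the vectors.
    doc_embeddings = rng.standard_normal((num_docs, dim)).astype("float32")
    faiss.normalize_L2(doc_embeddings)   # unit vectors -> inner product == cosine
    index = faiss.IndexHNSWFlat(dim, 32, faiss.METRIC_INNER_PRODUCT)
    index.add(doc_embeddings)

    # Online: encode only the query, then search the prebuilt index.
    query_embedding = rng.standard_normal((1, dim)).astype("float32")
    faiss.normalize_L2(query_embedding)
    scores, ids = index.search(query_embedding, 10)
    print(ids[0], scores[0])  # top-10 document ids and their cosine similarities
    ```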

    Technical Details

    Bi-encoders typically use BERT, RoBERTa, or specialized retrieval models (E5, GTE, BGE) as the backbone encoder. The embedding is usually taken from the [CLS] token or produced by mean-pooling all token representations. Training uses contrastive losses (InfoNCE, triplet loss) with in-batch negatives and hard negatives. The resulting embeddings are typically 384-1024 dimensional and are L2-normalized so that cosine similarity reduces to a dot product. Vector indices (HNSW, IVF-PQ via FAISS or Qdrant) enable sub-millisecond search over millions of documents.
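
    A hedged sketch of the in-batch-negative InfoNCE objective in PyTorch; the batch size, dimensionality, and temperature are illustrative, and real training would plug in outputs from the query and document towers.

    ```python
    # Sketch of InfoNCE with in-batch negatives for bi-encoder training (PyTorch).
    # query_emb and doc_emb would come from the query/document towers; here the
    # shapes and temperature are illustrative.
    import torch
    import torch.nn.functional as F

    def in_batch_infonce(query_emb: torch.Tensor, doc_emb: torch.Tensor,
                         temperature: float = 0.05) -> torch.Tensor:
        """query_emb, doc_emb: (batch, dim); row i of doc_emb is the positive for query i."""
        q = F.normalize(query_emb, dim=-1)
        d = F.normalize(doc_emb, dim=-1)
        # (batch, batch) similarity matrix: the diagonal holds positive pairs,
        # every other document in the batch acts as a negative.
        scores = q @ d.T / temperature
        labels = torch.arange(q.size(0), device=q.device)
        return F.cross_entropy(scores, labels)

    loss = in_batch_infonce(torch.randn(8, 768), torch.randn(8, 768))
    print(loss.item())
    ```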

    Best Practices

    • Use hard negative mining during training for significantly better retrieval quality
    • Normalize embeddings to unit vectors and use cosine similarity for stable comparison
    • Choose embedding dimensionality based on your accuracy-latency-storage tradeoff (768 is a safe default)
    • Fine-tune on domain-specific query-document pairs rather than relying on zero-shot performance
    • Pair bi-encoder retrieval with cross-encoder reranking for the best overall pipeline quality (a sketch of this pattern follows the list)
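
    A sketch of the retrieve-then-rerank pattern from the last bullet, assuming the sentence-transformers package; both model names are examples, not recommendations.

    ```python
    # Sketch of retrieve-then-rerank: a bi-encoder narrows the corpus,
    # a cross-encoder re-scores only the shortlist. Model names are examples.
    from sentence_transformers import SentenceTransformer, CrossEncoder, util

    bi_encoder = SentenceTransformer("all-MiniLM-L6-v2")
    reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

    corpus = ["...document 1...", "...document 2...", "...document 3..."]
    query = "example query"

    # Stage 1: fast bi-encoder retrieval over the whole corpus.
    corpus_emb = bi_encoder.encode(corpus, normalize_embeddings=True)
    query_emb = bi_encoder.encode(query, normalize_embeddings=True)
    hits = util.semantic_search(query_emb, corpus_emb, top_k=2)[0]

    # Stage 2: cross-encoder reranking of the shortlist only.
    pairs = [(query, corpus[hit["corpus_id"]]) for hit in hits]
    rerank_scores = reranker.predict(pairs)
    reranked = sorted(zip(hits, rerank_scores), key=lambda x: x[1], reverse=True)
    for hit, score in reranked:
        print(corpus[hit["corpus_id"]], score)
    ```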

    Common Pitfalls

    • Expecting bi-encoder accuracy to match cross-encoders without the cross-attention mechanism
    • Using only random negatives during training instead of hard negatives, producing a weak model
    • Not sharing encoder weights between query and document towers when the vocabularies overlap significantly
    • Ignoring the impact of pooling strategy (CLS vs mean pooling) on embedding quality (compared in the sketch after this list)
    • Deploying without an ANN index, falling back to brute-force search that does not scale
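
    A sketch contrasting CLS pooling with attention-masked mean pooling (the pooling pitfall above), assuming the Hugging Face transformers package; the checkpoint name is an example.

    ```python
    # Sketch contrasting CLS pooling with attention-masked mean pooling.
    # The checkpoint name is an example; any BERT-style encoder works the same way.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")

    batch = tokenizer(["a short text", "a much longer piece of text to embed"],
                      padding=True, return_tensors="pt")
    with torch.no_grad():
        token_states = model(**batch).last_hidden_state   # (batch, seq, hidden)

    # CLS pooling: take the first token's hidden state.
    cls_emb = token_states[:, 0]

    # Mean pooling: average real tokens only, using the attention mask so
    # padding does not dilute the embedding.
    mask = batch["attention_mask"].unsqueeze(-1).float()
    mean_emb = (token_states * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

    print(cls_emb.shape, mean_emb.shape)  # both (2, 768)
    ```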

    Advanced Tips

    • Use progressive training: start with in-batch negatives, then add BM25 negatives, then add mined hard negatives
    • Apply Matryoshka training to produce embeddings that work at multiple dimensionalities from a single model (see the truncation sketch after this list)
    • Implement asymmetric encoding where the query encoder is lightweight and the document encoder is heavier
    • Use knowledge distillation from cross-encoders to improve bi-encoder quality while keeping its efficiency
    • Consider multi-vector bi-encoders (one embedding per token) for ColBERT-style late interaction with bi-encoder efficiency
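
    A sketch of consuming Matryoshka-style embeddings at a reduced dimensionality, as mentioned in the second tip: truncate, then re-normalize before computing similarity. This only preserves quality if the model was actually trained with a Matryoshka objective; the dimensions here are illustrative.

    ```python
    # Sketch of using a Matryoshka-trained embedding at a smaller dimensionality:
    # truncate the leading dimensions, then re-normalize to unit length.
    # Random vectors stand in for real model outputs.
    import numpy as np

    def truncate_and_renormalize(embeddings: np.ndarray, dim: int) -> np.ndarray:
        """embeddings: (n, full_dim) unit vectors; returns (n, dim) unit vectors."""
        truncated = embeddings[:, :dim]
        norms = np.linalg.norm(truncated, axis=1, keepdims=True)
        return truncated / np.clip(norms, 1e-12, None)

    full = np.random.default_rng(0).standard_normal((4, 1024)).astype("float32")
    full /= np.linalg.norm(full, axis=1, keepdims=True)

    small = truncate_and_renormalize(full, 256)   # 4x less storage per vector
    print(small.shape)                            # (4, 256)
    ```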