A bi-encoder (also called a dual encoder or two-tower model) is a neural architecture that independently encodes queries and documents into dense vector embeddings. The relevance of a document to a query is computed as the similarity (typically cosine or dot product) between their embeddings, which enables efficient retrieval from large corpora: document embeddings can be precomputed and searched with approximate nearest neighbor techniques.
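As a minimal illustration of this scoring scheme, the sketch below encodes a query and two candidate documents independently and ranks the documents by cosine similarity. It assumes the `sentence-transformers` library is installed; the checkpoint name is one arbitrary example model, not a recommendation.

```python
# Minimal bi-encoder scoring sketch (assumes sentence-transformers is installed;
# the model name is an example checkpoint, not the only choice).
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

query = "how do bi-encoders work"
docs = [
    "A bi-encoder encodes queries and documents independently.",
    "Cross-encoders jointly encode each query-document pair.",
]

# Encode query and documents independently; with L2-normalized embeddings,
# the dot product equals cosine similarity.
q_emb = model.encode([query], normalize_embeddings=True)
d_emb = model.encode(docs, normalize_embeddings=True)

scores = q_emb @ d_emb.T  # shape (1, num_docs): one similarity per document
print(scores)
```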
The two towers (which in practice often share weights) transform queries and documents into fixed-dimensional dense vectors. During indexing, every document is encoded once and its embedding is stored in a vector index. At query time, only the query is encoded online, and the nearest document embeddings are retrieved via approximate nearest neighbor (ANN) search. Because documents are encoded independently of the query, the entire corpus can be indexed ahead of time, making query-time search sub-linear in corpus size.
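The following sketch shows this offline-index / online-query split with FAISS. It assumes `faiss-cpu` is installed, and random normalized vectors stand in for real document and query embeddings; dimensions and HNSW parameters are illustrative.

```python
# Offline indexing vs. online querying (assumes faiss-cpu and numpy installed;
# random vectors are placeholders for real encoder outputs).
import faiss
import numpy as np

dim, num_docs = 384, 10_000
rng = np.random.default_rng(0)

# Offline: encode every document once and add the embeddings to an ANN index.
doc_embs = rng.standard_normal((num_docs, dim)).astype("float32")
faiss.normalize_L2(doc_embs)  # normalize so inner product = cosine similarity

index = faiss.IndexHNSWFlat(dim, 32, faiss.METRIC_INNER_PRODUCT)
index.add(doc_embs)

# Online: encode only the query, then run the (sub-linear) ANN search.
query_emb = rng.standard_normal((1, dim)).astype("float32")
faiss.normalize_L2(query_emb)

scores, ids = index.search(query_emb, 5)  # top-5 nearest documents
print(ids[0], scores[0])
```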
Bi-encoders typically use BERT, RoBERTa, or specialized embedding models (E5, GTE, BGE) as the backbone encoder. The embedding is usually taken from the [CLS] token or produced by mean-pooling all token representations. Training uses contrastive losses (InfoNCE, triplet loss) with in-batch negatives and mined hard negatives. The resulting embeddings typically have 384–1024 dimensions and are L2-normalized so that dot product equals cosine similarity. Vector indexes (HNSW, IVF-PQ via FAISS or Qdrant) enable sub-millisecond search over millions of documents.
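A sketch of these two training ingredients, mean pooling and InfoNCE with in-batch negatives, is given below in PyTorch. The random tensors stand in for backbone encoder outputs, and the batch shapes and temperature are illustrative assumptions.

```python
# Mean pooling + InfoNCE with in-batch negatives (PyTorch sketch; random
# tensors are placeholders for real backbone outputs).
import torch
import torch.nn.functional as F

def mean_pool(hidden_states, attention_mask):
    # Average token vectors, ignoring padding positions.
    mask = attention_mask.unsqueeze(-1).float()
    return (hidden_states * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

def info_nce_loss(q_emb, d_emb, temperature=0.05):
    # Row i of the similarity matrix scores query i against every document in
    # the batch; the matching document (the diagonal) is the positive and all
    # other documents act as in-batch negatives.
    q = F.normalize(q_emb, dim=-1)
    d = F.normalize(d_emb, dim=-1)
    logits = (q @ d.T) / temperature
    labels = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, labels)

# Toy tensors standing in for encoder outputs: (batch=8, seq_len=16, hidden=384).
q_hidden = torch.randn(8, 16, 384)
d_hidden = torch.randn(8, 16, 384)
mask = torch.ones(8, 16)

loss = info_nce_loss(mean_pool(q_hidden, mask), mean_pool(d_hidden, mask))
print(loss.item())
```

In real training pipelines, mined hard negatives are usually appended as extra rows of the document matrix so each query is scored against its positive, the in-batch negatives, and the hard negatives in a single cross-entropy.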