ColBERT (Contextualized Late Interaction over BERT) is a neural retrieval model that computes fine-grained similarity between queries and documents by comparing per-token embeddings through a late interaction mechanism. Its MaxSim operation over token-level representations lets it approach the effectiveness of cross-encoders while retaining much of the efficiency of bi-encoders.
ColBERT independently encodes the query and document into sets of contextualized token embeddings using BERT-based encoders. At matching time, each query token embedding is compared against all document token embeddings, and the maximum similarity (MaxSim) for each query token is computed. The final relevance score is the sum of these per-token maximum similarities. This late interaction mechanism captures fine-grained token-level matching while allowing document embeddings to be precomputed and indexed.
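The scoring rule above can be sketched in a few lines of pure Python. This is a minimal illustration, assuming token embeddings are already available as lists of floats; a real system produces them with the BERT encoders and uses batched matrix operations. The function names `l2_normalize` and `maxsim_score` are hypothetical, not ColBERT's API.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def maxsim_score(query_embs, doc_embs):
    """Late interaction: each query token takes its best-matching document token; sum the maxima."""
    q = [l2_normalize(v) for v in query_embs]
    d = [l2_normalize(v) for v in doc_embs]
    total = 0.0
    for qv in q:
        # Cosine similarity of this query token against every document token; keep the max.
        total += max(sum(a * b for a, b in zip(qv, dv)) for dv in d)
    return total


if __name__ == "__main__":
    query = [[1.0, 0.0], [0.0, 1.0]]   # two toy query-token embeddings
    doc_a = [[1.0, 0.0], [0.6, 0.8]]   # covers both query tokens reasonably well
    doc_b = [[0.0, 1.0]]               # matches only the second query token
    print(maxsim_score(query, doc_a))  # about 1.8
    print(maxsim_score(query, doc_b))  # 1.0
```

Each query token contributes only its single best match, so `doc_a` outranks `doc_b`: it offers a strong match for both query tokens, while `doc_b` matches just one. Because `doc_embs` never depends on the query, those vectors can be computed and normalized ahead of time.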
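ColBERT uses a single shared BERT backbone for queries and documents, distinguishing the two by prepending special [Q] and [D] marker tokens, and passes the contextualized outputs through a linear projection layer, typically to 128 dimensions. The resulting embeddings are L2-normalized, so the dot product between a query token and a document token equals their cosine similarity. Document token embeddings are precomputed offline and stored in an index (optionally compressed); at query time, only the query is encoded online. The MaxSim operation then takes, for each query token, the maximum cosine similarity over all document tokens. ColBERTv2 introduced residual compression, which reduces storage by 6-10x with minimal loss in quality. The model is trained with pairwise or listwise ranking objectives: the original ColBERT uses a pairwise softmax cross-entropy loss over (query, positive document, negative document) triples, while ColBERTv2 distills relevance scores from a cross-encoder teacher.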
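The residual-compression idea can be illustrated with a toy sketch, under stated assumptions: each vector is stored as the id of its nearest centroid (ColBERTv2 obtains centroids by k-means clustering; here they are simply given) plus a 2-bit code per dimension for the residual. The cutoffs and bucket values below are hand-picked and hypothetical; a real index would estimate them from the distribution of residuals. The names `compress`, `decompress`, and `compressed_bytes` are illustrative, not ColBERT's API.

```python
import bisect
import math

# Hypothetical 2-bit quantizer: 3 cutoffs split residual values into 4 buckets,
# and each bucket is reconstructed as one representative value.
CUTOFFS = [-0.1, 0.0, 0.1]
BUCKET_VALUES = [-0.15, -0.05, 0.05, 0.15]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def nearest_centroid(vec, centroids):
    """Index of the centroid with the largest dot product with vec."""
    return max(range(len(centroids)), key=lambda i: dot(vec, centroids[i]))


def compress(vec, centroids, cutoffs=CUTOFFS):
    """Encode vec as (centroid id, one small bucket code per residual dimension)."""
    cid = nearest_centroid(vec, centroids)
    residual = [v - c for v, c in zip(vec, centroids[cid])]
    # bisect_right maps each residual value to a bucket index in 0..len(cutoffs).
    codes = [bisect.bisect_right(cutoffs, r) for r in residual]
    return cid, codes


def decompress(cid, codes, centroids, bucket_values=BUCKET_VALUES):
    """Approximate the original vector as centroid + quantized residual."""
    return [c + bucket_values[k] for c, k in zip(centroids[cid], codes)]


def compressed_bytes(dim, bits_per_dim, centroid_id_bytes=4):
    """Bytes per compressed vector: a centroid id plus packed residual codes (assumed layout)."""
    return centroid_id_bytes + math.ceil(dim * bits_per_dim / 8)


if __name__ == "__main__":
    centroids = [[1.0, 0.0], [0.0, 1.0]]   # toy 2-D centroids
    cid, codes = compress([0.95, 0.2], centroids)
    print(cid, codes)                       # 0 [1, 3]
    print(decompress(cid, codes, centroids))
    # Assumed sizes for a 128-dim vector: 16-bit floats vs. 4-byte id + 2-bit codes.
    print(128 * 2 / compressed_bytes(128, 2))
```

Under these assumptions a 128-dimensional vector shrinks from 256 bytes to 36 bytes, roughly 7x. This back-of-the-envelope figure happens to fall inside the range quoted above, but the actual ColBERTv2 accounting may differ.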