A framework and family of models that generate fixed-size vector representations for sentences and paragraphs, enabling efficient semantic similarity comparison. Widely used in retrieval pipelines such as semantic search and retrieval-augmented generation (RAG) for encoding text queries and document chunks.
Sentence Transformers use a siamese or triplet network architecture built on top of pretrained transformer models like BERT or RoBERTa. Input sentences pass through the transformer, and a pooling layer (mean, CLS token, or max) reduces the variable-length token embeddings into a single fixed-size sentence vector. The network is trained so that the vectors of semantically similar sentences have high cosine similarity.
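A minimal sketch of this encode-and-compare workflow using the sentence-transformers library; the model name and example sentences are illustrative choices, not requirements.

```python
from sentence_transformers import SentenceTransformer, util

# all-MiniLM-L6-v2 is a common lightweight choice; it produces 384-dimensional vectors.
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "A man is eating food.",
    "Someone is having a meal.",
    "The stock market fell sharply today.",
]

# encode() runs each sentence through the transformer and pooling layer,
# returning one fixed-size vector per sentence.
embeddings = model.encode(sentences)
print(embeddings.shape)  # (3, 384)

# Pairwise cosine similarity: the first two sentences should score higher
# with each other than with the third.
scores = util.cos_sim(embeddings, embeddings)
print(scores)
```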
The models are typically fine-tuned using contrastive loss, triplet loss, or multiple negatives ranking loss on sentence-pair datasets such as NLI and the STS benchmarks. Output dimensions usually range from 384 to 1024. The sentence-transformers Python library provides a simple API for encoding and supports asymmetric search, where queries and documents are encoded with different prompts or models.
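A minimal fine-tuning sketch with the library's MultipleNegativesRankingLoss and its classic fit() API; the toy query-passage pairs and hyperparameters are placeholders under the assumption of positive pairs only, with the rest of each batch serving as in-batch negatives.

```python
from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

model = SentenceTransformer("all-MiniLM-L6-v2")

# Positive (query, relevant passage) pairs; MultipleNegativesRankingLoss
# treats the other passages in the batch as negatives.
train_examples = [
    InputExample(texts=["What is the capital of France?",
                        "Paris is the capital of France."]),
    InputExample(texts=["How do plants make food?",
                        "Plants produce energy through photosynthesis."]),
]

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)
train_loss = losses.MultipleNegativesRankingLoss(model)

# Placeholder hyperparameters; real runs use far more data and tuning.
model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    warmup_steps=10,
)
```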