A framework and family of models that generate fixed-size vector representations for sentences and paragraphs, enabling efficient semantic similarity comparison. Widely used in retrieval-augmented generation and semantic search pipelines for encoding text queries and document chunks.
How It Works
Sentence Transformers use a siamese or triplet network architecture built on top of pretrained transformer models like BERT or RoBERTa. Input sentences pass through the transformer, and a pooling layer (mean, CLS token, or max) reduces the variable-length token embeddings into a single fixed-size sentence vector. These vectors are trained so that semantically similar sentences have high cosine similarity.
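A minimal encoding sketch, assuming the sentence-transformers library and the public all-MiniLM-L6-v2 checkpoint (a mean-pooling model with 384-dimensional output); any other pretrained checkpoint works the same way.

from sentence_transformers import SentenceTransformer, util

# Load a pretrained Sentence Transformer; the pooling layer is part of the model.
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "A man is playing a guitar.",
    "Someone is strumming an acoustic guitar.",
    "The stock market fell sharply today.",
]

# Every sentence becomes one fixed-size vector, regardless of its length.
embeddings = model.encode(sentences, convert_to_tensor=True)

# Semantically similar sentences get a higher cosine similarity.
scores = util.cos_sim(embeddings[0], embeddings[1:])
print(scores)  # the guitar pair should score well above the unrelated sentence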
Technical Details
The models are typically fine-tuned with contrastive loss, triplet loss, or multiple negatives ranking loss on sentence-pair datasets such as NLI and the STS benchmarks. Output dimensions usually range from 384 to 1024. The sentence-transformers Python library provides a simple API for encoding and supports asymmetric search, where short queries and longer passages are encoded with different models or prompts.
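A hedged fine-tuning sketch using the library's classic fit API with MultipleNegativesRankingLoss; the pair texts and hyperparameters below are illustrative placeholders, not a real training setup.

from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

model = SentenceTransformer("all-MiniLM-L6-v2")

# Positive (query, passage) pairs; MultipleNegativesRankingLoss treats the
# other passages in the same batch as negatives for each query.
train_examples = [
    InputExample(texts=["what is a sentence embedding",
                        "A sentence embedding is a fixed-size vector representing a sentence."]),
    InputExample(texts=["how do I normalize a vector",
                        "Divide each vector by its L2 norm to get unit length."]),
]

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)
train_loss = losses.MultipleNegativesRankingLoss(model)

# One quick epoch just to show the call shape; real runs need far more data.
model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=10)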
Best Practices
Choose a model size that balances latency and accuracy for your use case
Fine-tune on domain-specific sentence pairs for significantly improved retrieval quality
Use mean pooling over CLS token pooling for most general-purpose tasks
Normalize embeddings before computing cosine similarity for consistent scoring
Batch encode large document sets to maximize GPU throughput (a minimal batched-encoding sketch follows this list)
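As an illustration of the last two practices, a small sketch of batched, normalized encoding; the document strings, query, and batch size are placeholders.

from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")
docs = ["first document chunk ...", "second document chunk ...", "third document chunk ..."]

# Encode in batches and L2-normalize, so a plain dot product equals cosine similarity.
doc_emb = model.encode(docs, batch_size=64, normalize_embeddings=True, show_progress_bar=True)
query_emb = model.encode(["example query"], normalize_embeddings=True)

scores = doc_emb @ query_emb[0]   # cosine scores, since all vectors are unit length
best = int(np.argmax(scores))
print(docs[best], float(scores[best]))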
Common Pitfalls
Using base BERT as a sentence encoder without the sentence-transformer fine-tuning step
Encoding very long documents without chunking, leading to truncation and information loss
Mixing models trained on different objectives when comparing embeddings
Ignoring the maximum sequence length, which causes silent truncation (see the chunking sketch after this list)
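A minimal chunking sketch for the truncation pitfalls above, using a hypothetical word-window splitter (chunk_words, with made-up size and overlap values) so that no piece exceeds the encoder's limit.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
print(model.max_seq_length)  # tokens beyond this limit are silently dropped at encode time

# Hypothetical word-window chunker: split long text into overlapping pieces so
# each piece fits the model's context, then embed every piece separately.
def chunk_words(text, size=200, overlap=50):
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]

long_doc = "... a document far longer than the model's sequence limit ..."
chunk_embeddings = model.encode(chunk_words(long_doc))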
Advanced Tips
Use asymmetric models where query and passage encoders are optimized separately
Implement hard negative mining during fine-tuning for sharper retrieval boundaries
Distill large sentence transformer models into smaller ones for production deployment
Combine sentence embeddings with sparse retrieval (BM25) in a hybrid pipeline for best recall (a fusion sketch follows this list)
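A hybrid-retrieval sketch for the last tip; it assumes the rank_bm25 package for the sparse side and uses a simple weighted-sum fusion with a made-up alpha (reciprocal rank fusion or min-max normalization of the two score lists are common alternatives).

from rank_bm25 import BM25Okapi            # assumed available; any sparse scorer works
from sentence_transformers import SentenceTransformer, util

docs = [
    "neural retrieval with dense vectors",
    "classic keyword search with BM25",
    "hybrid pipelines combine both signals",
]
query = "combining keyword and vector search"

# Sparse scores from BM25 over whitespace tokens.
bm25 = BM25Okapi([d.split() for d in docs])
sparse = bm25.get_scores(query.split())

# Dense scores from sentence embeddings.
model = SentenceTransformer("all-MiniLM-L6-v2")
dense = util.cos_sim(model.encode(query, convert_to_tensor=True),
                     model.encode(docs, convert_to_tensor=True))[0].tolist()

# Naive weighted fusion; normalize both score lists first in a real pipeline.
alpha = 0.5
fused = [alpha * s + (1 - alpha) * d for s, d in zip(sparse, dense)]
print(sorted(zip(fused, docs), reverse=True)[0])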