A framework and family of models that generate fixed-size vector representations for sentences and paragraphs, enabling efficient semantic similarity comparison. Widely used in multimodal retrieval pipelines for encoding text queries and document chunks.
Sentence Transformers use a siamese or triplet network architecture built on top of pretrained transformer models like BERT or RoBERTa. Each input sentence passes through the same transformer with shared weights, and a pooling layer (mean, CLS token, or max) reduces the variable-length sequence of token embeddings to a single fixed-size sentence vector. The model is trained so that semantically similar sentences produce vectors with high cosine similarity.
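The pooling and comparison steps can be sketched in a few lines of plain Python. This is a minimal illustration, not the library's implementation: the token vectors are made-up two-dimensional values standing in for transformer outputs, and the names `mean_pool` and `cosine_similarity` are hypothetical helpers introduced for the example.

```python
import math


def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors, skipping padding positions (mask == 0)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, value in enumerate(vector):
                total[i] += value
            count += 1
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    return [t / count for t in total]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy token embeddings (assumed values, not real model output).
# The third token of sentence A is padding and is masked out.
sentence_a = mean_pool([[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]], [1, 1, 0])
sentence_b = mean_pool([[2.0, 2.0]], [1])

print(sentence_a)
print(cosine_similarity(sentence_a, sentence_b))
```

Masking matters here: without it, the padding vector would dominate the average and the two sentences would have different lengths of input but still need comparable fixed-size outputs.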
The models are typically fine-tuned with contrastive loss, triplet loss, or multiple negatives ranking loss on sentence-pair datasets such as natural language inference (NLI) corpora and semantic textual similarity (STS) data. Output dimensions commonly range from 384 for small models to 1024 for large ones. The sentence-transformers Python library provides a simple API for encoding, and supports asymmetric search, where short queries are matched against longer documents and the two sides may be encoded with different strategies.
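To make the multiple negatives ranking objective concrete, here is a rough sketch in plain Python. It assumes a batch of (anchor, positive) pairs in which every other positive in the batch serves as a negative for a given anchor, and it uses scaled cosine similarity as the score. The function names, the toy vectors, and the scale of 20 are assumptions chosen for the example rather than a statement of how any particular library implements the loss.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def multiple_negatives_ranking_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the correct match for anchors[i].

    All other positives in the batch act as in-batch negatives.
    """
    losses = []
    for i, anchor in enumerate(anchors):
        scores = [scale * cosine(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over the row of scores.
        top = max(scores)
        log_sum_exp = top + math.log(sum(math.exp(s - top) for s in scores))
        # Negative log-probability assigned to the true pair.
        losses.append(log_sum_exp - scores[i])
    return sum(losses) / len(losses)


# Toy sentence vectors (assumed values, not real model output).
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]   # each anchor matches its own positive
swapped = [[0.0, 1.0], [1.0, 0.0]]   # each anchor matches the wrong positive

print(multiple_negatives_ranking_loss(anchors, aligned))
print(multiple_negatives_ranking_loss(anchors, swapped))
```

The loss is near zero when each anchor is closest to its own positive and large when it is closer to another pair's positive, which is the pressure that pulls matching sentences together during fine-tuning.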