A search approach that matches on the intent and contextual meaning of a query rather than on exact keywords. Semantic search powers multimodal retrieval in which users describe what they need in natural language.
Semantic search encodes both queries and documents into dense vector representations using embedding models, then finds documents whose vectors are most similar to the query vector. This captures synonyms, paraphrases, and conceptual relationships that keyword search misses. The process involves encoding the query, performing approximate nearest neighbor search in a vector index, and ranking results by semantic similarity.
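The encode, search, and rank steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production pipeline: the three-dimensional vectors are hand-written stand-ins for the output of a real embedding model, the document IDs are hypothetical, and the search is an exact brute-force scan rather than an approximate nearest neighbor index.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction, so treat it as dissimilar
    return dot / (norm_a * norm_b)


def search(query_vec, doc_vecs, top_k=2):
    """Score every document against the query and return the top_k (id, score) pairs."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Assumption: toy vectors standing in for real embeddings (hypothetical values).
DOC_VECS = {
    "car_repair": [0.9, 0.1, 0.0],
    "auto_maintenance": [0.8, 0.2, 0.1],
    "baking_bread": [0.0, 0.1, 0.9],
}

# Assumption: stand-in embedding for a query such as "how to fix my vehicle".
QUERY_VEC = [0.85, 0.15, 0.05]

results = search(QUERY_VEC, DOC_VECS, top_k=2)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```

In this toy setup the two vehicle-related documents score far above the baking document even though none of them share a keyword with the query, which is the behavior the paragraph above describes. A real system would replace the brute-force loop with a vector index once the collection grows.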
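Semantic search uses bi-encoder models (E5, BGE, GTE) to independently embed queries and documents into a shared vector space. Embedding dimensions typically range from 384 to 1024, depending on the model size. Approximate nearest neighbor indices (HNSW, IVF) enable low-latency search, usually on the order of milliseconds, over millions of documents. Cross-encoder rerankers can rescore the top results for improved precision, at the cost of extra compute per query-document pair. Hybrid approaches combine semantic similarity with keyword matching (BM25) and often give the best overall retrieval quality, because keyword matching recovers exact terms such as identifiers and rare names that embeddings can miss.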
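One common way to combine a semantic ranking with a keyword ranking is reciprocal rank fusion (RRF), sketched below. This is an illustrative choice, not necessarily the fusion method used by any particular product: the document IDs and input rankings are hypothetical, and the constant k=60 is a conventional default rather than a tuned value.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one list.

    Each document earns 1 / (k + rank) from every list it appears in,
    where rank starts at 1 for the top result. Documents that rank well
    in more than one list accumulate the highest fused score.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Assumption: hypothetical outputs of a vector search and a BM25 search.
semantic_ranking = ["doc_a", "doc_b", "doc_c"]
keyword_ranking = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([semantic_ranking, keyword_ranking])
print(fused)
```

RRF works on ranks rather than raw scores, so it avoids having to normalize cosine similarities and BM25 scores onto a common scale. A cross-encoder reranker, if used, would then rescore only the first few entries of the fused list.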