
    Vector Search: Find Content by Meaning, Not Keywords

    Vector search is the hot-tier query layer of the multimodal data warehouse. It retrieves content by semantic similarity using HNSW indexes, and multi-stage pipelines compose filter, rerank, and enrich stages on top of raw similarity scores for production retrieval.

    What is Vector Search?

    Vector search converts content into embeddings and retrieves results by semantic similarity using approximate nearest neighbor algorithms.

    How Embeddings Work

    Embedding models convert text, images, audio, and video into dense numerical vectors -- fixed-length arrays of floats that encode semantic meaning. Similar content lands close together in this high-dimensional space, enabling retrieval by meaning rather than exact keyword overlap. Models like CLIP, SigLIP, and sentence-transformers each produce embeddings optimized for different modalities and tasks.
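
    A minimal sketch of this idea using the open-source sentence-transformers library (shown for illustration; Mixpeek hosts embedding models for you, and the model named here is just one common choice): two phrasings of the same intent score far closer together than an unrelated sentence.

    embedding_demo.py
    from sentence_transformers import SentenceTransformer, util

    # Small 384-dimensional text embedding model
    model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

    sentences = [
        "how to fix a flat tire",      # query
        "puncture repair tutorial",    # related by meaning, not keywords
        "quarterly revenue forecast",  # unrelated
    ]
    embeddings = model.encode(sentences, normalize_embeddings=True)

    # Cosine similarity: the related pair scores much higher
    print(util.cos_sim(embeddings[0], embeddings[1]))  # high
    print(util.cos_sim(embeddings[0], embeddings[2]))  # low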

    Approximate Nearest Neighbor (ANN)

    Brute-force comparison of every vector against every query is too slow at scale. ANN algorithms like HNSW (Hierarchical Navigable Small World) build graph-based indexes that find the closest vectors in sub-linear time. HNSW delivers 95-99% recall at millisecond latency, making it the standard for production vector search across Qdrant, Pinecone, and Weaviate.
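
    To make the mechanics concrete, here is a standalone sketch using the open-source hnswlib library rather than a full vector database (illustrative only: the vectors are random and the parameter values are typical starting points, not tuned recommendations).

    hnsw_demo.py
    import numpy as np
    import hnswlib

    dim, n = 384, 10_000
    rng = np.random.default_rng(0)
    vectors = rng.random((n, dim), dtype=np.float32)

    # Build the graph index: M controls connectivity per node,
    # ef_construction controls build-time search width
    index = hnswlib.Index(space="cosine", dim=dim)
    index.init_index(max_elements=n, M=16, ef_construction=200)
    index.add_items(vectors, np.arange(n))

    # ef trades recall for latency at query time
    index.set_ef(64)
    labels, distances = index.knn_query(vectors[:1], k=5)
    print(labels, distances)  # nearest neighbors of the first vector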

    Why It Matters

    Traditional keyword search fails when users phrase queries differently than the indexed content. Vector search bridges this gap -- a query for 'how to fix a flat tire' retrieves documents about 'puncture repair' and 'tire replacement' because the embeddings capture shared meaning. This semantic understanding is essential for natural language queries, cross-lingual retrieval, and multimodal search across text, images, and video.

    How Vector Search Works

    A four-step pipeline: embed content, index vectors in HNSW, embed the query, and retrieve nearest neighbors with metadata filtering.

    1. Embed Content

    Content is processed through embedding models (sentence-transformers, CLIP, SigLIP, or custom models) to generate dense vector representations. Each document, image, video frame, or audio segment becomes a fixed-length float array that encodes its semantic meaning.

    2. Index Vectors

    Vectors are indexed in Qdrant using HNSW (Hierarchical Navigable Small World) graph indexes. HNSW builds a multi-layer navigable graph that enables sub-linear approximate nearest neighbor search with configurable recall-latency tradeoffs via ef and m parameters.
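
    For reference, this is roughly how those parameters are expressed at the Qdrant level with the open-source qdrant-client (Mixpeek manages this configuration for you; the collection name, dimension, and values below are illustrative).

    create_index.py
    from qdrant_client import QdrantClient, models

    client = QdrantClient(url="http://localhost:6333")

    # m = edges per graph node, ef_construct = build-time search width;
    # size must match the embedding model's output dimension
    client.create_collection(
        collection_name="product-docs",
        vectors_config=models.VectorParams(
            size=384,
            distance=models.Distance.COSINE,
        ),
        hnsw_config=models.HnswConfigDiff(m=16, ef_construct=200),
    )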

    3. Query

    The search query is embedded using the same model that indexed the content. This produces a query vector in the same embedding space, ensuring that semantic similarity between the query and stored content can be measured directly via distance metrics.

    4. Retrieve

    ANN search finds the closest vectors to the query vector using cosine similarity, Euclidean distance, or dot product. Results are filtered by metadata conditions (date ranges, categories, permissions) and returned ranked by similarity score.
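
    A sketch of this step against Qdrant directly, again via qdrant-client (Mixpeek's retriever stages wrap this; the placeholder query vector stands in for the embedding produced in step 3).

    retrieve.py
    from qdrant_client import QdrantClient, models

    client = QdrantClient(url="http://localhost:6333")

    query_embedding = [0.1] * 384  # placeholder for the step-3 query vector

    # ANN search with a metadata filter applied during retrieval
    hits = client.search(
        collection_name="product-docs",
        query_vector=query_embedding,
        query_filter=models.Filter(
            must=[
                models.FieldCondition(
                    key="doc_type",
                    match=models.MatchValue(value="guide"),
                )
            ]
        ),
        limit=10,
    )
    for hit in hits:
        print(hit.id, hit.score)  # ranked by similarity score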

    Vector Search vs Keyword Search

    Vector search and keyword search solve different problems. Understanding when to use each -- or both -- is key to production retrieval.

    Dimension | Vector Search | Keyword Search
    Semantic Understanding | Understands meaning and intent behind queries | Matches exact terms only, no semantic understanding
    Exact Match | Approximate -- may miss exact IDs, codes, proper nouns | Precise exact-term matching (BM25, TF-IDF)
    Typo Tolerance | Naturally tolerant -- embeddings encode meaning, not spelling | Requires fuzzy-matching configuration, fragile with typos
    Multilingual | Cross-lingual retrieval with multilingual embedding models | Requires per-language indexes, analyzers, and stemming rules
    Setup Complexity | Requires embedding model selection and vector index tuning | Simpler setup -- tokenization, stemming, inverted index
    Best For | Natural language queries, exploratory search, multimodal | Exact lookups, structured queries, known-item search

    Need both? Hybrid search combines vector and keyword retrieval for the best of both worlds.

    Vector Search Capabilities

    Multi-modal embeddings, metadata filtering, configurable distance metrics, and GPU-accelerated indexing for production vector search.

    Multi-Modal Embeddings

    Generate and search embeddings across text, images, video frames, and audio in a single namespace. Mixpeek's 50+ feature extractors run on Ray GPU clusters, producing embeddings for any content type without external model hosting. A short cross-modal example follows the list below.

    • Text embeddings (sentence-transformers, E5, BGE)
    • Image and video embeddings (CLIP, SigLIP)
    • Audio embeddings (CLAP, Whisper transcription)
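
    As a quick illustration of a shared text-image embedding space, here is a sketch using the open-source sentence-transformers CLIP checkpoint (not Mixpeek's managed extractors; the image path is hypothetical).

    multimodal_demo.py
    from PIL import Image
    from sentence_transformers import SentenceTransformer, util

    # CLIP maps images and text into the same embedding space
    model = SentenceTransformer("clip-ViT-B-32")

    image_emb = model.encode(Image.open("bike_repair.jpg"))  # hypothetical file
    text_emb = model.encode("a person fixing a flat bicycle tire")

    # Cross-modal similarity: a text query scored against an image
    print(util.cos_sim(image_emb, text_emb))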

    Metadata Filtering

    Narrow vector search results with pre-retrieval and post-retrieval metadata filters. Filter by any field stored alongside your vectors -- dates, categories, permissions, custom attributes -- without affecting similarity scoring. The filter syntax is sketched after the list below.

    • Boolean filter expressions ($and, $or, $not)
    • Range filters for numeric and date fields
    • Array contains and nested field filtering
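
    A hedged sketch of how these operators might compose, extrapolating from the filter stage in the Python example further down this page; the exact nesting rules are an assumption, so treat it as illustrative rather than a syntax reference.

    filters.py
    # Hypothetical filter payload combining boolean, range, and
    # membership conditions over metadata fields
    conditions = {
        "$and": [
            {"metadata.doc_type": {"$in": ["guide", "reference"]}},
            {"metadata.published_at": {"$gte": "2024-01-01"}},
            {"$not": {"metadata.status": "archived"}},
        ]
    }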

    Configurable Distance Metrics

    Choose the distance metric that matches your embedding model and use case: cosine similarity for normalized embeddings, Euclidean distance for absolute positioning, or dot product for maximum inner product search. A worked comparison follows the list below.

    • Cosine similarity (default for most models)
    • Euclidean (L2) distance
    • Dot product (max inner product search)
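
    A worked comparison in plain NumPy shows why the choice matters less for unit-normalized embeddings: cosine similarity and dot product coincide, and L2 distance ranks neighbors identically.

    metrics.py
    import numpy as np

    a = np.array([0.6, 0.8])  # both vectors are already unit length
    b = np.array([0.8, 0.6])

    cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    dot = a @ b
    l2 = np.linalg.norm(a - b)

    # For unit vectors: cosine == dot, and l2^2 == 2 - 2 * dot,
    # so all three metrics produce the same neighbor ranking
    print(cosine, dot, l2)  # 0.96, 0.96, ~0.283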

    GPU-Accelerated Indexing

    Embedding generation runs on Ray GPU clusters with automatic batching and scaling. Index millions of documents, images, and video frames without provisioning infrastructure. HNSW indexes are built and optimized in Qdrant for low-latency serving; an incremental-update sketch follows the list below.

    • Auto-scaling Ray clusters for embedding generation
    • Batch processing with configurable concurrency
    • Incremental index updates without full rebuilds
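
    As a sketch of what an incremental update looks like at the Qdrant level (an illustrative qdrant-client call with a placeholder embedding; Mixpeek performs equivalent upserts as part of its ingestion pipeline), new points are inserted into the live HNSW graph with no full rebuild.

    incremental_upsert.py
    from qdrant_client import QdrantClient, models

    client = QdrantClient(url="http://localhost:6333")

    new_embedding = [0.1] * 384  # placeholder for a freshly generated vector

    # Upserting inserts (or overwrites) points in the existing index
    client.upsert(
        collection_name="product-docs",
        points=[
            models.PointStruct(
                id=1001,
                vector=new_embedding,
                payload={"doc_type": "guide", "version": "2.1"},
            )
        ],
    )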

    Vector Search Architecture

    A four-layer architecture spanning embedding generation, HNSW indexing, multi-stage retrieval, and hot/cold storage tiering.

    1. Embedding Layer

    Content is processed through embedding models running on Ray GPU clusters. Mixpeek supports CLIP, SigLIP, sentence-transformers, E5, BGE, and custom models. Each extractor is configured per collection, and multiple extractors can run on the same content to produce embeddings for different modalities or granularities.

    2. Index Layer

    Vectors are stored and indexed in Qdrant using HNSW graph indexes. Each namespace maps to a Qdrant collection with configurable vector dimensions, distance metrics, and HNSW parameters (m, ef_construct). Payload indexes accelerate metadata filtering alongside vector search.
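
    For reference, creating a payload index with the open-source qdrant-client looks roughly like this (illustrative; Mixpeek configures payload indexes per namespace). The index lets Qdrant evaluate metadata filters during HNSW traversal instead of post-filtering results.

    payload_index.py
    from qdrant_client import QdrantClient, models

    client = QdrantClient(url="http://localhost:6333")

    # Index the doc_type field for fast keyword-match filtering
    client.create_payload_index(
        collection_name="product-docs",
        field_name="doc_type",
        field_schema=models.PayloadSchemaType.KEYWORD,
    )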

    3. Query Layer

    Queries are processed through the same embedding model used during indexing, then executed as ANN search against the Qdrant HNSW index. Mixpeek's multi-stage retriever pipelines chain vector search with metadata filtering, reranking, and aggregation stages for production-grade retrieval.

    4. Storage Layer

    Collections support hot/cold tiering. Active data lives in Qdrant for millisecond-latency vector search. Cold data is offloaded to S3 Vectors for cost-efficient storage with on-demand rehydration. Metadata and lineage are always available in MongoDB regardless of storage tier.

    Build Vector Search in Python

    Create a collection, ingest content, and run vector search queries with metadata filtering and reranking -- all from the Python SDK.

    vector_search.py
    from mixpeek import Mixpeek
    
    client = Mixpeek(api_key="YOUR_API_KEY")
    
    # Create a collection with an embedding extractor
    collection = client.collections.create(
        name="knowledge-base",
        namespace="product-docs",
        extractors=[
            {
                "type": "text_embedding",
                "model": "sentence-transformers/all-MiniLM-L6-v2",
                "config": {
                    "chunk_size": 512,
                    "chunk_overlap": 50
                }
            }
        ]
    )
    
    # Ingest documents into a bucket (triggers embedding pipeline)
    client.buckets.upload(
        bucket_name="docs-bucket",
        files=["getting-started.pdf", "api-reference.md"],
        collection="knowledge-base"
    )
    
    # Vector search -- find semantically similar content
    results = client.retrievers.execute(
        namespace="product-docs",
        stages=[
            {
                "type": "feature_search",
                "method": "vector",
                "query": {
                    "text": "how to authenticate API requests",
                    "modalities": ["text"]
                },
                "limit": 20
            },
            {
                "type": "filter",
                "conditions": {
                    "metadata.doc_type": {"$in": ["guide", "reference"]},
                    "metadata.version": {"$gte": "2.0"}
                }
            },
            {
                "type": "rerank",
                "model": "cross-encoder/ms-marco-MiniLM-L-12-v2",
                "limit": 5
            }
        ]
    )
    
    # Results ranked by semantic similarity
    for result in results:
        print(f"Score: {result.score:.4f}")
        print(f"Source: {result.metadata['filename']}")
        print(f"Content: {result.content[:200]}")

    Frequently Asked Questions

    What is vector search?

    Vector search is a retrieval method that finds content by semantic similarity rather than keyword matching. Content (text, images, audio, video) is converted into numerical vectors called embeddings using machine learning models. At query time, the query is embedded using the same model, and approximate nearest neighbor (ANN) algorithms find the closest vectors in the index. This means a search for 'automobile maintenance' retrieves documents about 'car repair' because they share semantic meaning in the embedding space.

    How does vector search differ from keyword search?

    Keyword search (BM25, TF-IDF) matches exact terms in documents using inverted indexes. It excels at precise lookups but fails when users phrase queries differently than the indexed content. Vector search captures semantic meaning -- it understands that 'fix a flat tire' and 'puncture repair tutorial' are related even though they share no keywords. The tradeoff is that vector search may miss exact identifiers (product SKUs, error codes) that keyword search catches precisely. Many production systems use hybrid search to combine both.

    What embedding models does Mixpeek support for vector search?

    Mixpeek supports 50+ embedding models through its Ray GPU feature extraction pipeline. For text: sentence-transformers (all-MiniLM-L6-v2, all-mpnet-base-v2), E5, BGE, and custom models. For images and video: CLIP (ViT-L/14, ViT-B/32) and SigLIP. For audio: CLAP and Whisper-based transcription embeddings. You configure the embedding model per collection, and Mixpeek handles GPU inference, batching, and scaling automatically.

    What is HNSW and how does it speed up vector search?

    HNSW (Hierarchical Navigable Small World) is an approximate nearest neighbor algorithm that builds a multi-layer graph index over your vectors. Instead of comparing the query against every stored vector (brute force), HNSW traverses the graph from coarse to fine layers, narrowing down candidates at each level. This achieves sub-linear search time -- typically O(log n) -- with 95-99% recall at millisecond latency. Qdrant uses HNSW as its primary index, with tunable parameters (m for graph connectivity, ef for search quality).

    How do vector search benchmarks compare across databases?

    Vector search benchmarks (ann-benchmarks, VectorDBBench) measure recall, queries per second (QPS), and indexing speed. Key factors affecting performance include: HNSW parameters (m, ef_construct), vector dimensionality, dataset size, and whether metadata filtering is applied during search. In practice, the choice of embedding model has a larger impact on search quality than the vector database engine. Mixpeek uses Qdrant, which consistently ranks among the top performers for filtered vector search on standard benchmarks.

    Can I use vector search with Python?

    Yes. Mixpeek provides a Python SDK that handles the full vector search pipeline: create collections with embedding extractors, ingest content that automatically generates vectors, and execute vector search queries with metadata filtering and reranking. The SDK abstracts away embedding model hosting, vector index management, and infrastructure scaling. Install with `pip install mixpeek` and start with a free API key.

    What is the difference between vector search and semantic search?

    Vector search is the underlying retrieval mechanism -- it finds the nearest vectors to a query vector using ANN algorithms. Semantic search is the broader concept of searching by meaning, which typically uses vector search as its core engine but may also include query expansion, reranking, and intent classification. In practice, the terms are often used interchangeably, but vector search specifically refers to the embedding + ANN retrieval step.

    How does Mixpeek handle vector search at scale?

    Mixpeek scales vector search across three dimensions. Embedding generation runs on auto-scaling Ray GPU clusters with batch processing for high-throughput indexing. Vector storage and retrieval use Qdrant with HNSW indexes, supporting billions of vectors with millisecond latency. Storage tiering moves cold data to S3 Vectors while keeping hot data in Qdrant, optimizing cost without sacrificing query performance for active datasets. The multi-stage retriever pipeline chains vector search with filtering and reranking for production-grade precision.

    Start Building with Vector Search

    Ship semantic search in minutes. Managed embedding generation, HNSW indexing, metadata filtering, and multi-stage retriever pipelines -- no infrastructure to provision.