
    Vector Search: Find Content by Meaning, Not Keywords

    Vector search is the hot-tier query layer of the multimodal data warehouse. It retrieves content by semantic similarity using HNSW indexes, and multi-stage pipelines compose filter, rerank, and enrich stages on top of raw similarity scores for production retrieval.

    What is Vector Search?

    Vector search converts content into embeddings and retrieves results by semantic similarity using approximate nearest neighbor algorithms.

    How Embeddings Work

    Embedding models convert text, images, audio, and video into dense numerical vectors -- fixed-length arrays of floats that encode semantic meaning. Similar content lands close together in this high-dimensional space, enabling retrieval by meaning rather than exact keyword overlap. Models like CLIP, SigLIP, and sentence-transformers each produce embeddings optimized for different modalities and tasks.
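
    A minimal sketch of this idea using the open-source sentence-transformers library (shown for illustration; Mixpeek hosts embedding models for you, and the model named here is just one common choice): two phrasings of the same intent score far closer together than an unrelated sentence.

    embedding_demo.py
    from sentence_transformers import SentenceTransformer, util

    # Small 384-dimensional text embedding model
    model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

    sentences = [
        "how to fix a flat tire",      # query
        "puncture repair tutorial",    # related by meaning, not keywords
        "quarterly revenue forecast",  # unrelated
    ]
    embeddings = model.encode(sentences, normalize_embeddings=True)

    # Cosine similarity: the related pair scores much higher
    print(util.cos_sim(embeddings[0], embeddings[1]))  # high
    print(util.cos_sim(embeddings[0], embeddings[2]))  # low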

    Approximate Nearest Neighbor (ANN)

    Brute-force comparison of every vector against every query is too slow at scale. ANN algorithms like HNSW (Hierarchical Navigable Small World) build graph-based indexes that find the closest vectors in sub-linear time. HNSW delivers 95-99% recall at millisecond latency, making it the standard for production vector search across Qdrant, Pinecone, and Weaviate.
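
    To make the mechanics concrete, here is a standalone sketch using the open-source hnswlib library rather than a full vector database (illustrative only: the vectors are random and the parameter values are typical starting points, not tuned recommendations).

    hnsw_demo.py
    import numpy as np
    import hnswlib

    dim, n = 384, 10_000
    rng = np.random.default_rng(0)
    vectors = rng.random((n, dim), dtype=np.float32)

    # Build the graph index: M controls connectivity per node,
    # ef_construction controls build-time search width
    index = hnswlib.Index(space="cosine", dim=dim)
    index.init_index(max_elements=n, M=16, ef_construction=200)
    index.add_items(vectors, np.arange(n))

    # ef trades recall for latency at query time
    index.set_ef(64)
    labels, distances = index.knn_query(vectors[:1], k=5)
    print(labels, distances)  # nearest neighbors of the first vector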

    Why It Matters

    Traditional keyword search fails when users phrase queries differently than the indexed content. Vector search bridges this gap -- a query for 'how to fix a flat tire' retrieves documents about 'puncture repair' and 'tire replacement' because the embeddings capture shared meaning. This semantic understanding is essential for natural language queries, cross-lingual retrieval, and multimodal search across text, images, and video.

    How Vector Search Works

    A four-step pipeline: embed content, index vectors in HNSW, embed the query, and retrieve nearest neighbors with metadata filtering.

    1. Embed Content

    Content is processed through embedding models (sentence-transformers, CLIP, SigLIP, or custom models) to generate dense vector representations. Each document, image, video frame, or audio segment becomes a fixed-length float array that encodes its semantic meaning.

    2. Index Vectors

    Vectors are indexed in Qdrant using HNSW (Hierarchical Navigable Small World) graph indexes. HNSW builds a multi-layer navigable graph that enables sub-linear approximate nearest neighbor search with configurable recall-latency tradeoffs via ef and m parameters.
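
    For reference, this is roughly how those parameters are expressed at the Qdrant level with the open-source qdrant-client (Mixpeek manages this configuration for you; the collection name, dimension, and values below are illustrative).

    create_index.py
    from qdrant_client import QdrantClient, models

    client = QdrantClient(url="http://localhost:6333")

    # m = edges per graph node, ef_construct = build-time search width;
    # size must match the embedding model's output dimension
    client.create_collection(
        collection_name="product-docs",
        vectors_config=models.VectorParams(
            size=384,
            distance=models.Distance.COSINE,
        ),
        hnsw_config=models.HnswConfigDiff(m=16, ef_construct=200),
    )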

    3. Query

    The search query is embedded using the same model that indexed the content. This produces a query vector in the same embedding space, ensuring that semantic similarity between the query and stored content can be measured directly via distance metrics.

    4. Retrieve

    ANN search finds the closest vectors to the query vector using cosine similarity, Euclidean distance, or dot product. Results are filtered by metadata conditions (date ranges, categories, permissions) and returned ranked by similarity score.
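
    A sketch of this step against Qdrant directly, again via qdrant-client (Mixpeek's retriever stages wrap this; the placeholder query vector stands in for the embedding produced in step 3).

    retrieve.py
    from qdrant_client import QdrantClient, models

    client = QdrantClient(url="http://localhost:6333")

    query_embedding = [0.1] * 384  # placeholder for the step-3 query vector

    # ANN search with a metadata filter applied during retrieval
    hits = client.search(
        collection_name="product-docs",
        query_vector=query_embedding,
        query_filter=models.Filter(
            must=[
                models.FieldCondition(
                    key="doc_type",
                    match=models.MatchValue(value="guide"),
                )
            ]
        ),
        limit=10,
    )
    for hit in hits:
        print(hit.id, hit.score)  # ranked by similarity score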

    Vector Search vs Keyword Search

    Vector search and keyword search solve different problems. Understanding when to use each -- or both -- is key to production retrieval.

    Dimension | Vector Search | Keyword Search
    Semantic Understanding | Understands meaning and intent behind queries | Matches exact terms only, no semantic understanding
    Exact Match | Approximate -- may miss exact IDs, codes, proper nouns | Precise exact-term matching (BM25, TF-IDF)
    Typo Tolerance | Naturally tolerant -- embeddings encode meaning, not spelling | Requires fuzzy-matching configuration, fragile with typos
    Multilingual | Cross-lingual retrieval with multilingual embedding models | Requires per-language indexes, analyzers, and stemming rules
    Setup Complexity | Requires embedding model selection and vector index tuning | Simpler setup -- tokenization, stemming, inverted index
    Best For | Natural language queries, exploratory search, multimodal | Exact lookups, structured queries, known-item search

    Need both? Hybrid search combines vector and keyword retrieval for the best of both worlds.

    Vector Search Capabilities

    Multi-modal embeddings, metadata filtering, configurable distance metrics, and GPU-accelerated indexing for production vector search.

    Multi-Modal Embeddings

    Generate and search embeddings across text, images, video frames, and audio in a single namespace. Mixpeek's 50+ feature extractors run on Ray GPU clusters, producing embeddings for any content type without external model hosting. A short cross-modal example follows the list below.

    • Text embeddings (sentence-transformers, E5, BGE)
    • Image and video embeddings (CLIP, SigLIP)
    • Audio embeddings (CLAP, Whisper transcription)
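
    As a quick illustration of a shared text-image embedding space, here is a sketch using the open-source sentence-transformers CLIP checkpoint (not Mixpeek's managed extractors; the image path is hypothetical).

    multimodal_demo.py
    from PIL import Image
    from sentence_transformers import SentenceTransformer, util

    # CLIP maps images and text into the same embedding space
    model = SentenceTransformer("clip-ViT-B-32")

    image_emb = model.encode(Image.open("bike_repair.jpg"))  # hypothetical file
    text_emb = model.encode("a person fixing a flat bicycle tire")

    # Cross-modal similarity: a text query scored against an image
    print(util.cos_sim(image_emb, text_emb))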

    Metadata Filtering

    Narrow vector search results with pre-retrieval and post-retrieval metadata filters. Filter by any field stored alongside your vectors -- dates, categories, permissions, custom attributes -- without affecting similarity scoring. The filter syntax is sketched after the list below.

    • Boolean filter expressions ($and, $or, $not)
    • Range filters for numeric and date fields
    • Array contains and nested field filtering
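
    A hedged sketch of how these operators might compose, extrapolating from the filter stage in the Python example further down this page; the exact nesting rules are an assumption, so treat it as illustrative rather than a syntax reference.

    filters.py
    # Hypothetical filter payload combining boolean, range, and
    # membership conditions over metadata fields
    conditions = {
        "$and": [
            {"metadata.doc_type": {"$in": ["guide", "reference"]}},
            {"metadata.published_at": {"$gte": "2024-01-01"}},
            {"$not": {"metadata.status": "archived"}},
        ]
    }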

    Configurable Distance Metrics

    Choose the distance metric that matches your embedding model and use case: cosine similarity for normalized embeddings, Euclidean distance for absolute positioning, or dot product for maximum inner product search. A worked comparison follows the list below.

    • Cosine similarity (default for most models)
    • Euclidean (L2) distance
    • Dot product (max inner product search)
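
    A worked comparison in plain NumPy shows why the choice matters less for unit-normalized embeddings: cosine similarity and dot product coincide, and L2 distance ranks neighbors identically.

    metrics.py
    import numpy as np

    a = np.array([0.6, 0.8])  # both vectors are already unit length
    b = np.array([0.8, 0.6])

    cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    dot = a @ b
    l2 = np.linalg.norm(a - b)

    # For unit vectors: cosine == dot, and l2^2 == 2 - 2 * dot,
    # so all three metrics produce the same neighbor ranking
    print(cosine, dot, l2)  # 0.96, 0.96, ~0.283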

    GPU-Accelerated Indexing

    Embedding generation runs on Ray GPU clusters with automatic batching and scaling. Index millions of documents, images, and video frames without provisioning infrastructure. HNSW indexes are built and optimized in Qdrant for low-latency serving; an incremental-update sketch follows the list below.

    • Auto-scaling Ray clusters for embedding generation
    • Batch processing with configurable concurrency
    • Incremental index updates without full rebuilds
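
    As a sketch of what an incremental update looks like at the Qdrant level (an illustrative qdrant-client call with a placeholder embedding; Mixpeek performs equivalent upserts as part of its ingestion pipeline), new points are inserted into the live HNSW graph with no full rebuild.

    incremental_upsert.py
    from qdrant_client import QdrantClient, models

    client = QdrantClient(url="http://localhost:6333")

    new_embedding = [0.1] * 384  # placeholder for a freshly generated vector

    # Upserting inserts (or overwrites) points in the existing index
    client.upsert(
        collection_name="product-docs",
        points=[
            models.PointStruct(
                id=1001,
                vector=new_embedding,
                payload={"doc_type": "guide", "version": "2.1"},
            )
        ],
    )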

    Vector Search Architecture

    A four-layer architecture spanning embedding generation, HNSW indexing, multi-stage retrieval, and hot/cold storage tiering.

    1. Embedding Layer

    Content is processed through embedding models running on Ray GPU clusters. Mixpeek supports CLIP, SigLIP, sentence-transformers, E5, BGE, and custom models. Each extractor is configured per collection, and multiple extractors can run on the same content to produce embeddings for different modalities or granularities.

    2. Index Layer

    Vectors are stored and indexed in Qdrant using HNSW graph indexes. Each namespace maps to a Qdrant collection with configurable vector dimensions, distance metrics, and HNSW parameters (m, ef_construct). Payload indexes accelerate metadata filtering alongside vector search.
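
    For reference, creating a payload index with the open-source qdrant-client looks roughly like this (illustrative; Mixpeek configures payload indexes per namespace). The index lets Qdrant evaluate metadata filters during HNSW traversal instead of post-filtering results.

    payload_index.py
    from qdrant_client import QdrantClient, models

    client = QdrantClient(url="http://localhost:6333")

    # Index the doc_type field for fast keyword-match filtering
    client.create_payload_index(
        collection_name="product-docs",
        field_name="doc_type",
        field_schema=models.PayloadSchemaType.KEYWORD,
    )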

    3. Query Layer

    Queries are processed through the same embedding model used during indexing, then executed as ANN search against the Qdrant HNSW index. Mixpeek's multi-stage retriever pipelines chain vector search with metadata filtering, reranking, and aggregation stages for production-grade retrieval.

    4. Storage Layer

    Collections support hot/cold tiering. Active data lives in Qdrant for millisecond-latency vector search. Cold data is offloaded to S3 Vectors for cost-efficient storage with on-demand rehydration. Metadata and lineage are always available in MongoDB regardless of storage tier.

    Build Vector Search in Python

    Create a collection, ingest content, and run vector search queries with metadata filtering and reranking -- all from the Python SDK.

    vector_search.py
    from mixpeek import Mixpeek
    
    client = Mixpeek(api_key="YOUR_API_KEY")
    
    # Create a collection with an embedding extractor
    collection = client.collections.create(
        name="knowledge-base",
        namespace="product-docs",
        extractors=[
            {
                "type": "text_embedding",
                "model": "sentence-transformers/all-MiniLM-L6-v2",
                "config": {
                    "chunk_size": 512,
                    "chunk_overlap": 50
                }
            }
        ]
    )
    
    # Ingest documents into a bucket (triggers embedding pipeline)
    client.buckets.upload(
        bucket_name="docs-bucket",
        files=["getting-started.pdf", "api-reference.md"],
        collection="knowledge-base"
    )
    
    # Vector search -- find semantically similar content
    results = client.retrievers.execute(
        namespace="product-docs",
        stages=[
            {
                "type": "feature_search",
                "method": "vector",
                "query": {
                    "text": "how to authenticate API requests",
                    "modalities": ["text"]
                },
                "limit": 20
            },
            {
                "type": "filter",
                "conditions": {
                    "metadata.doc_type": {"$in": ["guide", "reference"]},
                    "metadata.version": {"$gte": "2.0"}
                }
            },
            {
                "type": "rerank",
                "model": "cross-encoder/ms-marco-MiniLM-L-12-v2",
                "limit": 5
            }
        ]
    )
    
    # Results ranked by semantic similarity
    for result in results:
        print(f"Score: {result.score:.4f}")
        print(f"Source: {result.metadata['filename']}")
        print(f"Content: {result.content[:200]}")

    Frequently Asked Questions

    What is vector search?

    Vector search is a retrieval method that finds content by semantic similarity rather than keyword matching. Content (text, images, audio, video) is converted into numerical vectors called embeddings using machine learning models. At query time, the query is embedded using the same model, and approximate nearest neighbor (ANN) algorithms find the closest vectors in the index. This means a search for 'automobile maintenance' retrieves documents about 'car repair' because they share semantic meaning in the embedding space.

    How does vector search differ from keyword search?

    Keyword search (BM25, TF-IDF) matches exact terms in documents using inverted indexes. It excels at precise lookups but fails when users phrase queries differently than the indexed content. Vector search captures semantic meaning -- it understands that 'fix a flat tire' and 'puncture repair tutorial' are related even though they share no keywords. The tradeoff is that vector search may miss exact identifiers (product SKUs, error codes) that keyword search catches precisely. Many production systems use hybrid search to combine both.

    What embedding models does Mixpeek support for vector search?

    Mixpeek supports 50+ embedding models through its Ray GPU feature extraction pipeline. For text: sentence-transformers (all-MiniLM-L6-v2, all-mpnet-base-v2), E5, BGE, and custom models. For images and video: CLIP (ViT-L/14, ViT-B/32) and SigLIP. For audio: CLAP and Whisper-based transcription embeddings. You configure the embedding model per collection, and Mixpeek handles GPU inference, batching, and scaling automatically.

    What is HNSW and how does it speed up vector search?

    HNSW (Hierarchical Navigable Small World) is an approximate nearest neighbor algorithm that builds a multi-layer graph index over your vectors. Instead of comparing the query against every stored vector (brute force), HNSW traverses the graph from coarse to fine layers, narrowing down candidates at each level. This achieves sub-linear search time -- typically O(log n) -- with 95-99% recall at millisecond latency. Qdrant uses HNSW as its primary index, with tunable parameters (m for graph connectivity, ef for search quality).

    How do vector search benchmarks compare across databases?

    Vector search benchmarks (ann-benchmarks, VectorDBBench) measure recall, queries per second (QPS), and indexing speed. Key factors affecting performance include: HNSW parameters (m, ef_construct), vector dimensionality, dataset size, and whether metadata filtering is applied during search. In practice, the choice of embedding model has a larger impact on search quality than the vector database engine. Mixpeek uses Qdrant, which consistently ranks among the top performers for filtered vector search on standard benchmarks.

    Can I use vector search with Python?

    Yes. Mixpeek provides a Python SDK that handles the full vector search pipeline: create collections with embedding extractors, ingest content that automatically generates vectors, and execute vector search queries with metadata filtering and reranking. The SDK abstracts away embedding model hosting, vector index management, and infrastructure scaling. Install with `pip install mixpeek` and start with a free API key.

    What is the difference between vector search and semantic search?

    Vector search is the underlying retrieval mechanism -- it finds the nearest vectors to a query vector using ANN algorithms. Semantic search is the broader concept of searching by meaning, which typically uses vector search as its core engine but may also include query expansion, reranking, and intent classification. In practice, the terms are often used interchangeably, but vector search specifically refers to the embedding + ANN retrieval step.

    How does Mixpeek handle vector search at scale?

    Mixpeek scales vector search across three dimensions. Embedding generation runs on auto-scaling Ray GPU clusters with batch processing for high-throughput indexing. Vector storage and retrieval use Qdrant with HNSW indexes, supporting billions of vectors with millisecond latency. Storage tiering moves cold data to S3 Vectors while keeping hot data in Qdrant, optimizing cost without sacrificing query performance for active datasets. The multi-stage retriever pipeline chains vector search with filtering and reranking for production-grade precision.

    Start Building with Vector Search

    Ship semantic search in minutes. Managed embedding generation, HNSW indexing, metadata filtering, and multi-stage retriever pipelines -- no infrastructure to provision.