NEWVectors or files. Pick a path.Start →

    What is Product Quantization

    Product Quantization - Vector compression by subspace decomposition and codebook encoding

    A vector compression technique that decomposes high-dimensional vectors into lower-dimensional subspaces and quantizes each independently, achieving high compression ratios for large-scale multimodal search systems.

    How It Works

    Product quantization splits each D-dimensional vector into M subvectors of D/M dimensions. Each subspace has its own codebook of k centroids trained via k-means. A vector is encoded as M centroid indices, reducing storage from D floats to M bytes (when k=256). Distance computation uses precomputed lookup tables between query subvectors and codebook entries.

    Technical Details

    Standard PQ uses M=8-64 subspaces with k=256 centroids each, compressing 768-dimensional float32 vectors from 3KB to 8-64 bytes. Asymmetric Distance Computation (ADC) keeps the query uncompressed and computes distances against quantized database vectors for better accuracy. Inverted File with PQ (IVF-PQ) combines coarse partitioning with PQ for scalable search.

    Best Practices

    • Choose the number of subspaces M based on your acceptable recall-compression trade-off
    • Train codebooks on at least 10x-100x more vectors than the codebook size
    • Use Optimized Product Quantization (OPQ) to rotate vectors before quantization
    • Always use asymmetric distance computation for queries to maximize search accuracy

    Common Pitfalls

    • Using too few training vectors for codebook learning, leading to poor quantization
    • Selecting subspace dimensions that do not evenly divide the vector dimensionality
    • Forgetting to retrain codebooks when switching to a different embedding model
    • Applying PQ to very low-dimensional vectors where the compression benefit is minimal

    Advanced Tips

    • Chain residual quantization after PQ to encode quantization errors for higher fidelity
    • Use additive quantization methods for better accuracy at the same compression ratio
    • Implement SIMD-optimized distance table lookups for maximum query throughput
    • Combine PQ with graph-based indices (HNSW-PQ) for memory-efficient billion-scale search
    Managed Mixpeek

    Put multimodal search to work

    Connect a bucket and Mixpeek runs the whole multimodal search pipeline for you: extraction, indexing, and search over your own objects. No models to wire up, nothing to host.

    Start with Managed
    MVS · bring your own

    Already have vectors?

    Keep your embeddings on your own cloud and run dense, sparse, and BM25 search directly on object storage. First 1M vectors free.

    Start with MVS