    What is Product Quantization?

    Product Quantization - Vector compression by subspace decomposition and codebook encoding

    A vector compression technique that decomposes high-dimensional vectors into lower-dimensional subspaces and quantizes each independently, achieving high compression ratios for large-scale multimodal search systems.

    How It Works

    Product quantization splits each D-dimensional vector into M subvectors of D/M dimensions. Each subspace gets its own codebook of k centroids, trained with k-means. A vector is then encoded as M centroid indices, shrinking storage from D 32-bit floats (4D bytes) to M bytes when k=256, since each index fits in a single byte. Distance computation uses lookup tables of precomputed distances between the query's subvectors and each codebook's entries.
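
    A minimal sketch of train/encode/search in NumPy and scikit-learn, assuming squared-L2 distance, random stand-in data, and illustrative sizes (D=128, M=8, k=256); a real system would train on representative embeddings:

        import numpy as np
        from sklearn.cluster import KMeans

        D, M, k = 128, 8, 256   # vector dim, subspaces, centroids per codebook
        d_sub = D // M          # dimensions per subvector

        train = np.random.randn(10_000, D).astype(np.float32)  # stand-in training set
        db = np.random.randn(10_000, D).astype(np.float32)     # stand-in database

        # Train one k-means codebook per subspace.
        kms = [KMeans(n_clusters=k, n_init=1).fit(train[:, m * d_sub:(m + 1) * d_sub])
               for m in range(M)]
        codebooks = np.stack([km.cluster_centers_ for km in kms]).astype(np.float32)

        # Encode: each database vector becomes M one-byte centroid indices.
        codes = np.stack([kms[m].predict(db[:, m * d_sub:(m + 1) * d_sub])
                          for m in range(M)], axis=1).astype(np.uint8)

        # Search: precompute an (M, k) table of squared distances between the
        # uncompressed query's subvectors and every centroid, then sum M table
        # lookups per database vector instead of touching the original floats.
        def adc_search(query, topn=5):
            table = np.stack([((codebooks[m] - query[m * d_sub:(m + 1) * d_sub]) ** 2).sum(-1)
                              for m in range(M)])
            approx = table[np.arange(M), codes].sum(axis=1)
            return np.argsort(approx)[:topn]

        print(adc_search(np.random.randn(D).astype(np.float32)))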

    Technical Details

    Standard PQ uses M = 8–64 subspaces with k = 256 centroids each, compressing a 768-dimensional float32 vector from 3 KB (768 × 4 bytes) down to 8–64 bytes. Asymmetric Distance Computation (ADC) keeps the query uncompressed and computes distances against quantized database vectors, which is more accurate than quantizing both sides. Inverted File with PQ (IVF-PQ) combines coarse partitioning with PQ codes for scalable search.
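
    A sketch of IVF-PQ using the FAISS library (the parameters nlist=1024, M=16, nbits=8, and nprobe=16 are illustrative, not tuned, and the data is a random stand-in):

        import numpy as np
        import faiss

        d, nlist, M, nbits = 768, 1024, 16, 8   # dim, coarse cells, subspaces, bits/index

        xb = np.random.randn(100_000, d).astype('float32')  # stand-in database
        xq = np.random.randn(5, d).astype('float32')        # stand-in queries

        quantizer = faiss.IndexFlatL2(d)                    # coarse partitioner
        index = faiss.IndexIVFPQ(quantizer, d, nlist, M, nbits)
        index.train(xb)      # learns IVF centroids and PQ codebooks together
        index.add(xb)
        index.nprobe = 16    # coarse cells visited per query
        dists, ids = index.search(xq, 10)                   # ADC under the hood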

    Best Practices

    • Choose the number of subspaces M based on your acceptable recall-compression trade-off
    • Train each codebook on at least 10–100× as many vectors as it has centroids
    • Use Optimized Product Quantization (OPQ) to rotate vectors before quantization (see the sketch after this list)
    • Always use asymmetric distance computation for queries to maximize search accuracy
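
    A sketch of the OPQ recommendation via a FAISS factory string, assuming 768-dimensional inputs; "OPQ16" learns a rotation matched to 16 subspaces before "PQ16" encodes, with "IVF1024" coarse partitioning in between:

        import numpy as np
        import faiss

        d = 768
        xb = np.random.randn(200_000, d).astype('float32')  # stand-in training/database set

        # Rotation (OPQ) -> coarse partitioning (IVF) -> per-subspace codebooks (PQ)
        index = faiss.index_factory(d, "OPQ16,IVF1024,PQ16")
        index.train(xb)   # 200k vectors comfortably exceeds 100x the 256 centroids
        index.add(xb)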

    Common Pitfalls

    • Using too few training vectors for codebook learning, leading to poor quantization
    • Selecting a subspace count M that does not evenly divide the vector dimensionality (a pre-flight check is sketched after this list)
    • Forgetting to retrain codebooks when switching to a different embedding model
    • Applying PQ to very low-dimensional vectors where the compression benefit is minimal
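
    The first two pitfalls are cheap to catch up front. A hypothetical pre-flight check (the function name and thresholds are illustrative, not from any library):

        def check_pq_config(d: int, M: int, k: int, n_train: int) -> None:
            # Subvectors must tile the full vector exactly.
            if d % M != 0:
                raise ValueError(f"dim {d} is not divisible by M={M} subspaces")
            # Rule of thumb from above: at least 10x more training vectors than centroids.
            if n_train < 10 * k:
                raise ValueError(f"need >= {10 * k} training vectors for k={k}, got {n_train}")

        check_pq_config(d=768, M=16, k=256, n_train=100_000)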

    Advanced Tips

    • Chain residual quantization after PQ to encode quantization errors for higher fidelity (sketched after this list)
    • Use additive quantization methods for better accuracy at the same compression ratio
    • Implement SIMD-optimized distance table lookups for maximum query throughput
    • Combine PQ with graph-based indices (HNSW-PQ) for memory-efficient billion-scale search
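
    A sketch of the residual-quantization tip in NumPy/scikit-learn: stage two runs the same PQ machinery on the stage-one reconstruction error, so decoding sums both reconstructions (sizes and helper names are illustrative):

        import numpy as np
        from sklearn.cluster import KMeans

        def train_pq(X, M, k):
            d_sub = X.shape[1] // M
            return [KMeans(n_clusters=k, n_init=1).fit(X[:, m * d_sub:(m + 1) * d_sub])
                    for m in range(M)]

        def encode_decode(X, kms):
            d_sub = X.shape[1] // len(kms)
            codes = np.stack([km.predict(X[:, m * d_sub:(m + 1) * d_sub])
                              for m, km in enumerate(kms)], axis=1).astype(np.uint8)
            recon = np.concatenate([km.cluster_centers_[codes[:, m]]
                                    for m, km in enumerate(kms)], axis=1)
            return codes, recon.astype(np.float32)

        X = np.random.randn(20_000, 128).astype(np.float32)   # stand-in embeddings
        stage1 = train_pq(X, M=8, k=256)
        codes1, recon1 = encode_decode(X, stage1)
        residual = X - recon1                     # stage-1 quantization error
        stage2 = train_pq(residual, M=8, k=256)   # second codebook set on the error
        codes2, recon2 = encode_decode(residual, stage2)

        # Two-stage reconstruction error should drop well below one-stage error.
        print(np.mean((X - (recon1 + recon2)) ** 2), "<", np.mean(residual ** 2))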