Product Quantization - Vector compression by subspace decomposition and codebook encoding
A vector compression technique that decomposes high-dimensional vectors into lower-dimensional subspaces and quantizes each independently, achieving high compression ratios for large-scale multimodal search systems.
How It Works
Product quantization splits each D-dimensional vector into M subvectors of D/M dimensions. Each subspace has its own codebook of k centroids trained via k-means. A vector is then encoded as M centroid indices, reducing storage from D 32-bit floats to M bytes (one byte per subvector when k=256). Distance computation uses precomputed lookup tables between the query's subvectors and the codebook entries.
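The split-train-encode pipeline above can be sketched in a few lines of NumPy. The helper names (`train_codebooks`, `encode`) and the toy sizes (D=32, M=4, k=16) are illustrative, not any particular library's API:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_codebooks(X, M, k, iters=10):
    """Train one k-means codebook per subspace (a few Lloyd's iterations)."""
    N, D = X.shape
    d = D // M  # subvector dimensionality; assumes M evenly divides D
    codebooks = []
    for m in range(M):
        sub = X[:, m*d:(m+1)*d]
        # initialize centroids from random training points
        C = sub[rng.choice(N, size=k, replace=False)]
        for _ in range(iters):
            # assign each subvector to its nearest centroid, then recenter
            dists = ((sub[:, None, :] - C[None, :, :]) ** 2).sum(-1)
            assign = dists.argmin(1)
            for j in range(k):
                pts = sub[assign == j]
                if len(pts):
                    C[j] = pts.mean(0)
        codebooks.append(C)
    return codebooks

def encode(x, codebooks):
    """Encode one vector as M centroid indices (one byte each when k <= 256)."""
    d = len(x) // len(codebooks)
    return np.array([((x[m*d:(m+1)*d] - C) ** 2).sum(1).argmin()
                     for m, C in enumerate(codebooks)], dtype=np.uint8)

X = rng.standard_normal((1000, 32)).astype(np.float32)
codebooks = train_codebooks(X, M=4, k=16)
code = encode(X[0], codebooks)  # 32 floats (128 bytes) -> 4 one-byte codes
```

Decoding simply concatenates the M selected centroids, so reconstruction quality depends entirely on how well the codebooks cover each subspace.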
Technical Details
Standard PQ uses M=8-64 subspaces with k=256 centroids each, compressing a 768-dimensional float32 vector from 3,072 bytes (3 KB) to 8-64 bytes. Asymmetric Distance Computation (ADC) keeps the query uncompressed and computes distances against quantized database vectors, which is more accurate than comparing two quantized vectors. Inverted File with PQ (IVF-PQ) combines coarse partitioning with PQ for scalable search.
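A minimal sketch of ADC, with hypothetical helper names and toy random codebooks: the query stays in float, and each stored code is scored with just M table lookups. The sum of per-subspace lookups equals the exact distance between the query and the code's reconstruction:

```python
import numpy as np

rng = np.random.default_rng(0)
M, d, k = 4, 8, 16                      # toy sizes: 4 subspaces of 8 dims, 16 centroids
codebooks = [rng.standard_normal((k, d)) for _ in range(M)]  # untrained, illustrative
q = rng.standard_normal(M * d)          # query kept uncompressed (asymmetric)

def adc_tables(q, codebooks):
    """Precompute squared distances from each query subvector to every centroid."""
    d = len(q) // len(codebooks)
    return [((q[m*d:(m+1)*d] - C) ** 2).sum(axis=1) for m, C in enumerate(codebooks)]

def adc_distance(tables, code):
    """Score one database code with M table lookups -- no decompression needed."""
    return sum(t[c] for t, c in zip(tables, code))

tables = adc_tables(q, codebooks)
code = [3, 7, 0, 12]                    # one stored vector's PQ code
approx = adc_distance(tables, code)
# the code's reconstruction, for comparison against the lookup-based score
recon = np.concatenate([codebooks[m][c] for m, c in enumerate(code)])
```

Because the tables are built once per query, scoring a million codes costs a million times M lookups rather than a million full D-dimensional distance computations.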
Best Practices
Choose the number of subspaces M based on your acceptable recall-compression trade-off: more subspaces give finer quantization and higher recall, but longer codes
Train codebooks on at least 10x-100x as many vectors as there are centroids per codebook
Use Optimized Product Quantization (OPQ) to rotate vectors before quantization
Always use asymmetric distance computation for queries to maximize search accuracy
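The first two recommendations can be made concrete with a small pure-Python planner; `pq_plan` and `min_training_vectors` are hypothetical helpers written for this sketch, not a library API:

```python
def pq_plan(dim, m_candidates, k=256, dtype_bytes=4):
    """Report code size and compression ratio for candidate subspace counts."""
    raw = dim * dtype_bytes
    plans = []
    for M in m_candidates:
        if dim % M:  # subspaces must evenly divide the dimensionality
            continue
        code_bytes = M  # one byte per subspace when k = 256
        plans.append({"M": M, "subdim": dim // M,
                      "code_bytes": code_bytes,
                      "compression": raw / code_bytes})
    return plans

def min_training_vectors(k=256, factor=100):
    """The 10x-100x rule of thumb: training vectors per codebook centroid."""
    return k * factor

# A 768-dim float32 vector: M=8 gives 384x compression, M=64 gives 48x.
plans = pq_plan(768, [8, 16, 32, 60, 64])  # 60 is silently skipped (768 % 60 != 0)
```

Picking M is then a matter of scanning this table for the smallest code that still meets your recall target on a held-out query set.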
Common Pitfalls
Using too few training vectors for codebook learning, leading to poor quantization
Selecting subspace dimensions that do not evenly divide the vector dimensionality
Forgetting to retrain codebooks when switching to a different embedding model
Applying PQ to very low-dimensional vectors where the compression benefit is minimal
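Most of these pitfalls can be caught before training with a simple guard; `validate_pq_config` is an illustrative name, and the stale-codebook pitfall (switching embedding models) cannot be caught statically and still needs a retraining step in your pipeline:

```python
def validate_pq_config(dim, M, k, n_train):
    """Sanity-check a PQ configuration against common pitfalls (illustrative)."""
    if dim % M != 0:
        raise ValueError(f"M={M} must evenly divide dim={dim}")
    if n_train < 10 * k:  # lower bound of the 10x-100x training-set rule
        raise ValueError(f"need >= {10 * k} training vectors for k={k}, got {n_train}")
    if dim // M < 2:  # subvectors this small leave little signal to quantize
        raise ValueError(f"subvector dim {dim // M} is too low for PQ to pay off")
    return True

validate_pq_config(768, 8, 256, 100_000)  # a typical configuration passes
```

Running this check at index-build time turns silent recall degradation into a loud configuration error.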
Advanced Tips
Chain residual quantization after PQ to encode quantization errors for higher fidelity
Use additive quantization methods for better accuracy at the same compression ratio
Implement SIMD-optimized distance table lookups for maximum query throughput
Combine PQ with graph-based indices (HNSW-PQ) for memory-efficient billion-scale search
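The residual-quantization tip amounts to running a second PQ stage on the first stage's error. In this sketch the codebooks are random for illustration (real systems train the second level on actual residuals), and a zero centroid is included in level 2 so the residual stage can never make the reconstruction worse:

```python
import numpy as np

rng = np.random.default_rng(1)
M, d, k = 4, 8, 16

def encode(x, codebooks):
    """Nearest-centroid index per subspace."""
    d = len(x) // len(codebooks)
    return [int(((x[m*d:(m+1)*d] - C) ** 2).sum(axis=1).argmin())
            for m, C in enumerate(codebooks)]

def decode(code, codebooks):
    """Reconstruct by concatenating the chosen centroids."""
    return np.concatenate([C[c] for C, c in zip(codebooks, code)])

# Level 1: ordinary PQ codebooks (random here, for illustration).
level1 = [rng.standard_normal((k, d)) for _ in range(M)]
# Level 2: codebooks for the residual; the zero centroid guarantees the
# second stage never hurts (real systems train these on actual residuals).
level2 = [np.vstack([np.zeros((1, d)), rng.standard_normal((k - 1, d))])
          for _ in range(M)]

x = rng.standard_normal(M * d)
c1 = encode(x, level1)
residual = x - decode(c1, level1)      # error left by the first stage
c2 = encode(residual, level2)          # encode the error itself
recon = decode(c1, level1) + decode(c2, level2)

err1 = ((x - decode(c1, level1)) ** 2).sum()
err2 = ((x - recon) ** 2).sum()        # two-stage error
```

The cost is a second M-byte code per vector; additive quantization methods generalize this idea by dropping the subspace decomposition entirely.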