
    What is Vector Quantization?

    Vector Quantization - Compressing vectors by mapping to codebook entries

    A compression technique that maps high-dimensional vectors to a finite set of representative codewords, reducing storage and speeding up similarity search. Essential for scaling multimodal vector databases to billions of embeddings.

    How It Works

    Vector quantization partitions the embedding space into regions, each represented by a centroid (codeword). During encoding, each vector is assigned to its nearest centroid, and only the centroid index is stored instead of the full vector. At query time, distances are computed between the query and the centroids rather than all original vectors, dramatically reducing computation and memory.
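
    As a minimal sketch of this idea, the snippet below implements flat (single-codebook) vector quantization with NumPy and SciPy. The corpus size, dimensionality, and codebook size are illustrative placeholders, not recommended settings.

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)

def train_codebook(vectors, k=256, iters=20):
    """Lloyd's k-means: learn k centroids (codewords) from a training sample."""
    centroids = vectors[rng.choice(len(vectors), size=k, replace=False)].copy()
    for _ in range(iters):
        assign = cdist(vectors, centroids).argmin(axis=1)  # nearest centroid per vector
        for c in range(k):
            members = vectors[assign == c]
            if len(members):
                centroids[c] = members.mean(axis=0)        # move centroid to member mean
    return centroids

def encode(vectors, centroids):
    """Store only the index of the nearest codeword (1 byte per vector when k <= 256)."""
    return cdist(vectors, centroids).argmin(axis=1).astype(np.uint8)

def search(query, codes, centroids, top_k=5):
    """Compute query-to-centroid distances once, then look them up for every stored code."""
    centroid_dists = np.linalg.norm(centroids - query, axis=1)
    return np.argsort(centroid_dists[codes])[:top_k]

# Toy corpus: 10,000 vectors of dimension 128 (hypothetical embedding size).
data = rng.standard_normal((10_000, 128)).astype(np.float32)
codebook = train_codebook(data)
codes = encode(data, codebook)   # 10,000 bytes instead of 10,000 x 128 float32 values
print(search(data[0], codes, codebook))
```

    A single 256-entry codebook is far too coarse for 128-dimensional embeddings in practice; that limitation is what motivates the product and residual variants described below.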

    Technical Details

    Common approaches include flat k-means vector quantization (a single codebook over the full vector), scalar quantization (quantizing each dimension independently, e.g. float32 to int8), product quantization (PQ), which splits vectors into subspaces and quantizes each one separately, and residual quantization, which iteratively encodes the error left by earlier quantization stages. Codebook sizes typically range from 256 to 65,536 entries (8 to 16 bits per code index). The trade-off between compression ratio and recall accuracy is controlled mainly by the number of subspaces and the codebook size.
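
    A rough sketch of product quantization follows, assuming scikit-learn for the per-subspace k-means and the same kind of toy 128-dimensional data as above; the choice of 8 subspaces and 256 codewords is illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
data = rng.standard_normal((10_000, 128)).astype(np.float32)  # toy 128-d embeddings

m, k = 8, 256                  # 8 subspaces, 256 codewords each -> 8 bytes per vector
sub_dim = data.shape[1] // m   # each subspace covers 16 dimensions

# Train one independent codebook per subspace on the corresponding vector slice.
codebooks, codes = [], []
for i in range(m):
    chunk = data[:, i * sub_dim:(i + 1) * sub_dim]
    km = KMeans(n_clusters=k, n_init=1, random_state=0).fit(chunk)
    codebooks.append(km.cluster_centers_)
    codes.append(km.labels_.astype(np.uint8))
codes = np.stack(codes, axis=1)   # shape (N, m): m one-byte indices per vector

original_bytes = data.shape[1] * 4   # 128 float32 values
compressed_bytes = m                 # m uint8 codes
print(f"compression ratio: {original_bytes / compressed_bytes:.0f}x")  # 64x
```

    Each vector is now stored as 8 one-byte codeword indices (64x smaller than float32); an approximate reconstruction is obtained by concatenating the selected codeword from each subspace codebook.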

    Best Practices

    • Train codebooks on a representative sample of your actual data distribution
    • Use product quantization for high-dimensional vectors to maintain accuracy with high compression
    • Benchmark recall at different compression levels to find the right trade-off for your use case
    • Re-rank top candidates using the original vectors after the initial quantized search (a minimal sketch follows this list)
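
    The re-ranking step might look like the sketch below, where `candidate_ids` stands in for a shortlist returned by a quantized index (here it is just a random sample, since no real index is attached to this sketch).

```python
import numpy as np

def rerank(query, candidate_ids, original_vectors, top_k=10):
    """Re-score a shortlist from the quantized index with exact distances."""
    exact = np.linalg.norm(original_vectors[candidate_ids] - query, axis=1)
    order = np.argsort(exact)[:top_k]
    return candidate_ids[order], exact[order]

# Usage: take e.g. the top 100 ids from the quantized search, then keep the best 10.
rng = np.random.default_rng(1)
vectors = rng.standard_normal((10_000, 128)).astype(np.float32)
query = vectors[42]
shortlist = rng.choice(len(vectors), size=100, replace=False)  # stand-in for quantized results
ids, dists = rerank(query, shortlist, vectors)
print(ids[:5], dists[:5])
```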

    Common Pitfalls

    • Training codebooks on data that does not represent the production distribution
    • Over-compressing vectors, leading to unacceptable recall degradation
    • Not re-training codebooks when the data distribution shifts significantly
    • Applying uniform quantization to vectors that have non-uniform value distributions

    Advanced Tips

    • Combine vector quantization with graph-based indices (HNSW) for fast approximate search
    • Use optimized product quantization (OPQ) to rotate vectors before quantization for better accuracy
    • Implement asymmetric distance computation, where queries remain unquantized, for higher precision (see the sketch after this list)
    • Consider learned quantization methods that train end-to-end with the retrieval objective
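
    For illustration, here is a sketch of asymmetric distance computation over PQ codes. The function name `adc_search` and the random codebooks and codes below are hypothetical stand-ins for the output of a trained product quantizer such as the one sketched earlier.

```python
import numpy as np

def adc_search(query, codes, codebooks, top_k=10):
    """Asymmetric distance computation: the query stays unquantized.

    codes:     (N, m) uint8 PQ codes for the database vectors
    codebooks: list of m arrays, each (k, sub_dim), one codebook per subspace
    """
    m = len(codebooks)
    sub_dim = codebooks[0].shape[1]
    # Precompute squared distances from each query slice to every codeword.
    tables = np.stack([
        ((codebooks[i] - query[i * sub_dim:(i + 1) * sub_dim]) ** 2).sum(axis=1)
        for i in range(m)
    ])                                                  # shape (m, k)
    # Approximate squared distance = sum of per-subspace table lookups.
    approx = tables[np.arange(m), codes].sum(axis=1)    # shape (N,)
    return np.argsort(approx)[:top_k]

# Toy usage with random codebooks and codes (in practice these come from a trained PQ encoder).
rng = np.random.default_rng(0)
m, k, sub_dim = 8, 256, 16
codebooks = [rng.standard_normal((k, sub_dim)).astype(np.float32) for _ in range(m)]
codes = rng.integers(0, k, size=(10_000, m), dtype=np.uint8)
query = rng.standard_normal(m * sub_dim).astype(np.float32)
print(adc_search(query, codes, codebooks))
```

    Because the query is compared against codewords rather than being quantized itself, the only approximation error comes from the database side, which is why ADC typically recovers noticeably better recall than symmetric (code-to-code) distances at the same compression level.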