A high-performance graph-based ANN algorithm that builds a multi-layered navigable small world graph for fast similarity search. HNSW is the default index type in most modern vector databases used for multimodal retrieval.
How It Works
HNSW constructs a hierarchical graph where each layer is a navigable small world network with decreasing density. The top layers contain few nodes with long-range connections for fast coarse navigation, while the bottom layer contains all nodes with short-range connections for precise search. Query traversal starts at the top layer and greedily descends, narrowing the search region at each level.
Technical Details
Key parameters include M (number of connections per node, typically 16-64), ef_construction (search width during build, typically 100-500), and ef_search (search width during query, typically 50-500). Higher M and ef values improve recall but increase memory and latency. The algorithm achieves logarithmic search complexity with respect to dataset size, making it efficient even at billion-scale.
Best Practices
Start with M=16 and ef_construction=200 as reasonable defaults, then tune based on benchmarks
Set ef_search dynamically based on the required recall level for each query
Use separate read and write threads since HNSW supports concurrent reads but serializes writes
Monitor memory usage carefully as HNSW indices can be 1.5-2x the raw vector data size
Common Pitfalls
Setting M too low, resulting in disconnected graph regions and poor recall
Using ef_search lower than the number of requested results k
Not reserving enough memory for the graph structure on top of vector storage
Expecting real-time updates at high throughput without understanding write serialization
Advanced Tips
Combine HNSW with product quantization to reduce memory while maintaining graph navigation quality
Use HNSW with scalar quantization (SQ8) for a good balance of memory savings and recall
Implement filtered HNSW search by pruning graph traversal based on metadata predicates
Consider segment-based HNSW implementations that allow efficient deletions and updates