A horizontal scaling strategy that partitions data across multiple nodes, each holding a subset of the total data. Sharding enables multimodal search systems to handle datasets that exceed single-machine capacity while maintaining query performance.
Sharding divides a large dataset into smaller partitions (shards), each typically stored on a separate node. A sharding key determines which shard receives each piece of data. A query that includes the sharding key can be routed to the single shard that owns it; a query that does not, such as a nearest-neighbor search over a whole collection, is sent to every relevant shard, executed in parallel, and the partial results are merged. This distributes both storage and compute load across the cluster, enabling horizontal scaling beyond single-node limits.
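The route, fan-out, and merge steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the behavior of any particular database: the shards are in-memory dictionaries standing in for separate nodes, and `NUM_SHARDS`, `shard_for`, `insert`, `search_shard`, and `search` are hypothetical names chosen for this example. Similarity is a plain dot product, and the shard is picked by a stable hash of the document ID modulo the shard count.

```python
import hashlib
import heapq
from concurrent.futures import ThreadPoolExecutor

NUM_SHARDS = 4  # hypothetical cluster size, chosen for illustration

# Each "shard" is an in-memory dict standing in for a separate node.
shards = [dict() for _ in range(NUM_SHARDS)]


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a sharding key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def insert(doc_id: str, vector: list[float]) -> None:
    """Write path: the sharding key (doc_id) selects exactly one shard."""
    shards[shard_for(doc_id)][doc_id] = vector


def search_shard(shard: dict, query: list[float], k: int) -> list[tuple[float, str]]:
    """Shard-local top-k as (score, doc_id) pairs, scored by dot product."""
    scored = (
        (sum(q * v for q, v in zip(query, vec)), doc_id)
        for doc_id, vec in shard.items()
    )
    return heapq.nlargest(k, scored)


def search(query: list[float], k: int = 3) -> list[tuple[float, str]]:
    """Read path: fan out to every shard in parallel, then merge the partial top-k lists."""
    with ThreadPoolExecutor(max_workers=NUM_SHARDS) as pool:
        partials = pool.map(lambda s: search_shard(s, query, k), shards)
        return heapq.nlargest(k, (hit for part in partials for hit in part))
```

Each shard returns its own top k, so the merged list contains the global top k as long as every shard is queried with the same k. Calling `insert("doc-1", [0.2, 0.9])` followed by `search([1.0, 0.0], k=3)` exercises both paths.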
Common sharding strategies include hash-based (hashing a key, often with consistent hashing so that adding or removing a node moves only a fraction of the data), range-based (partitioning by value ranges), and directory-based (a lookup table that maps keys to shards). Vector databases like Qdrant and Milvus support automatic sharding of vector collections. MongoDB uses shard keys to distribute documents across shards. Replicating each shard across multiple nodes provides fault tolerance. Query routing can be client-side, proxy-based, or handled by the database itself.
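The three strategies differ mainly in how a key is turned into a shard. The sketch below shows one possible form of each, assuming string keys for the hash and directory cases and integer values for the range case. The names `HashRing`, `range_shard`, and `directory_shard`, the virtual-node count, and the node labels are all hypothetical; none of this reflects how Qdrant, Milvus, or MongoDB implement placement internally.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Stable 64-bit hash used to place both nodes and keys on the ring."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Hash-based placement with consistent hashing.

    Every node is placed on the ring at several points (virtual nodes).
    A key belongs to the first point clockwise from the key's own hash.
    """

    def __init__(self, nodes: list[str], vnodes: int = 64) -> None:
        ring = sorted((_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes))
        self._points = [point for point, _ in ring]
        self._owners = [node for _, node in ring]

    def lookup(self, key: str) -> str:
        # Wrap around to the first point when the key hashes past the last one.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owners[idx]


def range_shard(value: int, boundaries: list[int]) -> int:
    """Range-based placement.

    `boundaries` is a sorted list of exclusive upper bounds: shard 0 holds
    values below boundaries[0], and the last shard holds everything at or
    above the final boundary.
    """
    return bisect.bisect_right(boundaries, value)


def directory_shard(key: str, directory: dict[str, int], default: int) -> int:
    """Directory-based placement: an explicit lookup table with a fallback shard."""
    return directory.get(key, default)
```

With consistent hashing, growing the cluster from three nodes to four reassigns only the keys that now fall to the new node; every other key keeps its owner. Range-based placement keeps neighboring values together, which helps range scans but can create hot shards, and a directory allows arbitrary placement at the cost of maintaining the table.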