~8ms hot search, 50K+ writes/s, and 10K+ queries/s in prod
Cost calculator

Estimate your monthly MVS bill based on vector count, dimensions, and usage. All pricing is pay-as-you-go with no upfront commitments.
- **Vector dimensions**: The size of each embedding vector. Higher dimensions capture more nuance but use more storage. Common models: 384 (MiniLM), 768 (BERT), 1536 (OpenAI ada-002).
- **Number of vectors**: Total documents stored across all namespaces. Each document is a vector embedding plus its metadata payload. Adjustable from 1M to 10B; the example below assumes 1M vectors on 1 shard.
- **Storage ($0.01/mo at this scale)**: Object storage cost for persisting vector data and metadata. Priced at ~$0.023/GB/mo -- the same rate as S3 Standard.
- **Memory ($0.00/mo at this scale)**: RAM used for hot-caching frequently accessed vectors and indexes. Enables sub-10ms query latency on warm namespaces.
- **Writes ($0.00/mo at this scale)**: Cost of upsert operations. Includes WAL logging, index updates, and replication to object storage.
- **Queries (100K/mo included)**: Search queries against your namespaces. The first 100K queries per month are included free with every plan.
- **Shards (1 at this scale)**: Shards partition your data across multiple Rust workers for parallel query execution. MVS auto-scales shards as your dataset grows.
- **Namespaces (1,000 included)**: Namespaces are isolated collections within your account. Use them for multi-tenancy, A/B testing, or separating data by environment.
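The storage line above can be sanity-checked by hand. A minimal sketch, assuming raw float32 vectors (4 bytes per dimension) and the ~$0.023/GB/mo rate quoted above; MVS's actual stored size may be smaller thanks to compression, so treat this as an upper bound rather than the billing formula:

```python
def estimate_storage_cost(vectors: int, dims: int,
                          bytes_per_dim: int = 4,
                          rate_per_gb_mo: float = 0.023) -> float:
    """Upper-bound monthly object-storage cost for raw float32 vectors."""
    raw_gb = vectors * dims * bytes_per_dim / 1e9
    return raw_gb * rate_per_gb_mo

# 1M vectors at 768 dimensions: ~3 GB raw, roughly $0.07/mo at S3 rates
print(round(estimate_storage_cost(1_000_000, 768), 2))
```

Even at the raw-float upper bound, storage is a rounding error at small scale; writes and queries dominate the bill as traffic grows.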
Workload: a benchmark scenario of dense ANN search over 768-dimensional vectors with top_k=10 on the configured number of shards (768 dimensions, 1M docs, ~500 MB, 1 shard).

| Percentile | Warm | Cold | What it measures |
|---|---|---|---|
| p50 | 8ms | 543ms | Median latency -- 50% of queries complete faster than this. Represents the typical user experience. |
| p90 | 10ms | 612ms | 90th percentile -- only 10% of queries are slower. A good measure of consistent performance. |
| p99 | 35ms | 854ms | 99th percentile (tail latency) -- the worst 1% of queries. Critical for SLA guarantees and real-time apps. |

- **Warm namespace**: Data is cached in memory/SSD. Queries hit the hot cache and return in single-digit milliseconds. This is the default for actively queried namespaces.
- **Cold namespace**: Data lives in object storage and must be fetched on demand. The first query warms the cache -- subsequent queries are fast. Ideal for rarely accessed data at minimal cost.
View full benchmark methodology & results

Scale Tiers
| Vectors | Shards | RAM | Obj. Storage | MVS/mo | Savings |
|---|---|---|---|---|---|
| 1M | 1 | 30 MB | 480 MB | Free | Free |
| 10M | 1 | 300 MB | 4.8 GB | $80 | 68% (vs. Qdrant $500, Pinecone $700, Weaviate $250) |
| 100M | 10 | 3.0 GB | 48 GB | $800 | 77% (vs. Qdrant $5,000, Pinecone $7,000, Weaviate $3,500) |
| 1B | 100 | 30 GB | 480 GB | $3,500 | 92% (vs. Qdrant $75,000, Pinecone $80,000, Weaviate $45,000) |
| 5B | 500 | 75 GB | 2.4 TB | $15,000 | Only MVS |
| 10B | 1,000 | 150 GB | 4.8 TB | $30,000 | Only MVS |
Feature Comparison
How MVS compares with the leading vector databases -- Pinecone, Qdrant, and Turbopuffer. Several of the capabilities below are MVS-exclusive.
Search

- **Dense vector search (ANN)**: Approximate nearest neighbor search over high-dimensional embeddings. The foundation of semantic search -- find results by meaning, not keywords.
- **Sparse vector search**: Search using sparse vectors like SPLADE or learned sparse embeddings. Captures keyword-level signals that dense vectors miss.
- **BM25 full-text search**: Classic keyword search built on an inverted index. MVS uses Tantivy natively -- no workarounds or external engines needed. (Some competitors: SPLADE workaround. MVS: native Tantivy.)
- **Multi-dense (ColBERT)**: Late-interaction retrieval that stores per-token embeddings for higher recall. Enables token-level matching without collapsing to a single vector.
- **Hybrid search (RRF/DBSF fusion)**: Combine dense, sparse, and keyword results into a single ranked list using Reciprocal Rank Fusion or Distribution-Based Score Fusion.
- **Multi-stage retrieval pipelines**: Chain retrieval stages -- e.g. broad recall with ANN, then re-rank with a cross-encoder -- in a single query. Reduces latency vs. round-trips.
- **Standing queries (push on match)**: Register a persistent query that fires a webhook whenever a newly ingested document matches. Useful for alerting, monitoring, and real-time feeds.
- **Semantic JOINs across namespaces**: Join two namespaces by vector similarity -- like a SQL JOIN but on embeddings. No denormalization or data duplication required.
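Reciprocal Rank Fusion, the first of the two fusion methods named under hybrid search, is simple enough to sketch in a few lines. This is the standard RRF formula (score = Σ 1/(k + rank) across result lists), not MVS's internal implementation:

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked result lists with Reciprocal Rank Fusion."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Documents near the top of any list accumulate the most score.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# "b" ranks highly in both the dense and sparse lists, so it wins the fusion
dense = ["a", "b", "c"]
sparse = ["b", "c", "a"]
print(rrf_fuse([dense, sparse]))  # ['b', 'a', 'c']
```

Because RRF uses only ranks, not raw scores, it fuses dense and BM25 results without any score normalization -- which is why it is the usual default for hybrid search.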
Data Operations

- **Aggregation (GROUP BY, COUNT, SUM, AVG)**: Run analytics directly on your vector store. Group documents by metadata fields and compute counts, averages, sums -- no ETL to a data warehouse.
- **Cross-shard transactions (2PC)**: Atomic writes across multiple shards using two-phase commit. Ensures all-or-nothing consistency even at billion-scale datasets.
- **Optimistic concurrency (_version)**: Prevent write conflicts with version-based optimistic locking. Critical for multi-writer workloads where two processes might update the same document.
- **Change streams (WAL-tailing, SSE)**: Subscribe to real-time insert/update/delete events via Server-Sent Events. Build reactive pipelines without polling your database.
- **Time-travel queries (WAL replay)**: Query your data as it existed at a past point in time by replaying the write-ahead log. Useful for debugging, auditing, and reproducibility.
- **Document version history**: Every mutation is versioned. Roll back a document to any prior state or diff two versions to see exactly what changed.
- **Query audit log**: Full audit trail of every query executed -- who ran it, when, and what was returned. Essential for compliance and debugging in production.
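The `_version` pattern follows standard optimistic locking: read the current version, write conditionally, retry on conflict. A self-contained sketch with an in-memory store standing in for MVS -- the class and method names here are illustrative, not the MVS SDK:

```python
class VersionConflict(Exception):
    """Raised when a write's expected _version is stale."""

class InMemoryNamespace:
    """Toy stand-in for a namespace that enforces _version checks."""
    def __init__(self):
        self._docs = {}  # doc id -> (_version, body)

    def get(self, doc_id):
        return self._docs.get(doc_id, (0, None))

    def upsert(self, doc_id, body, if_version):
        current, _ = self.get(doc_id)
        if current != if_version:
            raise VersionConflict(f"expected {if_version}, found {current}")
        self._docs[doc_id] = (current + 1, body)
        return current + 1

def update_with_retry(ns, doc_id, mutate, attempts=3):
    """Read-modify-write loop that retries when another writer wins."""
    for _ in range(attempts):
        version, body = ns.get(doc_id)
        try:
            return ns.upsert(doc_id, mutate(body), if_version=version)
        except VersionConflict:
            continue  # re-read the new version and try again
    raise VersionConflict("gave up after retries")
```

The retry loop is the key piece: on conflict the writer re-reads, re-applies its mutation to the fresh body, and tries again, so neither writer's change is silently lost.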
Reliability & Governance

- **Storage tiering (hot/cold/archive)**: Automatically move infrequently accessed data from memory/SSD to object storage. Cut costs without manual data management. (MVS: automatic, object storage-backed.)
- **Retention policies**: Set TTLs on documents or namespaces. Data is automatically purged after the retention window -- no cron jobs or manual cleanup.
- **Namespace catalog (INFORMATION_SCHEMA)**: Discover all namespaces, their schemas, row counts, and storage usage via a system catalog. Like INFORMATION_SCHEMA in SQL databases.
- **Multi-tenant isolation (noisy neighbor)**: Resource isolation between tenants prevents one workload from starving others. Each namespace has independent rate limits and resource quotas.
- **Priority lanes (QoS scheduling)**: Assign CRITICAL/NORMAL/BACKGROUND/BULK priority to requests. Higher-priority queries get reserved compute slots and preempt lower-priority work in the shard queue.
- **Idempotent operations**: Every write accepts an idempotency key. Retries from crashes or network timeouts are automatically deduplicated -- no duplicate documents, no double-counted aggregations.
- **Distributed execution traces**: Full distributed trace for every query -- coordinator routing, per-shard timing, filter selectivity, index hits. Debug multi-hop requests across the entire fan-out path.
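Idempotency keys make retries safe by memoizing the first result for each key. A toy dedup layer illustrating the semantics described above (a production key store would also need expiry; the names here are not MVS's API):

```python
class IdempotentWriter:
    """Deduplicates writes by idempotency key: retries return the original result."""

    def __init__(self):
        self._results = {}  # idempotency key -> result of the first apply
        self.applied = 0    # how many writes were actually applied

    def write(self, key: str, doc: dict) -> dict:
        if key in self._results:
            return self._results[key]  # duplicate retry: nothing re-applied
        self.applied += 1
        result = {"id": doc["id"], "status": "applied"}
        self._results[key] = result
        return result

w = IdempotentWriter()
w.write("req-abc", {"id": "doc-001"})
w.write("req-abc", {"id": "doc-001"})  # network retry of the same request
print(w.applied)  # 1 -- the second call was deduplicated
```

The client picks the key (typically a UUID minted once per logical request) and reuses it on every retry; the server's job is only to remember keys it has already seen.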
Agentic Workloads

- **Streaming partial results (SSE)**: Get results as shards respond instead of waiting for all shards. Agents evaluate early hits and decide whether to refine or cancel -- the tight feedback loop pattern that defines agentic retrieval.
- **Query cancellation (cooperative termination)**: Cancel in-flight fan-out queries that are no longer needed. When an agent fires 5 parallel searches and gets an answer from the first, the other 4 are terminated at the shard level, freeing compute instantly.
- **Per-agent budget limits**: Enforce max queries, writes, and compute per agent or API key at the coordinator level. Prevents runaway autonomous loops -- the specific failure mode where an LLM in a loop issues unbounded queries.
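The fan-out-then-cancel pattern is easy to model with asyncio: launch parallel searches, take the first answer, cancel the rest. A sketch with stub shard queries, where sleeps stand in for real network and compute time:

```python
import asyncio

async def shard_query(shard_id: int, delay: float) -> str:
    """Stub for a per-shard search call; delay simulates network + compute."""
    await asyncio.sleep(delay)
    return f"hit-from-shard-{shard_id}"

async def first_hit(queries) -> str:
    """Return the first completed result and cancel the remaining queries."""
    tasks = [asyncio.create_task(q) for q in queries]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # cooperative termination: frees the slow shards
    return done.pop().result()

result = asyncio.run(first_hit([shard_query(1, 0.01), shard_query(2, 0.5)]))
print(result)  # hit-from-shard-1
```

Client-side cancellation like this only stops waiting; the point of MVS's shard-level cancellation is that the abandoned work stops consuming compute on the server too.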
Infrastructure

- **Object storage-native (no separate DB to manage)**: Data lives in your object storage (S3, GCS, Azure Blob). No separate database cluster to provision, back up, or scale -- just point MVS at your bucket.
- **Self-hosted option**: Deploy MVS in your own VPC or on-prem. Full control over data residency, network policies, and infrastructure -- no vendor lock-in. (Some competitors: OSS.)
API Examples
Capabilities you will not find in any other vector database.
Write documents with dense, sparse, and metadata in a single call.
```python
from mixpeek import Mixpeek

client = Mixpeek(api_key="YOUR_KEY")

client.namespaces.upsert(
    namespace="products",
    documents=[{
        "id": "doc-001",
        "dense_embedding": [0.12, -0.34, ...],  # 768-d
        "sparse_embedding": {"tokens": [1204, 879], "weights": [0.9, 0.4]},
        "metadata": {"category": "electronics", "price": 299.99},
        "text": "Noise-cancelling wireless headphones",
    }]
)
```
