We built a vector store on object storage and it's 50x cheaper

Every vector database starts the same way. Before you write a single vector, you declare a collection with a fixed dimensionality, a distance metric, and an index configuration. For example: 768 dimensions, cosine similarity, HNSW with ef_construction=128.

This worked when embedding models were stable and retrieval was a single API call. But models change constantly, teams run multiple embeddings side by side, and agents need to search across schemas that evolve faster than anyone can plan for. The rigidity that made vector databases simple to start with makes them expensive to operate.

We built MVS to fix this. It's a vector store designed around three ideas: infer schema from data instead of demanding it upfront, run on object storage so cost scales with bytes rather than memory, and keep reads stable while writes hammer the system.


MVS overview — architecture and features
Benchmarks — reproducible harness, MVS vs Qdrant vs Turbopuffer
Pricing — usage-based, starting free

Here's the Qdrant way (and the Pinecone way, and the Weaviate way):

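from qdrant_client.models import Distance, VectorParams

# `client` is an already-connected qdrant_client.QdrantClient
client.create_collection(
    "products",
    vectors_config=VectorParams(size=768, distance=Distance.COSINE),
)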

You've now committed to 768 dimensions and cosine distance for the lifetime of this collection. When you switch from CLIP to SigLIP next month and your dimensions jump from 768 to 1152, you have to recreate the collection and re-index everything. I wrote about this rigidity in "the 3072 dimension problem," and the pattern repeats everywhere.

In MVS, you create an empty namespace:

client.create_namespace("products")

Then you write vectors:

client.upsert(
    namespace="products",
    points=[{
        "id": "sku-1234",
        "vectors": {
            "clip_v1": [0.12, 0.34, ...],     # 768-dim, auto-detected
            "siglip_v2": [0.56, 0.78, ...],   # 1152-dim, auto-detected
        },
        "payload": {"category": "shoes", "price": 89.99}
    }]
)

On first write, the shard infers dimension and metric from the data. Subsequent writes enforce consistency (a dimension mismatch is an error), but you never had to decide upfront. New embedding model? Write vectors with a new name. The old ones stay queryable. No migration, no downtime.
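To make the infer-then-enforce behavior concrete, here is a minimal sketch of the idea in plain Python. It is not MVS's implementation: the `NamedVectorSchema` class and its method names are hypothetical, and it covers only dimension inference (not the metric).

```python
class NamedVectorSchema:
    """Hypothetical helper: remembers the dimension of each named vector."""

    def __init__(self):
        self.dims = {}  # vector name -> dimension inferred on first write

    def validate(self, vectors):
        """Infer dimensions for unseen names, reject mismatches for known ones."""
        inferred = {}
        for name, values in vectors.items():
            dim = len(values)
            if dim == 0:
                raise ValueError(f"vector {name!r} is empty")
            expected = self.dims.get(name, dim)
            if expected != dim:
                raise ValueError(
                    f"vector {name!r}: expected {expected} dims, got {dim}"
                )
            inferred[name] = dim
        # Commit only after the whole point validated, so a rejected
        # write never registers a half-seen schema.
        self.dims.update(inferred)


schema = NamedVectorSchema()
schema.validate({"clip_v1": [0.0] * 768})      # first write: infers 768
schema.validate({"siglip_v2": [0.0] * 1152})   # new name: infers 1152

try:
    schema.validate({"clip_v1": [0.0] * 512})  # mismatch on a known name
    mismatch_rejected = False
except ValueError as err:
    mismatch_rejected = True
    print(err)
```

A new vector name never conflicts with an existing one, which is why adding a model is a write rather than a migration.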

This sounds minor. In practice it changes how teams ship. Instead of gating embedding model upgrades on a collection migration, you run the new model in parallel and compare retrieval quality side-by-side on the same namespace. The schema follows the data, not the other way around.
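One way to run that side-by-side comparison is to score each model's results against a small labeled set. The sketch below assumes you already have the ranked ids each named vector returned for a query; the ids and the relevance labels are made up for illustration.

```python
def recall_at_k(retrieved, relevant, k=10):
    """Fraction of the relevant ids that appear in the top-k retrieved ids."""
    if not relevant:
        return 0.0
    top_k = set(retrieved[:k])
    return len(top_k & set(relevant)) / len(relevant)


# Hypothetical labels and results for one query against the same namespace.
relevant = ["sku-1", "sku-2", "sku-3", "sku-4"]
clip_hits = ["sku-1", "sku-9", "sku-2", "sku-7"]     # searched via "clip_v1"
siglip_hits = ["sku-1", "sku-2", "sku-3", "sku-8"]   # searched via "siglip_v2"

clip_recall = recall_at_k(clip_hits, relevant, k=4)
siglip_recall = recall_at_k(siglip_hits, relevant, k=4)
print(clip_recall, siglip_recall)  # 0.5 0.75
```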

Cheaper at scale

Most of the cost in a managed vector database is memory. Qdrant, Weaviate, and Pinecone keep vectors in RAM (or on fast SSDs that cost nearly as much) because their indexes assume hot storage. That works at small scale. At 100M vectors, it costs about $2,600/month on Qdrant Cloud.

MVS stores everything on object storage: PQ-compressed vectors, WAL segments, and snapshots all live on GCS or S3, at $0.023/GB/month. Product quantization compresses vectors by 32x (a 1024-dim float32 vector is 4,096 bytes; its PQ code is 128 bytes), and the compute layer is a shared pool across namespaces instead of dedicated pods per collection.
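For readers who haven't seen product quantization, here is a rough sketch of the encoding step and where the 32x comes from. The 128 subspaces with 256 centroids each is an assumed layout that happens to produce 128-byte codes; the post doesn't state MVS's actual parameters, and the codebooks here are random placeholders rather than k-means-trained ones.

```python
import random

DIM = 1024            # vector dimensionality, as in the example above
NUM_SUBSPACES = 128   # assumed: one byte per subspace gives a 128-byte code
NUM_CENTROIDS = 256   # 2**8, so each centroid index fits in one byte
SUB_DIM = DIM // NUM_SUBSPACES  # 8 dimensions per sub-vector

rng = random.Random(0)

# Placeholder codebooks. Real PQ trains these with k-means per subspace.
codebooks = [
    [[rng.gauss(0, 1) for _ in range(SUB_DIM)] for _ in range(NUM_CENTROIDS)]
    for _ in range(NUM_SUBSPACES)
]


def pq_encode(vector):
    """Replace each sub-vector with the index of its nearest centroid."""
    code = bytearray()
    for s in range(NUM_SUBSPACES):
        sub = vector[s * SUB_DIM:(s + 1) * SUB_DIM]
        nearest = min(
            range(NUM_CENTROIDS),
            key=lambda c: sum((a - b) ** 2 for a, b in zip(sub, codebooks[s][c])),
        )
        code.append(nearest)
    return bytes(code)


vector = [rng.gauss(0, 1) for _ in range(DIM)]
code = pq_encode(vector)

raw_bytes = DIM * 4  # float32 is 4 bytes per dimension
compression_ratio = raw_bytes / len(code)
print(raw_bytes, len(code), compression_ratio)  # 4096 128 32.0
```

The codebooks themselves are a fixed overhead shared by every vector in the index, so they don't change the per-vector ratio.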

Storage: $3.80/month at 100M vectors (165GB on S3 Standard)
Compute: ~$3/namespace/month amortized at 100 tenants
Total: ~$6.80/namespace/month vs $2,600 (Qdrant) or $358 (Turbopuffer)

For a single namespace, MVS is roughly 15% cheaper than Turbopuffer (~$304 vs ~$358 per month at 100M vectors). At 100 namespaces, it's about 50x cheaper per namespace because the cost of the compute pool is shared across tenants. Cold start from object storage takes about 2 seconds: download the latest snapshot, mmap the partitions, load the PQ codebook, start serving.
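The arithmetic behind those numbers can be sketched in a few lines. The storage price, storage size, and competitor prices are the figures quoted in this post; the roughly $300/month compute pool is an inference from "$3 per namespace at 100 tenants", not a published price, and the model assumes the pool cost stays fixed as tenants are added.

```python
STORAGE_PRICE_PER_GB = 0.023    # $/GB/month, quoted above
STORAGE_GB = 165                # per namespace at 100M vectors, quoted above
COMPUTE_PER_NS_AT_100 = 3.00    # $/namespace/month at 100 tenants, quoted above
TURBOPUFFER_MONTHLY = 358.0     # quoted above

# Assumption: the shared compute pool is a fixed cost (~$300/month).
compute_pool = COMPUTE_PER_NS_AT_100 * 100
storage = STORAGE_PRICE_PER_GB * STORAGE_GB   # about $3.80


def per_namespace_cost(tenants):
    """Each namespace pays its own storage plus an equal share of the pool."""
    return storage + compute_pool / tenants


single = per_namespace_cost(1)      # about $304
shared = per_namespace_cost(100)    # about $6.80

savings_single = 1 - single / TURBOPUFFER_MONTHLY   # about 15%
ratio_shared = TURBOPUFFER_MONTHLY / shared         # about 50x

print(f"single: ${single:.0f}/mo, {savings_single:.0%} cheaper")
print(f"shared: ${shared:.1f}/mo, {ratio_shared:.0f}x cheaper")
```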

Full head-to-head, measured on matched hardware:

	MVS	Qdrant	Turbopuffer
p95 latency (warm)	52 ms	58 ms	106 ms
Ingest 100K vectors	39 s	188 s	140 s
Recall@10	1.00	1.00	1.00
Cost @ 100M vectors	~$304/mo	~$2,600/mo	~$358/mo
Per-namespace @ 100 tenants	$6.74/mo	single-tenant	$358/mo
Cold start	~2 s	always-on	managed

100K × 1024-d corpus, matched 4-vCPU GCP instances, cosine similarity. Cost projections extrapolated to 100M vectors.

The reproducible benchmark harness for MVS vs Qdrant vs Turbopuffer is on GitHub at mixpeek/mvs-benchmark. It includes corpus generation, recall measurement, and cost projections.

Why agents need a different kind of vector store

Traditional apps have a developer who defines the schema, picks an embedding model, and builds a retrieval pipeline that stays roughly static after launch. Agents don't work this way. An agent discovers what it needs at runtime: which embeddings to generate, which filters to apply, which modalities to combine. The vector store has to be flexible enough to handle that without someone reconfiguring it between runs.

Schema-on-write is the foundation. An agent can start writing a new vector type (say, a summarization embedding alongside the original content embedding) without anyone declaring a new collection or migrating data. The store just accepts it and infers the config. This matters because agentic retrieval workflows evolve constantly. Freezing the schema means freezing the agent's capabilities.

A few other things that matter for agents:

Hybrid search. Agents often need to combine semantic similarity with keyword matching and metadata filters in a single query. MVS supports dense, sparse (SPLADE), and BM25 fusion natively with configurable strategies (RRF, DBSF, weighted). One call, not three.
Read stability during writes. Agents that ingest and query simultaneously need the store to handle both without degrading. MVS separates the primary (which absorbs writes and ships the WAL) from the read replicas (which poll for sealed segments). If the primary runs out of memory under write load, search keeps serving.
Multi-vector support. ColBERT-style late interaction with per-token embeddings and MaxSim scoring. Agents doing document-level reasoning get better recall without retrieving full documents.
GROUP BY and aggregations. Agents reasoning over result sets need more than flat top-K lists. MVS supports grouping, term aggregations, stats, and histograms natively, so an agent can ask "top result per category" in one query instead of post-processing.

The pattern across all of these: reduce the number of round-trips and decisions the agent has to make. Every capability you push into the store is one fewer thing the agent has to orchestrate, one fewer network hop, and one fewer place for the pipeline to break.
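Two of the techniques named in the list above are small enough to show in full. This is a generic sketch of reciprocal rank fusion (RRF) and ColBERT-style MaxSim scoring, not MVS's code; the function names are ours, and `k=60` is the constant commonly used for RRF rather than a documented MVS default.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: each list contributes 1 / (k + rank) per id."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def maxsim(query_tokens, doc_tokens):
    """Late interaction: for each query token, take its best dot product
    against any document token, then sum over query tokens."""
    return sum(
        max(sum(q * d for q, d in zip(qv, dv)) for dv in doc_tokens)
        for qv in query_tokens
    )


# Hypothetical ranked ids from a dense search and a BM25 search.
dense_hits = ["sku-1", "sku-2", "sku-3"]
bm25_hits = ["sku-2", "sku-4", "sku-1"]
fused = rrf_fuse([dense_hits, bm25_hits])
print(fused)  # ['sku-2', 'sku-1', 'sku-4', 'sku-3']

# Toy per-token embeddings (2-dim for readability).
query_tokens = [[1.0, 0.0], [0.0, 1.0]]
doc_tokens = [[0.5, 0.5], [1.0, 0.0]]
score = maxsim(query_tokens, doc_tokens)
print(score)  # 1.5
```

RRF only needs ranks, so it can combine dense, sparse, and BM25 results without normalizing their raw scores against each other.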

One thing we're working toward that's worth mentioning: learned indexes. MVS currently uses LIRE, a continuously-rebalancing partitioned index with geometric centroids. It gets recall@10 of 0.90 on our 10M-vector nightly benchmark. But geometric centroids don't know which regions of the vector space your users actually search. They distribute data evenly, not usefully.

The next step is HILL (Hierarchical Index Learning, based on Meta's EDBT '26 paper), which replaces geometric centroids with learned centroids trained on query interaction signals. On our benchmarks, HILL reaches LIRE's recall at roughly one quarter the candidate budget. Same quality, 4x fewer vectors scanned. Each namespace trains its own codebook, so the index becomes a trained artifact encoding that customer's usage patterns. Early days, but this is where the structural advantage of owning the storage layer pays off.
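"Candidate budget" is easier to see with a toy partitioned index. The sketch below is a generic IVF-style search, not LIRE or HILL: vectors are grouped under their nearest centroid, and a query scans only the `nprobe` closest partitions. The centroids here are just sampled data points standing in for geometric centroids; a learned index would, in principle, pick centroids so that fewer partitions need to be probed for the same recall.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_partitions(vectors, centroids):
    """Assign each vector index to the partition of its nearest centroid."""
    partitions = [[] for _ in centroids]
    for idx, vec in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: sq_dist(vec, centroids[c]))
        partitions[nearest].append(idx)
    return partitions


def search(query, vectors, centroids, partitions, top_k, nprobe):
    """Scan only the nprobe partitions whose centroids are closest to the query.
    Returns the top_k ids and the number of vectors scanned (the budget)."""
    probe_order = sorted(range(len(centroids)), key=lambda c: sq_dist(query, centroids[c]))
    candidates = [i for c in probe_order[:nprobe] for i in partitions[c]]
    candidates.sort(key=lambda i: sq_dist(query, vectors[i]))
    return candidates[:top_k], len(candidates)


rng = random.Random(0)
vectors = [[rng.random() for _ in range(8)] for _ in range(2000)]
centroids = rng.sample(vectors, 16)  # placeholder for trained centroids
partitions = build_partitions(vectors, centroids)

query = [rng.random() for _ in range(8)]
exact = sorted(range(len(vectors)), key=lambda i: sq_dist(query, vectors[i]))[:10]

for nprobe in (1, 4, 16):
    hits, scanned = search(query, vectors, centroids, partitions, 10, nprobe)
    recall = len(set(hits) & set(exact)) / len(exact)
    print(f"nprobe={nprobe:2d} scanned={scanned:4d} recall@10={recall:.2f}")
```

Probing all 16 partitions scans every vector and matches the exact result; the interesting question for any partitioned index is how few partitions it can probe before recall drops.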

Next on the roadmap: bring-your-own object storage. MVS already stores everything (PQ-compressed vectors, WAL segments, snapshots) on S3-compatible backends. The missing piece is letting you point it at your bucket, whether that's AWS S3, Backblaze B2, Tigris, Cloudflare R2, or anything else that speaks the S3 protocol. Your vectors stay in your account, on your infrastructure, under your IAM policies. We keep the compute layer; you keep the data.
