
    We built a vector store on object storage and it's 50x cheaper

    Every vector database forces you to declare dimensions and distance metrics before writing a single vector. Schema-on-write, compute pushdown, and learned indexes fix the three things they got wrong.


    Every vector database starts the same way. Before you write a single vector, you declare a collection with a fixed dimensionality, a distance metric, and an index configuration. 768 dimensions, cosine similarity, HNSW with ef_construction=128.

    This worked when embedding models were stable and retrieval was a single API call. But models change constantly, teams run multiple embeddings side by side, and agents need to search across schemas that evolve faster than anyone can plan for. The rigidity that made vector databases simple to start with makes them expensive to operate.

    We built MVS to fix this. It's a vector store designed around three ideas: infer schema from data instead of demanding it upfront, run on object storage so cost scales with bytes rather than memory, and keep reads stable while writes hammer the system.

    • MVS overview — architecture and features
    • Benchmarks — reproducible harness, MVS vs Qdrant vs Turbopuffer
    • Pricing — usage-based, starting free

    Here's the Qdrant way (and the Pinecone way, and the Weaviate way):

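    from qdrant_client import QdrantClient
    from qdrant_client.models import Distance, VectorParams
    client = QdrantClient(url="http://localhost:6333")  # assumes a local Qdrant instance
    # Dimension and metric are fixed here, before any vector is written.
    client.create_collection(
        "products",
        vectors_config=VectorParams(size=768, distance=Distance.COSINE),
    )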

    You've now committed to 768 dimensions and cosine distance for the lifetime of this collection. When you switch from CLIP to SigLIP next month and your dimensions jump from 768 to 1152, you have to recreate the collection and re-index everything. I wrote about this rigidity in the 3072 dimension problem, and the pattern repeats everywhere.

    In MVS, you create an empty namespace:

    client.create_namespace("products")

    Then you write vectors:

    client.upsert(
        namespace="products",
        points=[{
            "id": "sku-1234",
            "vectors": {
                "clip_v1": [0.12, 0.34, ...],     # 768-dim, auto-detected
                "siglip_v2": [0.56, 0.78, ...],   # 1152-dim, auto-detected
            },
            "payload": {"category": "shoes", "price": 89.99}
        }]
    )

    On first write, the shard infers dimension and metric from the data. Subsequent writes enforce consistency (a dimension mismatch is an error), but you never had to decide upfront. New embedding model? Write vectors with a new name. The old ones stay queryable. No migration, no downtime.

    This sounds minor. In practice it changes how teams ship. Instead of gating embedding model upgrades on a collection migration, you run the new model in parallel and compare retrieval quality side-by-side on the same namespace. The schema follows the data, not the other way around.
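To make the infer-then-enforce behavior concrete, here is a minimal in-memory sketch. The `Namespace` class and `DimensionMismatch` error are hypothetical names for illustration, not the MVS client or server code. It only models dimension inference; how MVS picks a distance metric isn't described here, so that part is left out.

```python
class DimensionMismatch(ValueError):
    """Raised when a named vector's length differs from the inferred one."""


class Namespace:
    """Toy in-memory namespace that infers each named vector's dimension."""

    def __init__(self, name):
        self.name = name
        self.dims = {}     # vector name -> dimension inferred on first write
        self.points = {}   # point id -> full point dict

    def upsert(self, points):
        # Validate the whole batch before storing anything, so a bad
        # point cannot leave the namespace half-written.
        pending = dict(self.dims)
        for point in points:
            for vec_name, vec in point["vectors"].items():
                # First write for this name records the dimension;
                # later writes are checked against it.
                expected = pending.setdefault(vec_name, len(vec))
                if len(vec) != expected:
                    raise DimensionMismatch(
                        f"{vec_name}: expected {expected} dims, got {len(vec)}"
                    )
        self.dims = pending
        for point in points:
            self.points[point["id"]] = point


ns = Namespace("products")
ns.upsert([{"id": "sku-1234", "vectors": {"clip_v1": [0.0] * 768}, "payload": {}}])
# A new model is just a new vector name; the old one stays in place.
ns.upsert([{
    "id": "sku-1234",
    "vectors": {"clip_v1": [0.0] * 768, "siglip_v2": [0.0] * 1152},
    "payload": {},
}])
print(ns.dims)  # {'clip_v1': 768, 'siglip_v2': 1152}
```

Writing a 512-dim vector under `clip_v1` after this would raise `DimensionMismatch`, which mirrors the "dimension mismatch is an error" rule above.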

    Traditional vector DB:
    1. Define: 768-dim, cosine, HNSW
    2. Create collection
    3. Write vectors
    4. Switch to SigLIP (1152-dim)? Recreate the collection and re-index everything. Downtime.

    MVS:
    1. Create empty namespace
    2. Write vectors
    3. 768-dim auto-detected
    4. Switch to SigLIP (1152-dim)? Write new vectors under a new name. Both queryable. No migration.

    Cheaper at scale

    Most of the cost in a managed vector database is memory. Qdrant, Weaviate, and Pinecone keep vectors in RAM (or fast SSDs that cost nearly as much) because their indexes assume hot storage. That works at small scale. At 100M vectors, it comes to roughly $2,600/month on Qdrant Cloud.

    MVS stores everything on object storage: PQ-compressed vectors, WAL segments, snapshots. All on GCS or S3, at $0.023/GB/month. Product quantization compresses vectors by 32x (1024-dim float32 down to 128-byte PQ codes), and the compute layer is a shared pool across namespaces instead of dedicated pods per collection.
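The 32x figure follows from the byte counts: 1024 float32 values take 4096 bytes, and a 128-byte code is 1/32 of that. Below is a toy product quantizer that shows where those 128 bytes come from. The split into 128 subspaces of 8 dimensions with 256 centroids each is an assumption chosen to match the numbers above, not a statement about how MVS configures PQ, and the centroids are sampled from the training data instead of being trained with k-means, to keep the sketch short.

```python
import random
import struct


def train_codebooks(vectors, num_subspaces, num_centroids, seed=0):
    """Pick centroids per subspace by sampling training subvectors.

    Real PQ trains these with k-means; sampling is a simplification.
    """
    rng = random.Random(seed)
    sub_dim = len(vectors[0]) // num_subspaces
    codebooks = []
    for m in range(num_subspaces):
        subvectors = [v[m * sub_dim:(m + 1) * sub_dim] for v in vectors]
        codebooks.append([rng.choice(subvectors) for _ in range(num_centroids)])
    return codebooks


def encode(vector, codebooks):
    """Replace each subvector with the index of its nearest centroid (one byte)."""
    sub_dim = len(vector) // len(codebooks)
    codes = bytearray()
    for m, centroids in enumerate(codebooks):
        sub = vector[m * sub_dim:(m + 1) * sub_dim]
        nearest = min(
            range(len(centroids)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(sub, centroids[i])),
        )
        codes.append(nearest)
    return bytes(codes)


rng = random.Random(42)
training = [[rng.gauss(0, 1) for _ in range(1024)] for _ in range(300)]
codebooks = train_codebooks(training, num_subspaces=128, num_centroids=256)

vector = training[0]
raw_bytes = len(struct.pack(f"{len(vector)}f", *vector))  # float32 encoding
codes = encode(vector, codebooks)
print(raw_bytes, len(codes), raw_bytes // len(codes))  # 4096 128 32
```

The codebooks themselves are shared across all vectors in a namespace, so their size is a fixed overhead and not a per-vector cost.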

    • Storage: $3.80/month at 100M vectors (165GB on S3 Standard)
    • Compute: ~$3/namespace/month amortized at 100 tenants
    • Total: ~$6.80/namespace/month vs $2,600 (Qdrant) or $358 (Turbopuffer)

    For a single namespace, MVS is roughly 15% cheaper than Turbopuffer (~$304 vs ~$358 per month). At 100 namespaces, it's roughly 50x cheaper per namespace than Turbopuffer because the compute pool is shared. Cold start from object storage takes about 2 seconds: download the latest snapshot, mmap the partitions, load the PQ codebook, start serving.
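The arithmetic behind those ratios is short enough to check by hand. This sketch uses only the figures quoted in this post (the ~$3 compute share is itself an approximation); the variable names are just labels for illustration.

```python
# All inputs are figures quoted in this post.
S3_PRICE_PER_GB_MONTH = 0.023   # S3 Standard, $/GB/month
STORAGE_GB = 165                # footprint at 100M vectors
COMPUTE_PER_NAMESPACE = 3.00    # approximate, amortized at 100 tenants
MVS_SINGLE_NAMESPACE = 304.00   # approximate monthly cost, one namespace
TURBOPUFFER_MONTHLY = 358.00
QDRANT_MONTHLY = 2600.00

storage = STORAGE_GB * S3_PRICE_PER_GB_MONTH
per_namespace = storage + COMPUTE_PER_NAMESPACE

# Shared compute pool: per-namespace cost at 100 tenants vs. the alternatives.
ratio_vs_turbopuffer = TURBOPUFFER_MONTHLY / per_namespace
ratio_vs_qdrant = QDRANT_MONTHLY / per_namespace

# Single namespace: no pool to amortize over, so the gap is much smaller.
single_saving = (TURBOPUFFER_MONTHLY - MVS_SINGLE_NAMESPACE) / TURBOPUFFER_MONTHLY

print(f"storage per month:       ${storage:.3f}")
print(f"per namespace per month: ${per_namespace:.3f}")
print(f"vs Turbopuffer:          {ratio_vs_turbopuffer:.1f}x")
print(f"vs Qdrant:               {ratio_vs_qdrant:.1f}x")
print(f"single-namespace saving: {single_saving:.1%}")
```

The storage line is nearly noise in the total. Almost all of the per-namespace difference comes from amortizing compute across tenants.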

    Full head-to-head, measured on matched hardware:

    Metric                        MVS           Qdrant          Turbopuffer
    p95 latency (warm)            52 ms         58 ms           106 ms
    Ingest 100K vectors           39 s          188 s           140 s
    Recall@10                     1.00          1.00            1.00
    Cost @ 100M vectors           ~$304/mo      ~$2,600/mo      ~$358/mo
    Per-namespace @ 100 tenants   $6.74/mo      single-tenant   $358/mo
    Cold start                    ~2 s          always-on       managed

    100K × 1024-d corpus, matched 4-vCPU GCP instances, cosine similarity. Cost projections extrapolated to 100M vectors.

    mixpeek/mvs-benchmark
    Reproducible benchmark harness for MVS vs Qdrant vs Turbopuffer. Includes corpus generation, recall measurement, and cost projections.

    Why agents need a different kind of vector store

    Traditional apps have a developer who defines the schema, picks an embedding model, and builds a retrieval pipeline that stays roughly static after launch. Agents don't work this way. An agent discovers what it needs at runtime: which embeddings to generate, which filters to apply, which modalities to combine. The vector store has to be flexible enough to handle that without someone reconfiguring it between runs.

    Schema-on-write is the foundation. An agent can start writing a new vector type (say, a summarization embedding alongside the original content embedding) without anyone declaring a new collection or migrating data. The store just accepts it and infers the config. This matters because agentic retrieval workflows evolve constantly. Freezing the schema means freezing the agent's capabilities.

    A few other things that matter for agents:

    • Hybrid search. Agents often need to combine semantic similarity with keyword matching and metadata filters in a single query. MVS supports dense, sparse (SPLADE), and BM25 fusion natively with configurable strategies (RRF, DBSF, weighted). One call, not three.
    • Read stability during writes. Agents that ingest and query simultaneously need the store to handle both without degrading. MVS separates the primary (absorbs writes, ships WAL) from read replicas (poll for sealed segments). An out-of-memory crash on the write path doesn't take down search.
    • Multi-vector support. ColBERT-style late interaction with per-token embeddings and MaxSim scoring. Agents doing document-level reasoning get better recall without retrieving full documents.
    • GROUP BY and aggregations. Agents reasoning over result sets need more than flat top-K lists. MVS supports grouping, term aggregations, stats, and histograms natively, so an agent can ask "top result per category" in one query instead of post-processing.
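Of the fusion strategies named in the hybrid search bullet, reciprocal rank fusion (RRF) is the simplest to show. This is a generic sketch of the standard formula, not MVS's implementation. The constant `k=60` is the conventional default from the RRF literature, and the ID lists are made up for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked ID lists: score(d) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


dense = ["sku-2", "sku-1", "sku-3"]    # semantic similarity ranking
sparse = ["sku-1", "sku-4", "sku-2"]   # sparse (SPLADE-style) ranking
bm25 = ["sku-1", "sku-2", "sku-5"]     # keyword ranking

fused = reciprocal_rank_fusion([dense, sparse, bm25])
print(fused)  # ['sku-1', 'sku-2', 'sku-4', 'sku-3', 'sku-5']
```

RRF only looks at ranks, so it needs no score normalization across retrievers. That is the main reason it's a common default when dense and keyword scores live on different scales.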

    The pattern across all of these: reduce the number of round-trips and decisions the agent has to make. Every capability you push into the store is one fewer thing the agent has to orchestrate, one fewer network hop, and one fewer place for the pipeline to break.
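For the multi-vector case mentioned above, MaxSim is the scoring rule used in ColBERT-style late interaction: each query token picks its best-matching document token, and the document score is the sum of those maxima. A minimal sketch with tiny made-up 2-dim token embeddings (real ones are much wider, and this is not MVS's implementation):

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_tokens, doc_tokens):
    """Sum, over query tokens, of the best dot product against any doc token."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # has an exact match per query token
doc_b = [[0.5, 0.5]]                          # one averaged token only

print(maxsim(query, doc_a), maxsim(query, doc_b))  # 2.0 1.0
```

The single averaged token in `doc_b` is roughly what a one-vector-per-document embedding gives you, which is why per-token scoring can separate documents that a pooled embedding can't.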


    One thing we're working toward that's worth mentioning: learned indexes. MVS currently uses LIRE, a continuously-rebalancing partitioned index with geometric centroids. It gets recall@10 of 0.90 on our 10M-vector nightly benchmark. But geometric centroids don't know which regions of the vector space your users actually search. They distribute data evenly, not usefully.

    The next step is HILL (Hierarchical Index Learning, based on Meta's EDBT '26 paper), which replaces geometric centroids with learned centroids trained on query interaction signals. On our benchmarks, HILL reaches LIRE's recall at roughly one quarter the candidate budget. Same quality, 4x fewer vectors scanned. Each namespace trains its own codebook, so the index becomes a trained artifact encoding that customer's usage patterns. Early days, but this is where the structural advantage of owning the storage layer pays off.
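To show what "candidate budget" means in a partitioned index, here is a generic IVF-style search sketch. It is neither LIRE nor HILL, whose internals aren't described in this post; the function names, the 2-dim data, and the hand-placed centroids are all made up for illustration. The query is routed to the `n_probe` nearest centroids and only those partitions are scanned.

```python
import math


def nearest_centroids(query, centroids, n_probe):
    """Indices of the n_probe centroids closest to the query."""
    order = sorted(range(len(centroids)),
                   key=lambda i: math.dist(query, centroids[i]))
    return order[:n_probe]


def search(query, centroids, partitions, n_probe, top_k):
    """Scan only the probed partitions; return (top ids, vectors scanned)."""
    candidates = []
    for p in nearest_centroids(query, centroids, n_probe):
        candidates.extend(partitions[p])
    hits = sorted(candidates, key=lambda item: math.dist(query, item[1]))
    return [item_id for item_id, _ in hits[:top_k]], len(candidates)


centroids = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]]
partitions = [
    [("a", [0.5, 0.2]), ("b", [1.0, 1.0])],
    [("c", [9.0, 0.5]), ("d", [10.5, 1.0])],
    [("e", [0.2, 9.5]), ("f", [1.0, 10.0])],
    [("g", [9.5, 9.5]), ("h", [10.0, 11.0])],
]

query = [0.4, 0.4]
ids, scanned = search(query, centroids, partitions, n_probe=1, top_k=2)
print(ids, scanned)  # ['a', 'b'] 2
```

Probing all four partitions returns the same two IDs but scans 8 vectors instead of 2. Better-placed centroids let a smaller `n_probe` reach the same recall, which is the kind of saving the 4x figure above refers to.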

    Next on the roadmap: bring-your-own object storage. MVS already stores everything (PQ-compressed vectors, WAL segments, snapshots) on S3-compatible backends. The missing piece is letting you point it at your bucket, whether that's AWS S3, Backblaze B2, Tigris, Cloudflare R2, or anything else that speaks the S3 protocol. Your vectors stay in your account, on your infrastructure, under your IAM policies. We keep the compute layer; you keep the data.