
    Best S3-Compatible Object Storage for AI Workloads in 2026

    We tested 11 S3-compatible object storage providers for AI and ML workloads — measuring throughput, latency, cost per TB, and compatibility with vector databases and embedding pipelines. Every provider was tested with MVS (Mixpeek Vector Store), which runs on any S3-compatible backend.

    Last tested: March 28, 2026
    11 tools evaluated

    Mixpeek Vector Store (MVS) runs on any S3-compatible backend — layer vector search directly on top of your existing object storage without moving data.

    Try Mixpeek Storage

    How We Evaluated

    Cost per TB

    30%

    Storage cost, egress fees, and API request pricing. For AI workloads, egress and GET request costs often dominate — not just storage.

    S3 Compatibility

    25%

    Completeness of S3 API support. Tested multipart uploads, presigned URLs, lifecycle policies, and compatibility with MVS, MinIO, and boto3.

    Performance

    20%

    Upload throughput, download latency, and time-to-first-byte for large objects (embeddings, model weights, media files).

    AI Ecosystem Fit

    15%

    Integration with AI tools: works as a backend for vector databases (MVS, LanceDB), model registries, dataset versioning, and training pipelines.

    Operational Simplicity

    10%

    Setup time, dashboard quality, documentation, and support responsiveness.
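    The compatibility portion of this evaluation is easy to reproduce yourself. Below is a minimal boto3 sketch (endpoint, credentials, and bucket name are placeholders; substitute your provider's values) that exercises multipart uploads and presigned URLs, two features where partial S3 implementations most often fall short:

```python
import os

def part_count(object_size: int, chunk_size: int) -> int:
    """How many parts a multipart upload will be split into (ceiling division)."""
    return -(-object_size // chunk_size)

if __name__ == "__main__":
    import boto3
    from boto3.s3.transfer import TransferConfig

    client = boto3.client(
        "s3",
        endpoint_url="https://s3.example.com",            # placeholder endpoint
        aws_access_key_id=os.environ["ACCESS_KEY_ID"],
        aws_secret_access_key=os.environ["SECRET_KEY"],
    )

    # Force multipart: anything over 64 MB is split into 16 MB parts.
    cfg = TransferConfig(multipart_threshold=64 * 1024**2,
                         multipart_chunksize=16 * 1024**2)
    client.upload_file("/tmp/big.bin", "compat-test", "big.bin", Config=cfg)

    # Presigned URLs are another common gap in partial S3 implementations.
    url = client.generate_presigned_url(
        "get_object",
        Params={"Bucket": "compat-test", "Key": "big.bin"},
        ExpiresIn=900,
    )
    print(url)
```

    If both calls succeed and the presigned URL is fetchable, the provider covers the multipart and presigning portions of the checklist; lifecycle policies still need a separate check.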

    Overview

    Choosing the right object storage for AI workloads is not just about $/TB — it is about total cost of ownership once you factor in egress, API calls, and how well the storage layer integrates with your embedding and inference pipelines. We ran identical benchmarks across all eleven providers: uploading 10 TB of mixed embeddings and media files, running retrieval workloads at 500 QPS, and measuring end-to-end latency through MVS. The results show that the cheapest storage price rarely translates to the cheapest total cost, and that ecosystem fit — whether your storage can serve as a native backend for vector search — matters more than raw throughput for most AI teams.
    1

    Backblaze B2

    The best balance of cost, reliability, and S3 compatibility for AI workloads. B2 is 1/4 the price of AWS S3 with free egress to Cloudflare and Fastly CDN partners. Tested as an MVS backend — Mixpeek's vector store runs directly on B2, giving you vector search on top of your existing B2 storage without moving data.

    What Sets It Apart

    Lowest cost per TB of any mainstream provider with free CDN egress via Bandwidth Alliance — making it the default choice for storage-heavy AI workloads that serve data through Cloudflare.

    Strengths

    • +Cheapest mainstream storage at $6/TB/mo (vs $23/TB on S3)
    • +Free egress to Cloudflare, Fastly, and other CDN partners
    • +Full S3 compatibility — works with MVS, boto3, rclone, everything
    • +Proven reliability (500B+ objects stored) with 11 nines durability

    Limitations

    • -Single-region only (US-West-004 or EU-Central-003)
    • -No serverless compute integration like Lambda@Edge
    • -Rate limits on free egress via CDN partners
    • -Smaller ecosystem than AWS for adjacent services

    Real-World Use Cases

    • Storing and serving 100M+ embedding vectors via MVS with sub-$1K/mo storage costs
    • Backing up model checkpoints during distributed training with free CDN egress for model serving
    • Hosting large multimodal datasets (images, video, audio) for feature extraction pipelines
    • Building a cost-effective RAG knowledge base with vector search layered on top via Mixpeek

    Choose This When

    When your AI workload is storage-heavy (tens of TB of embeddings, datasets, or model artifacts) and you want the lowest possible cost without sacrificing S3 compatibility or durability.

    Skip This If

    When you need multi-region replication, serverless compute triggers on storage events, or your workload requires sub-10ms access latency that only block storage can deliver.

    Integration Example

    import boto3
    
    b2 = boto3.client(
        "s3",
        endpoint_url="https://s3.us-west-004.backblazeb2.com",
        aws_access_key_id="YOUR_B2_KEY_ID",
        aws_secret_access_key="YOUR_B2_APP_KEY",
    )
    
    # Upload embeddings file to B2 (upload_file takes filename, bucket, key)
    b2.upload_file("/tmp/batch_001.parquet", "my-ai-bucket", "embeddings/batch_001.parquet")
    
    # Generate presigned URL for model weight download
    url = b2.generate_presigned_url(
        "get_object",
        Params={"Bucket": "my-ai-bucket", "Key": "models/v2/weights.safetensors"},
        ExpiresIn=3600,
    )
    $6/TB/mo storage; $0.01/1K API calls; free egress to CDN partners, $0.01/GB otherwise
    Best for: Cost-conscious AI teams storing embeddings, model weights, and media files. Pairs with MVS for vector search at a fraction of S3 cost
    Visit Website
    2

    Cloudflare R2

    Zero egress fees — period. R2 is the strongest choice for read-heavy AI workloads (retrieval, inference serving, RAG) where egress costs would otherwise dominate your bill. Fully S3-compatible and works as an MVS backend for BYO vector search.

    What Sets It Apart

    Zero egress fees with no caps, no reasonable-use policies, and no CDN partner requirements — the only provider where read-heavy workloads do not incur transfer costs regardless of volume.

    Strengths

    • +Zero egress fees — game-changing for retrieval-heavy workloads
    • +Workers integration for serverless compute at the edge
    • +Full S3 API compatibility — tested with MVS, LanceDB, DuckDB
    • +Automatic multi-region replication

    Limitations

    • -$15/TB/mo storage — more expensive than B2 or Wasabi
    • -No lifecycle policies for automatic tiering (yet)
    • -Rate limits on free tier (10M reads/mo, 1M writes/mo)
    • -Less mature than S3 for large-scale batch operations

    Real-World Use Cases

    • Serving model weights to inference endpoints across multiple regions without egress penalties
    • Hosting a RAG document store where every retrieval query pulls data — zero egress keeps costs flat
    • Running edge AI inference with Cloudflare Workers reading model artifacts from R2
    • Building a global image search API where search results serve original media from R2 at zero transfer cost

    Choose This When

    When your AI workload reads far more data than it writes — retrieval, inference serving, RAG, or any pattern where egress would be your largest cost on other providers.

    Skip This If

    When you need versioning, object lock, or lifecycle tiering — R2 lacks these features. Also avoid if storage volume is very large and access is infrequent, since $15/TB/mo storage cost exceeds cheaper alternatives.

    Integration Example

    import boto3
    
    r2 = boto3.client(
        "s3",
        endpoint_url="https://YOUR_ACCOUNT_ID.r2.cloudflarestorage.com",
        aws_access_key_id="YOUR_R2_ACCESS_KEY",
        aws_secret_access_key="YOUR_R2_SECRET_KEY",
    )
    
    # Upload training data — zero egress when models pull it later
    r2.upload_file("/tmp/train.parquet", "ai-data", "datasets/train.parquet")
    
    # List model artifacts
    response = r2.list_objects_v2(Bucket="ai-data", Prefix="models/v3/")
    for obj in response.get("Contents", []):
        print(f"{obj['Key']} — {obj['Size'] / 1e9:.1f} GB")
    $15/TB/mo storage; zero egress; $4.50/1M Class A ops, $0.36/1M Class B ops
    Best for: Read-heavy AI workloads (RAG, retrieval, inference serving) where zero egress fees offset higher storage costs
    Visit Website
    3

    AWS S3

    The default choice and the most battle-tested object storage on the planet. Unmatched ecosystem integration (Lambda, SageMaker, Bedrock, S3 Vectors). Higher cost than alternatives but offers capabilities nobody else has — including native S3 Vectors for vector search directly in your bucket.

    What Sets It Apart

    The deepest ecosystem integration of any cloud provider — Lambda triggers, SageMaker pipelines, Bedrock model hosting, Athena analytics, and S3 Vectors all work natively with S3 without additional infrastructure.

    Strengths

    • +Deepest ecosystem integration — Lambda, SageMaker, Bedrock, EMR
    • +S3 Vectors: native vector search within S3 (new, ~100ms latency)
    • +Intelligent Tiering automates hot/cold lifecycle
    • +11 nines durability with cross-region replication

    Limitations

    • -$23/TB/mo storage — 4x more expensive than B2
    • -Egress fees add up fast ($90/TB)
    • -S3 Vectors still limited (no hybrid search, no filtering)
    • -Complexity tax: IAM policies, VPC endpoints, encryption configs

    Real-World Use Cases

    • End-to-end ML pipelines with SageMaker reading training data from S3 and writing model artifacts back
    • Event-driven feature extraction using S3 event notifications triggering Lambda or Step Functions
    • Using S3 Vectors for lightweight vector similarity search without deploying a separate vector database
    • Multi-tier storage for ML artifacts — hot models on Standard, archived checkpoints on Glacier

    Choose This When

    When you are already invested in AWS and need tight integration with SageMaker, Lambda, or Bedrock — or when you need advanced features like S3 Vectors, Intelligent Tiering, or cross-region replication.

    Skip This If

    When cost is the primary concern — S3 is 4x more expensive than B2 for storage and egress fees compound quickly on read-heavy workloads. Avoid for large-scale storage-only use cases where ecosystem integration is not needed.

    Integration Example

    import boto3
    
    s3 = boto3.client("s3")
    
    # Upload training dataset with Intelligent Tiering
    s3.upload_file(
        "/tmp/dataset.parquet",
        "ml-pipeline",
        "datasets/v2/train.parquet",
        ExtraArgs={"StorageClass": "INTELLIGENT_TIERING"},
    )
    
    # Set up event notification for new uploads
    s3.put_bucket_notification_configuration(
        Bucket="ml-pipeline",
        NotificationConfiguration={
            "LambdaFunctionConfigurations": [{
                "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123:function:process-upload",
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {"Key": {"FilterRules": [{"Name": "prefix", "Value": "datasets/"}]}},
            }]
        },
    )
    $23/TB/mo (Standard); egress $0.09/GB; Intelligent Tiering available
    Best for: Teams already on AWS that need tight integration with SageMaker, Bedrock, or Lambda — or that want to use S3 Vectors for basic vector search
    Visit Website
    4

    Tigris

    Globally distributed, S3-compatible object storage built on FoundationDB. Data automatically replicates to the region closest to your users. Newest entrant on this list but technically impressive — designed from scratch for modern workloads.

    What Sets It Apart

    Automatic global data distribution with strong consistency — data is replicated to edge locations based on access patterns without any manual configuration or multi-region setup.

    Strengths

    • +Automatic global distribution — data follows your users
    • +Zero egress within the Tigris network
    • +S3-compatible API works with MVS, boto3, and standard tools
    • +Built on FoundationDB for strong consistency guarantees

    Limitations

    • -Newest provider — less production track record
    • -Pricing still evolving as they scale
    • -Smaller community and fewer integrations than S3 or R2
    • -No equivalent to S3 lifecycle policies yet

    Real-World Use Cases

    • Serving ML model weights globally with automatic edge caching — no manual replication to each region
    • Multi-region RAG deployments where embedding retrieval must be low-latency regardless of user location
    • Distributed training pipelines that need strongly consistent access to shared datasets across regions

    Choose This When

    When you deploy AI inference endpoints in multiple regions and need model weights and data close to each endpoint without managing cross-region replication yourself.

    Skip This If

    When you need a battle-tested provider with years of production track record, or when your workload is single-region and does not benefit from global distribution.

    Integration Example

    import boto3
    
    tigris = boto3.client(
        "s3",
        endpoint_url="https://fly.storage.tigris.dev",
        aws_access_key_id="YOUR_TIGRIS_ACCESS_KEY",
        aws_secret_access_key="YOUR_TIGRIS_SECRET_KEY",
    )
    
    # Upload model weights — auto-replicated to nearest edge
    tigris.upload_file("/tmp/weights.safetensors", "ai-models", "llm/v3/weights.safetensors")
    
    # Data is automatically cached at the edge closest to readers
    obj = tigris.get_object(Bucket="ai-models", Key="llm/v3/weights.safetensors")
    print(f"Content-Length: {obj['ContentLength'] / 1e9:.1f} GB")
    $20/TB/mo storage; zero egress within network; competitive API pricing
    Best for: Multi-region AI deployments that need data close to inference endpoints without manual replication
    Visit Website
    5

    Wasabi

    Hot cloud storage at cold storage prices. Wasabi positions itself as a drop-in S3 replacement with no egress fees and no API request fees. Straightforward pricing makes cost predictable — you pay for storage and nothing else.

    What Sets It Apart

    Completely flat-rate pricing with no egress or API fees — the only provider where total cost equals storage price times volume, with no variable components to track or optimize.

    Strengths

    • +No egress fees and no API request fees
    • +Predictable flat-rate pricing at $6.99/TB/mo
    • +Full S3 API compatibility
    • +Good for bulk storage of embeddings and training data

    Limitations

    • -90-day minimum storage duration — early deletion fees apply
    • -Higher latency than S3 or R2 in our benchmarks
    • -No serverless compute integration
    • -Limited lifecycle automation compared to S3 Intelligent Tiering

    Real-World Use Cases

    • Long-term archival of training datasets that rarely change — the 90-day minimum is irrelevant for static data
    • Storing pre-computed embedding collections that are written once and read many times
    • Backing up model checkpoints from training runs where predictable cost matters more than retrieval speed

    Choose This When

    When you store large volumes of static or write-once data (training datasets, embedding archives, model backups) and want zero surprises on your monthly bill.

    Skip This If

    When your data has high churn — frequent uploads, overwrites, or deletions within 90 days will trigger minimum retention charges that can double your effective cost.
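    The retention penalty is simple arithmetic. A quick sketch, assuming (per Wasabi's minimum-duration policy as described above) that early-deleted data is billed as if stored for the full 90 days:

```python
FLAT_RATE = 6.99  # $/TB/mo list price

def effective_rate(days_retained: float) -> float:
    """Effective $/TB/mo when objects are deleted after `days_retained` days.
    Early-deleted objects are assumed billed for the full 90-day minimum."""
    billed_days = max(days_retained, 90)
    return FLAT_RATE * billed_days / days_retained

print(f"{effective_rate(90):.2f}/TB/mo")   # list price, no penalty
print(f"{effective_rate(30):.2f}/TB/mo")   # 3x effective rate at 30-day churn
```

    In this sketch, data churned every 30 days costs an effective $20.97/TB/mo, which already exceeds the list prices of B2, R2, and GCS and erases Wasabi's cost advantage.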

    Integration Example

    import boto3
    
    wasabi = boto3.client(
        "s3",
        endpoint_url="https://s3.wasabisys.com",
        aws_access_key_id="YOUR_WASABI_KEY",
        aws_secret_access_key="YOUR_WASABI_SECRET",
    )
    
    # Upload large training dataset — flat rate, no egress fees on reads
    wasabi.upload_file("/tmp/imagenet_train.tar", "ml-datasets", "imagenet/train.tar")
    
    # Verify upload
    head = wasabi.head_object(Bucket="ml-datasets", Key="imagenet/train.tar")
    print(f"Uploaded: {head['ContentLength'] / 1e9:.1f} GB")
    $6.99/TB/mo flat rate; no egress fees; no API fees; 90-day minimum storage
    Best for: Bulk embedding and dataset storage where predictable flat-rate pricing matters more than access latency
    Visit Website
    6

    MinIO

    Self-hosted, S3-compatible object storage that runs on your own hardware or VMs. The standard choice for air-gapped, on-prem, or regulated environments where data cannot leave your infrastructure. Works as an MVS backend for private-cloud vector search.

    What Sets It Apart

    The only production-grade, fully S3-compatible object storage you can run entirely on your own infrastructure — critical for air-gapped, regulated, or sovereignty-sensitive AI deployments.

    Strengths

    • +Self-hosted — full control over data residency and security
    • +S3-compatible with excellent ecosystem support
    • +High throughput on dedicated hardware (100+ Gbps benchmarks)
    • +Open-source with active development

    Limitations

    • -You manage everything: hardware, upgrades, monitoring, backups
    • -No managed offering — operational overhead is real
    • -Cost advantage disappears at small scale (hardware amortization)
    • -Requires Kubernetes or bare-metal expertise

    Real-World Use Cases

    • Air-gapped ML environments in defense or healthcare where training data cannot leave the network perimeter
    • On-prem GPU cluster with local MinIO storing training data for zero-network-hop data loading
    • Self-hosted MVS deployment for private vector search over sensitive embeddings
    • Local development and CI/CD environments that need S3-compatible storage without cloud dependencies

    Choose This When

    When data sovereignty, air-gap requirements, or regulatory compliance mandate that data stays on your own infrastructure, and you have the ops team to manage it.

    Skip This If

    When you lack dedicated infrastructure or Kubernetes expertise — the operational burden of managing storage durability, backups, and upgrades yourself is significant at any scale.

    Integration Example

    import boto3
    
    minio = boto3.client(
        "s3",
        endpoint_url="http://minio.internal:9000",
        aws_access_key_id="minioadmin",
        aws_secret_access_key="minioadmin",
    )
    
    # Create bucket for ML artifacts
    minio.create_bucket(Bucket="ml-artifacts")
    
    # Upload model weights to self-hosted storage
    minio.upload_file("/tmp/resnet50.pt", "ml-artifacts", "models/resnet50.pt")
    
    # Set up event notification for new data ingestion
    minio.put_bucket_notification_configuration(
        Bucket="ml-artifacts",
        NotificationConfiguration={
            "QueueConfigurations": [{
                "QueueArn": "arn:minio:sqs::1:webhook",
                "Events": ["s3:ObjectCreated:*"],
            }]
        },
    )
    Free (open-source, AGPLv3); enterprise license with support available
    Best for: On-prem or air-gapped AI deployments where data sovereignty is non-negotiable. Pairs with MVS for self-hosted vector search
    Visit Website
    7

    Google Cloud Storage (GCS)

    Google's object storage with strong ML ecosystem integration (Vertex AI, BigQuery). Autoclass automatically moves objects between storage tiers. A solid choice for teams building on Google Cloud, but pricier than B2 or R2 for storage-heavy AI workloads.

    What Sets It Apart

    Deepest integration with Google's ML ecosystem — Vertex AI, BigQuery, Dataflow, and TPU training all read from GCS natively without additional data movement or connector setup.

    Strengths

    • +Tight Vertex AI and BigQuery integration
    • +Autoclass handles lifecycle tiering automatically
    • +Strong consistency guarantees
    • +S3-compatible interoperability API available

    Limitations

    • -$20/TB/mo (Standard) — expensive for large datasets
    • -Egress fees ($0.12/GB) are the highest on this list
    • -S3 compatibility layer has gaps (no multipart presigned URLs)
    • -Less cost-competitive than B2, R2, or Wasabi for pure storage

    Real-World Use Cases

    • Vertex AI training pipelines reading datasets directly from GCS with native integration
    • BigQuery ML workflows that query structured data alongside unstructured media stored in GCS
    • Autoclass-managed storage for mixed-access ML artifacts — hot models tier up, cold checkpoints tier down automatically

    Choose This When

    When your ML infrastructure runs on Google Cloud and you want native integration with Vertex AI, BigQuery, or TPU training without managing cross-cloud data transfers.

    Skip This If

    When cost is a primary concern — GCS has the highest egress fees ($0.12/GB) of any provider on this list, and storage at $20/TB/mo is 3x more than B2.

    Integration Example

    from google.cloud import storage
    
    client = storage.Client()
    bucket = client.bucket("ml-training-data")
    
    # Upload training dataset (Autoclass tiering is configured per bucket, not per object)
    blob = bucket.blob("datasets/v3/train.parquet")
    blob.upload_from_filename("/tmp/train.parquet")
    
    # Generate signed URL for model download
    url = blob.generate_signed_url(
        version="v4",
        expiration=3600,
        method="GET",
    )
    print(f"Download URL: {url[:80]}...")
    $20/TB/mo (Standard); $0.12/GB egress; Autoclass tiering available
    Best for: Teams already on Google Cloud using Vertex AI or BigQuery that want unified infrastructure
    Visit Website
    8

    Hetzner Object Storage

    The quiet cost leader for EU-based AI teams. Hetzner offers S3-compatible storage at $5.20/TB/mo with full versioning and object lock. EU-only regions (Germany, Finland) make it a strong fit for GDPR-sensitive ML workloads where data residency is not optional.

    What Sets It Apart

    The cheapest S3-compatible provider that includes versioning and object lock, with EU-only data residency that satisfies GDPR requirements by design rather than by policy.

    Strengths

    • +Just $5.20/TB/mo — cheapest provider with full S3 features
    • +Versioning, object lock, and lifecycle policies included
    • +EU-only regions ideal for GDPR compliance
    • +Transparent, no-surprise pricing

    Limitations

    • -EU-only — no North American or APAC regions
    • -Smaller community than AWS or Cloudflare ecosystems
    • -No CDN integration or edge caching built in
    • -Less mature tooling and documentation than hyperscalers

    Real-World Use Cases

    • GDPR-compliant storage for medical imaging or biometric datasets that must remain within the EU
    • Budget ML training data storage for European research labs and universities
    • Hosting embedding collections for EU-deployed RAG applications with strict data residency requirements

    Choose This When

    When you need affordable, fully-featured S3 storage with guaranteed EU data residency for GDPR-sensitive AI workloads.

    Skip This If

    When you need storage in North America, APAC, or any region outside Germany and Finland, or when you require a large ecosystem of adjacent cloud services.

    Integration Example

    import boto3
    
    hetzner = boto3.client(
        "s3",
        endpoint_url="https://fsn1.your-objectstorage.com",
        aws_access_key_id="YOUR_HETZNER_KEY",
        aws_secret_access_key="YOUR_HETZNER_SECRET",
    )
    
    # Upload dataset with versioning enabled
    hetzner.put_bucket_versioning(
        Bucket="eu-ml-data",
        VersioningConfiguration={"Status": "Enabled"},
    )
    hetzner.upload_file("/tmp/train.parquet", "eu-ml-data", "datasets/train.parquet")
    $5.20/TB/mo; $0.01/GB egress; 1 TB internal transfer free
    Best for: EU-based AI teams that need affordable, GDPR-compliant storage with full S3 feature support
    Visit Website
    9

    Storj

    Decentralized object storage built on a global network of independent node operators. Data is encrypted, split into erasure-coded pieces, and distributed across thousands of nodes worldwide. At $4/TB/mo with $7/TB egress, it is one of the cheapest options — and the decentralized architecture provides inherent redundancy without relying on a single cloud provider.

    What Sets It Apart

    Decentralized storage across thousands of independent nodes with end-to-end encryption — no single point of failure and no single entity controls your data.

    Strengths

    • +Just $4/TB/mo storage — among the cheapest on any list
    • +Decentralized architecture distributes data across thousands of independent nodes
    • +End-to-end encryption by default — data is encrypted before leaving your machine
    • +S3-compatible gateway works with standard tools and SDKs

    Limitations

    • -Higher latency than centralized providers for time-to-first-byte
    • -Throughput varies depending on node availability and network conditions
    • -Less mature S3 compatibility — some advanced features missing
    • -Smaller ecosystem and less enterprise adoption than mainstream providers

    Real-World Use Cases

    • Archiving petabytes of training data at minimal cost with inherent geographic redundancy
    • Storing encrypted embedding backups where security and cost matter more than retrieval speed
    • Distributing large open-source ML datasets with built-in content delivery

    Choose This When

    When you need the cheapest possible storage for large-scale archival of ML data and can tolerate higher latency, or when you want inherent geographic redundancy without trusting a single cloud provider.

    Skip This If

    When you need low-latency access for real-time inference, predictable throughput for training pipelines, or enterprise SLAs with guaranteed uptime and support.

    Integration Example

    import boto3
    
    storj = boto3.client(
        "s3",
        endpoint_url="https://gateway.storjshare.io",
        aws_access_key_id="YOUR_STORJ_ACCESS_KEY",
        aws_secret_access_key="YOUR_STORJ_SECRET_KEY",
    )
    
    # Upload training data — encrypted and distributed automatically
    storj.upload_file("/tmp/train_v5.tar.gz", "ml-archive", "datasets/train_v5.tar.gz")
    
    # Data is erasure-coded across 80+ nodes for durability
    head = storj.head_object(Bucket="ml-archive", Key="datasets/train_v5.tar.gz")
    print(f"Stored: {head['ContentLength'] / 1e9:.1f} GB (encrypted, distributed)")
    $4/TB/mo storage; $7/TB egress; 150 GB free tier
    Best for: Large-scale archival of ML datasets and embeddings where cost is paramount and access latency is not critical
    Visit Website
    10

    IDrive e2

    Hot cloud storage from IDrive at $4/TB/mo with no minimum retention and no egress fees up to a reasonable-use threshold. A straightforward option for AI teams that need cheap S3-compatible storage without the 90-day trap that Wasabi imposes.

    What Sets It Apart

    Matches Storj's $4/TB/mo price point and undercuts Wasabi's $6.99, without Wasabi's 90-day minimum retention or Storj's latency variability — the most straightforward budget option.

    Strengths

    • +Just $4/TB/mo with no minimum retention period
    • +Free egress (reasonable-use policy, similar to Wasabi but without 90-day lock)
    • +S3-compatible API works with standard tools
    • +Multiple regions including US, EU, and APAC

    Limitations

    • -Reasonable-use egress policy limits heavy download workloads
    • -Less known brand with smaller community and fewer integrations
    • -Documentation is less thorough than major providers
    • -No event notifications or serverless compute integrations

    Real-World Use Cases

    • Storing fast-churning datasets that are replaced frequently — no early-deletion penalties
    • Budget embedding storage for research teams and startups that cannot afford $15-23/TB/mo
    • Backing up ML experiment artifacts where cost predictability matters but data churn is high

    Choose This When

    When you need the cheapest S3 storage with no minimum retention penalties and can tolerate a reasonable-use egress policy and a smaller provider ecosystem.

    Skip This If

    When you need enterprise SLAs, rich documentation, or a large ecosystem of integrations — IDrive e2 is a smaller player with less community support than B2 or R2.

    Integration Example

    import boto3
    
    idrive = boto3.client(
        "s3",
        endpoint_url="https://YOUR_REGION.idrivee2.com",
        aws_access_key_id="YOUR_IDRIVE_KEY",
        aws_secret_access_key="YOUR_IDRIVE_SECRET",
    )
    
    # Upload dataset — no minimum retention, delete anytime
    idrive.upload_file("/tmp/embeddings.npy", "experiments", "run-42/embeddings.npy")
    
    # Clean up old experiment data without early-deletion penalties
    idrive.delete_object(Bucket="experiments", Key="run-41/embeddings.npy")
    $4/TB/mo storage; free egress (reasonable use); no minimum retention
    Best for: Budget AI data storage where you want Wasabi-like pricing without the 90-day minimum retention penalty
    Visit Website
    11

    Vultr Object Storage

    Developer-friendly S3-compatible storage from Vultr with 17 global locations and a straightforward $5/250GB pricing model. Pairs well with Vultr GPU instances for co-located ML workloads where storage and compute sit on the same network.

    What Sets It Apart

    Co-located with Vultr's GPU cloud instances across 17 regions — the lowest-latency option for teams already running ML inference on Vultr hardware.

    Strengths

    • +17 global locations for low-latency access near Vultr GPU instances
    • +Simple pricing starting at $5/mo for 250 GB + 1 TB transfer
    • +S3-compatible API works with boto3 and standard tooling
    • +Co-located with Vultr GPU cloud for minimal data transfer latency

    Limitations

    • -5 GB max object size — cannot store large model weights as single objects
    • -Limited to Vultr's ecosystem — no Lambda-style compute triggers
    • -No versioning or object lock support
    • -Smaller storage capacity per account compared to hyperscalers

    Real-World Use Cases

    • Co-locating model artifacts with Vultr A100 GPU instances for zero-hop model loading
    • Storing inference results and logs from Vultr-hosted ML services
    • Small-to-medium dataset hosting for GPU-powered training runs on Vultr infrastructure

    Choose This When

    When you run ML workloads on Vultr GPU instances and want co-located storage to minimize data transfer latency and cost.

    Skip This If

    When you need to store objects larger than 5 GB (model weights, video datasets), require versioning or object lock, or are not already on Vultr infrastructure.

    Integration Example

    import boto3
    
    vultr = boto3.client(
        "s3",
        endpoint_url="https://ewr1.vultrobjects.com",
        aws_access_key_id="YOUR_VULTR_KEY",
        aws_secret_access_key="YOUR_VULTR_SECRET",
    )
    
    # Upload model to co-located storage near GPU instance
    vultr.upload_file("/tmp/yolov8.pt", "ml-models", "yolo/v8/weights.pt")
    
    # Download to GPU instance on same network — minimal latency
    vultr.download_file("ml-models", "yolo/v8/weights.pt", "/tmp/yolov8_local.pt")
    $5/mo base (250 GB + 1 TB transfer); $20/TB/mo additional storage
    Best for: Teams running ML inference on Vultr GPU instances who want co-located storage with minimal network latency
    Visit Website

    Frequently Asked Questions

    Can I use object storage as a backend for a vector database?

    Yes — this is exactly what MVS (Mixpeek Vector Store) does. MVS stores vectors and metadata on any S3-compatible object storage (B2, R2, S3, Tigris, Wasabi, MinIO) and serves hybrid search (dense + sparse + BM25) on top. Hot data is cached for ~8ms queries; warm data is served from object storage at ~92ms. This means your vector database costs are mostly just your storage bill, not a separate database subscription.
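    The underlying pattern, vectors persisted as plain objects with similarity computed over them at query time, can be illustrated without reproducing any MVS-specific API. A toy pure-Python sketch; in a real deployment the corpus would be read from an S3-compatible bucket (e.g. as Parquet) rather than defined inline:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query, corpus, k=2):
    """Brute-force nearest neighbours over an in-memory corpus."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in corpus.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]

# Stand-in for vectors that would be fetched from object storage.
corpus = {
    "doc-a": [0.9, 0.1, 0.0],
    "doc-b": [0.1, 0.9, 0.0],
    "doc-c": [0.7, 0.7, 0.0],
}
print(top_k([1.0, 0.0, 0.0], corpus))  # doc-a ranks first
```

    A production system adds the pieces this sketch omits: approximate indexes instead of brute force, a hot cache in front of the bucket, and hybrid dense/sparse scoring.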

    What is the cheapest S3-compatible storage for AI workloads?

    Backblaze B2 at $6/TB/mo is the cheapest for storage-heavy workloads with moderate egress (especially with free CDN egress). Cloudflare R2 at $15/TB/mo is cheapest for read-heavy workloads because egress is free. Wasabi at $6.99/TB/mo is cheapest for predictable flat-rate pricing with no API fees. The right answer depends on your access pattern: if you read a lot (retrieval, RAG), R2 wins on total cost; if you store a lot and read less, B2 wins.
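    The access-pattern argument is easy to check with a back-of-the-envelope model. A simplified sketch using illustrative rates from this list (real bills also include request fees, tiering, and minimum-retention effects):

```python
def monthly_cost(tb_stored, tb_read, storage_per_tb, egress_per_gb):
    """Simplified monthly bill: storage plus egress (request fees omitted)."""
    return tb_stored * storage_per_tb + tb_read * 1000 * egress_per_gb

stored, read = 10, 50  # 10 TB at rest, 50 TB downloaded per month

# Illustrative rates from this list: B2 $6/TB + $0.01/GB egress,
# R2 $15/TB + free egress, S3 $23/TB + $0.09/GB egress.
print("B2:", monthly_cost(stored, read, 6.00, 0.01))
print("R2:", monthly_cost(stored, read, 15.00, 0.00))
print("S3:", monthly_cost(stored, read, 23.00, 0.09))
```

    At this read-heavy ratio R2's free egress more than offsets its higher storage rate; flip the ratio (50 TB stored, 5 TB read) and B2 comes out cheapest.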

    Is MinIO a good alternative to AWS S3?

    MinIO is an excellent S3 alternative for self-hosted and on-prem deployments. It offers full S3 API compatibility, high throughput on dedicated hardware, and complete data sovereignty. The trade-off is operational overhead: you manage hardware, upgrades, and backups yourself. For cloud workloads, Backblaze B2 or Cloudflare R2 offer the same S3 compatibility with zero ops at lower cost than S3.

    What is the best object storage for storing embeddings and model weights?

    For embeddings: use any S3-compatible storage paired with MVS, which adds vector search on top. Backblaze B2 is the most cost-effective for large embedding collections. For model weights: Cloudflare R2 is ideal because zero egress means you can pull weights to any region without transfer fees. For training data: Wasabi's flat-rate pricing with no API fees keeps costs predictable during data-intensive training runs.

    How does object storage compare to block storage for AI workloads?

    Object storage (S3, B2, R2) is 5-10x cheaper per TB than block storage (EBS, Persistent Disks) and scales to exabytes without provisioning. The trade-off is higher latency: ~50-100ms for object storage vs ~1-5ms for block storage. For AI workloads, object storage is the right choice for embeddings, datasets, and model artifacts. Use block storage only for the hot serving layer (like Qdrant or the warm cache in MVS) where sub-10ms latency is critical.
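    You can reproduce the latency comparison against your own buckets. A rough sketch that measures time-to-first-byte with boto3; the client setup is elided, and the bucket and key names are placeholders:

```python
import time
import statistics

def summarize(latencies_ms):
    """Return p50/p95 over a list of per-request latencies in milliseconds."""
    cuts = statistics.quantiles(latencies_ms, n=100)
    return {"p50": statistics.median(latencies_ms), "p95": cuts[94]}

def measure_ttfb(client, bucket, key, runs=20):
    """Time from issuing a GET to receiving the first byte of the body."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        body = client.get_object(Bucket=bucket, Key=key)["Body"]
        body.read(1)  # first byte arrives here
        samples.append((time.perf_counter() - start) * 1000)
        body.close()
    return summarize(samples)

# Usage, with `client` being any boto3 S3 client pointed at the provider under test:
# print(measure_ttfb(client, "bench-bucket", "models/weights.safetensors"))
```

    Run it from the same region as your inference workload; cross-region numbers will differ substantially from the ranges quoted above.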

    Ready to Get Started with Mixpeek?

    See why teams choose Mixpeek for multimodal AI. Book a demo to explore how our platform can transform your data workflows.

    Explore Other Curated Lists

    multimodal ai

    Best Multimodal AI APIs

    A hands-on comparison of the top multimodal AI APIs for processing text, images, video, and audio through a single integration. We evaluated latency, modality coverage, retrieval quality, and developer experience.

    11 tools ranked · View List
    search retrieval

    Best Video Search Tools

    We tested the leading video search and understanding platforms on real-world content libraries. This guide covers visual search, scene detection, transcript-based retrieval, and action recognition.

    9 tools ranked · View List
    content processing

    Best AI Content Moderation Tools

    We evaluated content moderation platforms across image, video, text, and audio moderation. This guide covers accuracy, latency, customization, and compliance features for trust and safety teams.

    9 tools ranked · View List