
    Backblaze B2

    S3-compatible AI pipelines at roughly a quarter of the storage cost

    Connect Backblaze B2 buckets to Mixpeek for automatic multimodal extraction at a fraction of AWS S3 prices. Store your videos, images, and documents in B2, run feature extractors and embeddings through Mixpeek, and write indexed results back to B2 — with zero egress fees through Bandwidth Alliance partners.

    Backblaze B2 integration walkthrough

    The Problem

    Teams building multimodal AI pipelines hit a cost wall fast. AWS S3 charges $23/TB/month for storage and $0.09/GB for egress — costs that compound quickly when you're storing terabytes of video, images, and documents, then moving them to processing infrastructure. A 50TB media library costs $1,150/month just to store, and every extraction run that pulls data out of S3 adds egress fees on top. Teams end up choosing between processing everything they need and staying within budget.
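
The arithmetic above checks out and is worth seeing end to end. A quick sketch using only the prices quoted in this post (S3 storage at $23/TB/month, B2 at $6/TB/month, S3 egress at $0.09/GB); the 50 TB library size is the example from the text:

```python
# Monthly cost comparison for a 50 TB media library, using the list
# prices quoted above (assumption: no tiering or negotiated discounts).
S3_STORAGE_PER_TB = 23.0   # USD per TB per month
B2_STORAGE_PER_TB = 6.0    # USD per TB per month
S3_EGRESS_PER_GB = 0.09    # USD per GB transferred out of S3

library_tb = 50

s3_storage = library_tb * S3_STORAGE_PER_TB   # 1150.0 USD/month
b2_storage = library_tb * B2_STORAGE_PER_TB   # 300.0 USD/month
monthly_savings = s3_storage - b2_storage     # 850.0 USD/month

# One full extraction pass that reads the whole library out of S3:
egress_cost = library_tb * 1024 * S3_EGRESS_PER_GB  # 4608.0 USD per run

print(f"S3 storage:      ${s3_storage:,.0f}/month")
print(f"B2 storage:      ${b2_storage:,.0f}/month")
print(f"Storage savings: ${monthly_savings:,.0f}/month")
print(f"S3 egress for one full library scan: ${egress_cost:,.2f}")
```

Note that a single full-library extraction run incurs more in S3 egress than four months of storage, which is why zero-egress transfer matters as much as the per-TB price.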

    The Solution

    Mixpeek connects directly to Backblaze B2 via the S3-compatible API — same SDKs, same tools, no code changes. B2 stores your data at $6/TB/month (75% less than S3) with free egress through Bandwidth Alliance partners like Cloudflare. Mixpeek reads objects from your B2 buckets, runs multimodal extractors — visual embeddings, object detection, face recognition, OCR, and transcription — then indexes everything into retrievers. Processed results and vector indexes are written back to B2 through Mixpeek Vector Store, keeping your entire pipeline on low-cost infrastructure end-to-end.

    Measurable Impact

    What teams see after connecting Backblaze B2 to Mixpeek

    75% lower storage costs

    $6/TB/month on B2 vs $23/TB on AWS S3, saving $850/month on a 50TB library

    Zero egress fees

    Free data transfer through Bandwidth Alliance partners (Cloudflare, Fastly, Bunny CDN)

    No code changes

    S3-compatible API means existing SDKs and tools work out of the box with B2

    Same-hour setup

    Connect a B2 bucket, configure extractors, and start processing in under 60 minutes

    End-to-end B2 pipeline

    Source objects, extracted features, and vector indexes all stored on Backblaze

    Parallel extraction at scale

    Ray GPU clusters process thousands of assets concurrently across your entire library

    Pipeline Architecture

    Each step below shows how the components connect

    1. B2 Bucket Connection

    S3-Compatible API

    Connect your Backblaze B2 bucket to Mixpeek using the S3-compatible API. Same endpoint format, same SDKs — just point to your B2 region (e.g., s3.us-west-004.backblazeb2.com).

    2. Object Discovery

    Include Patterns

    Mixpeek scans your B2 bucket and applies include patterns to select which objects to process. Filter by file extension, path prefix, or naming convention.
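
Include-pattern selection can be pictured as glob matching over object keys. A sketch using Python's stdlib `fnmatch` — the pattern list here is illustrative, not Mixpeek's actual configuration syntax:

```python
from fnmatch import fnmatch

# Hypothetical include patterns: filter by extension or path prefix,
# mirroring the selection step described above.
INCLUDE_PATTERNS = ["videos/*.mp4", "images/*.jpg"]

def should_process(key: str) -> bool:
    """Return True if the object key matches any include pattern."""
    return any(fnmatch(key, pattern) for pattern in INCLUDE_PATTERNS)

keys = [
    "videos/launch.mp4",
    "videos/raw/take1.mov",
    "images/hero.jpg",
    "logs/2024-01-01.txt",
]
selected = [k for k in keys if should_process(k)]
print(selected)  # ['videos/launch.mp4', 'images/hero.jpg']
```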

    3. Multimodal Extraction

    Extractors

    Selected objects are processed through parallel extractors: visual embeddings, object detection, face identity, OCR, speech transcription, and scene splitting — running on Ray GPU clusters.

    4. Feature Indexing

    Collections

    Extracted features are stored in Mixpeek collections with full lineage back to the source B2 object, including bucket, key, and extraction metadata.
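
The lineage idea is simpler than it sounds: every stored feature carries enough information to trace it back to its B2 source object. A sketch of what such a document could look like — the field names and example values are illustrative, not Mixpeek's actual collection schema:

```python
from dataclasses import dataclass, field, asdict

# Hypothetical collection-document shape. The key point: each extracted
# feature keeps full lineage (bucket, key, extraction metadata) back to
# the source object in B2.
@dataclass
class FeatureDocument:
    source_bucket: str
    source_key: str
    extractor: str                  # e.g. "ocr", "speech_transcription"
    features: dict = field(default_factory=dict)
    extraction_metadata: dict = field(default_factory=dict)

doc = FeatureDocument(
    source_bucket="my-media-bucket",
    source_key="videos/launch.mp4",
    extractor="speech_transcription",
    features={"transcript": "welcome to the launch..."},
    extraction_metadata={"model": "example-asr-model", "duration_s": 312},
)
print(asdict(doc)["source_key"])  # videos/launch.mp4
```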

    5. Search Retriever

    Feature Search + Filters

    A retriever combines vector similarity, face identity matching, metadata filters, and full-text search. Query across all extracted features from a single API call.
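
To make "a single API call" concrete, here is what a combined query payload could look like. Everything below is a hypothetical shape — the field names and filter syntax are illustrative stand-ins, not the actual Mixpeek retriever API:

```python
import json

# Hypothetical retriever query combining the signal types listed above:
# text/vector search, face identity matching, and metadata filters.
query = {
    "retriever_id": "media-search",           # illustrative name
    "inputs": {
        "text": "product launch keynote",     # full-text + embedding
        "face_id": "person_042",              # face identity match
    },
    "filters": {
        "source_bucket": "my-media-bucket",
        "file_type": {"$in": ["mp4", "mov"]},
    },
    "limit": 10,
}
payload = json.dumps(query)
print(payload[:50])
```

The design point the page is making: instead of querying a vector DB, a face index, and a metadata store separately, one retriever request fans out across all extracted features.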

    6. Results to B2

    Mixpeek Vector Store

    Processed results, vector indexes, and scan reports are written back to Backblaze B2 via Mixpeek Vector Store. Your data stays on B2 end-to-end — zero egress fees.

    Backblaze B2 Integration Deep Dive

    Point a Mixpeek connector at your B2 bucket endpoint using the S3-compatible API. Mixpeek treats B2 buckets identically to AWS S3 — no adapter code, no migration. Set up collections with the extractors you need, configure include patterns to control which objects get processed, and Mixpeek handles the rest. New objects added to B2 are detected and processed automatically. The pipeline decomposes each asset into extracted features — scene compositions, detected objects, recognized faces, on-screen text, and transcribed speech — then indexes everything into a retriever with feature search and metadata filtering. Batch processing runs across your entire library in parallel on Ray GPU clusters, and results are written back to B2 via Mixpeek Vector Store with full lineage tracking.

    object-storage
    s3-compatible
    cost-optimization
    video
    images
    documents

    Ready to integrate?

    Get started with Mixpeek + Backblaze B2 in minutes. Read the docs, create a free account, or schedule a walkthrough with our team.