Introduction

Your AI agent can read text. It cannot watch a video, scan a photo for faces, or search audio by what was said. Mixpeek is the infrastructure layer that gives agents access to video, images, audio, and documents through a single API.

Start building on Mixpeek

Create a workspace and run your first multimodal search — index your own video, images, audio, or documents in minutes.

How It Works

Index

Upload video, images, audio, and documents to Buckets. Mixpeek runs feature extraction automatically — faces, objects, transcripts, embeddings, and structured metadata all get indexed into searchable Collections.
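To make the indexing step more concrete, here is a minimal sketch of the two payloads it implies: one that registers a raw file in a bucket and one that defines a collection to extract features from that bucket. The field names (`blobs`, `source`, `feature_extractor`), the IDs, and the extractor name are hypothetical placeholders chosen for illustration, not Mixpeek's documented schema; check the API reference for the real shapes.

```python
import json


def build_object_payload(file_url: str, media_type: str) -> dict:
    """Describe one raw file to register in a bucket (hypothetical shape)."""
    return {"blobs": [{"type": media_type, "url": file_url}]}


def build_collection_payload(name: str, bucket_id: str, extractor: str) -> dict:
    """Describe a collection that runs one extractor over a bucket (hypothetical shape)."""
    return {
        "collection_name": name,
        "source": {"type": "bucket", "bucket_id": bucket_id},
        "feature_extractor": {"name": extractor},
    }


# All identifiers below are made-up examples.
obj = build_object_payload("https://example.com/demo.mp4", "video")
col = build_collection_payload("demo-faces", "bkt_demo", "face_extractor")
print(json.dumps({"object": obj, "collection": col}, indent=2))
```

Keeping the raw file (bucket object) separate from the extraction definition (collection) is what lets the same upload be processed in more than one way.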

See this running on real content — 7 PDFs that fan out into 59 searchable knowledge-graph nodes — in the sample data.

Retrieve

Build retrieval pipelines that your agent calls. Chain semantic search, face search, object search, and transcript search into multi-stage Retrievers, then expose the whole chain as a single endpoint.
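As a rough illustration of what chaining stages means, the sketch below runs a list of candidate hits through three small stages in order: a filter, a keyword boost, and a top-k cut. It is an in-memory toy that assumes hits are plain dictionaries; the stage names and the scoring are invented for the example and are not Mixpeek's stage types.

```python
from typing import Callable

Doc = dict  # one search hit: {"id", "score", "kind", "text"}
Stage = Callable[[list[Doc]], list[Doc]]


def filter_kind(kind: str) -> Stage:
    """Keep only hits of one feature kind."""
    return lambda docs: [d for d in docs if d["kind"] == kind]


def keyword_boost(term: str, bonus: float) -> Stage:
    """Add a fixed bonus to hits whose text contains the term."""
    def stage(docs: list[Doc]) -> list[Doc]:
        return [
            {**d, "score": d["score"] + (bonus if term in d["text"].lower() else 0.0)}
            for d in docs
        ]
    return stage


def top_k(k: int) -> Stage:
    """Sort by score, highest first, and keep the first k hits."""
    return lambda docs: sorted(docs, key=lambda d: d["score"], reverse=True)[:k]


def run_pipeline(docs: list[Doc], stages: list[Stage]) -> list[Doc]:
    """Feed the output of each stage into the next one."""
    for stage in stages:
        docs = stage(docs)
    return docs


candidates = [
    {"id": "a", "score": 0.71, "kind": "transcript", "text": "Welcome to the keynote"},
    {"id": "b", "score": 0.80, "kind": "face", "text": ""},
    {"id": "c", "score": 0.65, "kind": "transcript", "text": "Pricing starts at ten dollars"},
    {"id": "d", "score": 0.60, "kind": "transcript", "text": "Thanks for watching"},
]
results = run_pipeline(
    candidates,
    [filter_kind("transcript"), keyword_boost("pricing", 0.2), top_k(2)],
)
print([d["id"] for d in results])  # ['c', 'a']
```

Because every stage takes a list of hits and returns a list of hits, stages can be reordered or swapped without touching the caller, which is the property that lets a chain sit behind one endpoint.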

[Figure: Multi-stage retrieval pipeline, showing search stages chained into a single retrieval pipeline]

Run this exact pipeline against the sample data right now — no account or API key needed.

Integrate

Wire Mixpeek into your agent as a LangChain tool, an MCP server, or a direct REST call. Your agent sends a query, gets structured results back, and acts on them.
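For the direct REST route, an agent tool can be little more than a function that builds one HTTP request and returns the parsed JSON. The sketch below assumes a placeholder host, a hypothetical `/retrievers/{id}/execute` path, a hypothetical request body, and an `X-Namespace` header name; none of these are confirmed by this page. The transport is injectable so the example runs offline with a stub.

```python
import json
import urllib.request

BASE_URL = "https://api.example.invalid"  # placeholder host, not the real API


def build_search_request(retriever_id: str, query: str, api_key: str, namespace: str):
    """Build (but do not send) a POST request for a retriever call (hypothetical route)."""
    body = json.dumps({"inputs": {"query": query}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/retrievers/{retriever_id}/execute",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "X-Namespace": namespace,  # assumed header name
            "Content-Type": "application/json",
        },
    )


def search_tool(query: str, send=None) -> dict:
    """A plain function an agent framework could register as a tool."""
    req = build_search_request("ret_demo", query, "YOUR_API_KEY", "demo")
    # Default transport performs a real HTTP call; tests pass a stub instead.
    send = send or (lambda r: json.load(urllib.request.urlopen(r)))
    return send(req)


def fake_send(req) -> dict:
    """Stub transport that echoes the URL and returns one canned hit."""
    return {"url": req.full_url, "results": [{"id": "doc_1", "score": 0.9}]}


print(search_tool("red car at night", send=fake_send))
```

The same function body could be wrapped as a LangChain tool or exposed through an MCP server; only the registration code around it would change.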

Already have embeddings? Skip extraction entirely — bring your own vectors and search instantly with the Mixpeek Vector Store. 1M vectors free, 60 seconds to first query.
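Conceptually, a bring-your-own-vectors store needs only two operations: upsert a vector under an ID, and query for the nearest stored vectors. The sketch below is a toy in-memory version using brute-force cosine similarity; the class name `TinyVectorStore` and its methods are invented for illustration and say nothing about how the Mixpeek Vector Store is implemented or called.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TinyVectorStore:
    """Hypothetical in-memory store: upsert by ID, brute-force query."""

    def __init__(self) -> None:
        self._vectors: dict[str, list[float]] = {}

    def upsert(self, doc_id: str, vector: list[float]) -> None:
        # Inserting an existing ID overwrites its vector.
        self._vectors[doc_id] = list(vector)

    def query(self, vector: list[float], top_k: int = 3) -> list[tuple[str, float]]:
        scored = [(doc_id, cosine(vector, v)) for doc_id, v in self._vectors.items()]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


store = TinyVectorStore()
store.upsert("cat", [1.0, 0.0, 0.0])
store.upsert("dog", [0.9, 0.1, 0.0])
store.upsert("car", [0.0, 0.0, 1.0])

hits = store.query([1.0, 0.05, 0.0], top_k=2)
print([doc_id for doc_id, _ in hits])  # ['cat', 'dog']
```

A production store would replace the linear scan with an approximate nearest-neighbor index, but the upsert-then-query contract stays the same.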

Quickstart

Get Started

Index multimodal content and search it in under 10 minutes — LangChain, MCP, or REST

What Can You Build?

Pick an outcome and follow the guide — each one is end-to-end and copy-pasteable.

Search video

Find moments by what’s shown or said — visual + speech embeddings

Reverse image & video search

Use an image or clip as the query to find visually similar media

Hybrid search

Combine dense vectors with keyword/BM25 and attribute filters

Rerank for precision

Push the most relevant results to the top with a cross-encoder

Cluster content

Group, label, and visualize a collection automatically

Build a taxonomy

Classify content against your own hierarchy with a multimodal join

Secure per-user retrieval

Enforce access control with built-in ACLs or external OpenFGA

Bring your own vectors

Use Mixpeek as a standalone vector store — upsert and query instantly

Auto-tune relevance

Improve ranking automatically from click and reward signals

Migrate embedding models

Re-extract, validate, and cut over to a new model with no downtime
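To give a flavor of one outcome from the list above, hybrid search has to merge a dense-vector ranking with a keyword ranking. One widely used merging technique is reciprocal rank fusion (RRF), sketched below. This page does not say which fusion method Mixpeek applies, so treat RRF here as an illustrative choice; the document IDs are made up.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked ID lists; each appearance contributes 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


dense_ranking = ["v1", "v2", "v3"]    # e.g. from vector similarity
keyword_ranking = ["v3", "v1", "v4"]  # e.g. from BM25

fused = reciprocal_rank_fusion([dense_ranking, keyword_ranking])
print(fused)  # ['v1', 'v3', 'v2', 'v4']
```

RRF uses only rank positions, so it needs no score normalization between the dense and keyword systems, which is why it is a common default for hybrid setups.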

What Gets Extracted

File Type	Extracted Features
Video	Face embeddings (ArcFace 512D), scene descriptions (Gemini), visual embeddings (Vertex AI 1408D), transcripts (Whisper), transcript embeddings (E5-Large 1024D), keyframes
Images	Visual embeddings (SigLIP 768D or Vertex AI 1408D), face embeddings (ArcFace 512D), OCR text, descriptions, structured extraction
Audio	Transcripts (Whisper), transcript embeddings (E5-Large 1024D), multimodal audio embeddings (Vertex AI 1408D)
Documents	Text chunks, text embeddings (E5-Large 1024D), OCR for scanned PDFs, structured extraction

Each extracted feature becomes an independently searchable document. A single video can produce hundreds of documents — one per face, one per transcript segment, one per scene.
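The fan-out described above can be pictured with a small sketch: one source file goes in, and one searchable document comes out per extracted feature. The `FeatureDocument` type, the feature names, and the sample values are hypothetical and exist only to show the one-to-many shape.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureDocument:
    """One independently searchable unit derived from a source file (hypothetical type)."""
    source_file: str
    feature_type: str
    payload: str


def fan_out(source_file: str, features: dict[str, list[str]]) -> list[FeatureDocument]:
    """Emit one document per extracted item, across all feature types."""
    return [
        FeatureDocument(source_file, feature_type, item)
        for feature_type, items in features.items()
        for item in items
    ]


# Made-up extraction output for a single short video.
extracted = {
    "face": ["face_0", "face_1"],
    "transcript_segment": ["00:00-00:10", "00:10-00:20", "00:20-00:30"],
    "scene": ["intro", "demo"],
}
docs = fan_out("talk.mp4", extracted)
print(len(docs))  # 7
```

Every document keeps a pointer back to its source file, so a hit on a single face or transcript segment can still be traced to the video it came from.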

Key Concepts

Namespaces isolate data between tenants, environments, or projects. Every API request includes a namespace header.
Buckets hold your raw files. Upload once, process many ways.
Collections define what gets extracted. Each collection runs a feature extractor (CLIP, Whisper, LayoutLM, etc.) against objects in a bucket.
Retrievers are search pipelines you configure in JSON. Chain stages together — vector search, face matching, filters, re-ranking — and expose the result as one endpoint your agent calls.
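Since retrievers are configured in JSON, a configuration might look roughly like the dictionary below: an ordered list of stages, each with a type and parameters. Every key, stage type, and ID here is an assumption made for illustration, not the documented retriever schema. The small `validate` helper is likewise hypothetical and only checks the two structural rules stated in its body.

```python
import json

# Hypothetical retriever definition: stages run in the order listed.
retriever_config = {
    "retriever_name": "video-moments",
    "collection_ids": ["col_demo"],
    "stages": [
        {"type": "vector_search", "params": {"feature": "transcript_embedding", "limit": 50}},
        {"type": "filter", "params": {"field": "language", "equals": "en"}},
        {"type": "rerank", "params": {"limit": 5}},
    ],
}


def validate(config: dict) -> list[str]:
    """Return a list of structural problems; an empty list means none were found."""
    errors = []
    stages = config.get("stages", [])
    if not stages:
        errors.append("retriever needs at least one stage")
    for index, stage in enumerate(stages):
        if "type" not in stage:
            errors.append(f"stage {index} has no type")
    return errors


print(json.dumps(retriever_config, indent=2))
print(validate(retriever_config))  # []
```

Keeping the pipeline as data rather than code means it can be versioned, diffed, and changed without redeploying the agent that calls it.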

Next Steps

Core Concepts

Understand namespaces, buckets, collections, and retrievers in depth

Data Model

See how namespaces, buckets, collections, objects, and features fit together

Feature Extractors

Learn what each extractor does and how to configure it

Tutorials

Step-by-step guides for common use cases
