# Introduction

> Mixpeek gives AI agents the ability to see, hear, and understand multimodal content

Your AI agent can read text. It cannot watch a video, scan a photo for faces, or search audio by what was said. Mixpeek is the infrastructure layer that gives agents access to video, images, audio, and documents through a single API.

## How It Works

<Steps>
  <Step title="Index">
    Upload video, images, audio, and documents to [Buckets](/platform/data-model#create-a-bucket). Mixpeek runs feature extraction automatically — faces, objects, transcripts, embeddings, and structured metadata all get indexed into searchable [Collections](/overview/concepts).

    <Frame caption="Breaking down a video into semantic layers">
      <img src="https://mintcdn.com/mixpeek/TwtTrae3Fi3EFJ72/assets/mixpeek-decomposition.svg?fit=max&auto=format&n=TwtTrae3Fi3EFJ72&q=85&s=4bfa40801fe7d78a68db0bd3f48b49c4" alt="Indexing multimodal content into semantic layers" width="1200" height="550" data-path="assets/mixpeek-decomposition.svg" />
    </Frame>
  </Step>

  <Step title="Search">
    Build retrieval pipelines that your agent calls. Chain semantic search, face search, object search, and transcript search into multi-stage [Retrievers](/retrieval/retrievers), then expose each pipeline as a single endpoint.

    <Frame caption="Chaining search stages into a retrieval pipeline">
      <img src="https://mintcdn.com/mixpeek/TwtTrae3Fi3EFJ72/assets/mixpeek-retrieval.svg?fit=max&auto=format&n=TwtTrae3Fi3EFJ72&q=85&s=4089a289e16b0cb17b6decd075b46f13" alt="Multi-stage retrieval pipeline" width="1200" height="600" data-path="assets/mixpeek-retrieval.svg" />
    </Frame>
  </Step>

  <Step title="Integrate">
    Wire Mixpeek into your agent as a [LangChain tool](/integrations/agents#langchain), an [MCP server](/integrations/agents#mcp-model-context-protocol), or a direct REST call. Your agent sends a query, gets structured results back, and acts on them.
  </Step>
</Steps>
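
As a rough illustration of the direct REST option in the Integrate step, the sketch below prepares (but does not send) an HTTP request that an agent tool could issue. The base URL, the `/retrievers/.../execute` path, the `X-Namespace` header name, and the request body shape are all assumptions made for this example; check the API reference for the real values.

```python
import json
from urllib.request import Request


def build_retriever_request(
    base_url: str, api_key: str, namespace: str, retriever_id: str, query: str
) -> Request:
    """Build (but do not send) a POST request that runs a retriever."""
    # Hypothetical path and body shape, used only for illustration.
    url = f"{base_url}/retrievers/{retriever_id}/execute"
    body = json.dumps({"inputs": {"query": query}}).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {api_key}",
        "X-Namespace": namespace,  # assumed name of the namespace header
        "Content-Type": "application/json",
    }
    return Request(url, data=body, headers=headers, method="POST")


def search_media(query: str) -> Request:
    """Tool-shaped wrapper: one query string in, one prepared request out."""
    return build_retriever_request(
        base_url="https://api.example.com/v1",  # placeholder host
        api_key="YOUR_API_KEY",
        namespace="my-namespace",
        retriever_id="my-retriever",
        query=query,
    )


request = search_media("person waving at the camera")
print(request.get_method(), request.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would send it. Keeping construction separate from sending makes the wrapper easy to inspect and test before wiring it into an agent framework.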

<Tip>
  **Already have embeddings?** Skip extraction entirely — bring your own vectors and search instantly with the [Mixpeek Vector Store](/vector-store/overview). 1M vectors free, 60 seconds to first query.
</Tip>

## Quickstart

<Card title="Get Started" icon="rocket" href="/overview/quickstart">
  Index multimodal content and search it in under 10 minutes with LangChain, MCP, or REST
</Card>

## What Can You Build?

Pick an outcome and follow the guide — each one is end-to-end and copy-pasteable.

<CardGroup cols={2}>
  <Card title="Search video" icon="film" href="/tutorials/video-understanding">
    Find moments by what's shown or said — visual + speech embeddings
  </Card>

  <Card title="Reverse image & video search" icon="image" href="/tutorials/reverse-search">
    Use an image or clip as the query to find visually similar media
  </Card>

  <Card title="Hybrid search" icon="layer-group" href="/retrieval/stages/feature-search#lexical-bm25-search">
    Combine dense vectors with keyword/BM25 and attribute filters
  </Card>

  <Card title="Rerank for precision" icon="arrow-up-wide-short" href="/retrieval/stages/rerank">
    Push the most relevant results to the top with a cross-encoder
  </Card>

  <Card title="Cluster content" icon="diagram-project" href="/enrichment/clusters">
    Group, label, and visualize a collection automatically
  </Card>

  <Card title="Build a taxonomy" icon="sitemap" href="/enrichment/taxonomies">
    Classify content against your own hierarchy with a multimodal join
  </Card>

  <Card title="Secure per-user retrieval" icon="lock" href="/platform/permissions">
    Enforce access control with built-in ACLs or external OpenFGA
  </Card>

  <Card title="Bring your own vectors" icon="database" href="/vector-store/overview">
    Use Mixpeek as a standalone vector store — upsert and query instantly
  </Card>

  <Card title="Auto-tune relevance" icon="wand-magic-sparkles" href="/retrieval/auto-tune">
    Improve ranking automatically from click and reward signals
  </Card>

  <Card title="Migrate embedding models" icon="arrows-rotate" href="/processing/model-migration">
    Re-extract, validate, and cut over to a new model with no downtime
  </Card>
</CardGroup>

## What Gets Extracted

| File Type                                        | Extracted Features                                                                                                                                                         |
| ------------------------------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **[Video](/processing/extractors/multimodal)**   | Face embeddings (ArcFace 512D), scene descriptions (Gemini), visual embeddings (Vertex AI 1408D), transcripts (Whisper), transcript embeddings (E5-Large 1024D), keyframes |
| **[Images](/processing/extractors/image)**       | Visual embeddings (SigLIP 768D or Vertex AI 1408D), face embeddings (ArcFace 512D), OCR text, descriptions, structured extraction                                          |
| **[Audio](/processing/extractors/multimodal)**   | Transcripts (Whisper), transcript embeddings (E5-Large 1024D), multimodal audio embeddings (Vertex AI 1408D)                                                               |
| **[Documents](/processing/extractors/document)** | Text chunks, text embeddings (E5-Large 1024D), OCR for scanned PDFs, structured extraction                                                                                 |

Each extracted feature becomes an independently searchable document. A single video can produce hundreds of documents — one per face, one per transcript segment, one per scene.
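
To make the fan-out concrete, here is a small in-memory sketch of one video turning into several independently searchable documents. The `FeatureDocument` class, its field names, and the `fan_out` helper are hypothetical and exist only for this illustration; they are not the Mixpeek document schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureDocument:
    source_object: str  # the uploaded file this feature came from
    feature_type: str   # "face", "transcript", or "scene"
    payload: str        # illustrative stand-in for the feature's content


def fan_out(video_id, faces, transcript_segments, scenes):
    """Emit one document per extracted feature of a single video."""
    docs = []
    groups = (
        ("face", faces),
        ("transcript", transcript_segments),
        ("scene", scenes),
    )
    for feature_type, items in groups:
        docs.extend(FeatureDocument(video_id, feature_type, item) for item in items)
    return docs


docs = fan_out(
    "demo.mp4",
    faces=["face-0", "face-1"],
    transcript_segments=["hello and welcome", "today we cover search", "thanks for watching"],
    scenes=["presenter at a desk"],
)
print(len(docs))  # 2 faces + 3 transcript segments + 1 scene = 6
```

Because every document keeps a pointer back to its source file, a match on a single face or transcript segment can still be traced to the video it came from.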

## Key Concepts

* **Namespaces** isolate data between tenants, environments, or projects. Every API request includes a namespace header.
* **Buckets** hold your raw files. Upload once, process many ways.
* **Collections** define what gets extracted. Each collection runs a feature extractor (for example SigLIP, Whisper, or E5-Large) against objects in a bucket.
* **Retrievers** are search pipelines you configure in JSON. Chain stages together — vector search, face matching, filters, re-ranking — and expose the result as one endpoint your agent calls.
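
The idea of a retriever as a JSON-configured chain of stages can be sketched with a toy, in-memory pipeline. The stage names (`vector_search`, `filter`, `rerank`), their fields, and the sample documents below are assumptions for illustration, not the actual retriever schema. The rerank stage here simply sorts by a field as a stand-in for a real reranking model.

```python
import json
from typing import Any

Doc = dict[str, Any]

# Hypothetical retriever definition: plain data that serializes to JSON.
RETRIEVER_CONFIG = {
    "name": "find-speaker-moments",
    "stages": [
        {"type": "vector_search", "query_vector": [1.0, 0.0], "top_k": 3},
        {"type": "filter", "field": "feature_type", "equals": "transcript"},
        {"type": "rerank", "by": "recency"},
    ],
}


def vector_search(docs: list[Doc], stage: Doc) -> list[Doc]:
    """Score by dot product with the query vector and keep the top_k."""
    query = stage["query_vector"]
    scored = [
        {**d, "score": sum(q * v for q, v in zip(query, d["vector"]))} for d in docs
    ]
    scored.sort(key=lambda d: d["score"], reverse=True)
    return scored[: stage["top_k"]]


def filter_stage(docs: list[Doc], stage: Doc) -> list[Doc]:
    """Keep documents whose field equals the configured value."""
    return [d for d in docs if d.get(stage["field"]) == stage["equals"]]


def rerank(docs: list[Doc], stage: Doc) -> list[Doc]:
    """Stand-in reranker: order by a numeric field, highest first."""
    return sorted(docs, key=lambda d: d[stage["by"]], reverse=True)


STAGES = {"vector_search": vector_search, "filter": filter_stage, "rerank": rerank}


def run_retriever(config: Doc, docs: list[Doc]) -> list[Doc]:
    """Feed each stage's output into the next stage, in config order."""
    results = docs
    for stage in config["stages"]:
        results = STAGES[stage["type"]](results, stage)
    return results


DOCS = [
    {"id": "a", "feature_type": "transcript", "vector": [0.9, 0.1], "recency": 1},
    {"id": "b", "feature_type": "face", "vector": [0.8, 0.2], "recency": 5},
    {"id": "c", "feature_type": "transcript", "vector": [0.7, 0.3], "recency": 9},
    {"id": "d", "feature_type": "transcript", "vector": [0.1, 0.9], "recency": 7},
]

print(json.dumps(RETRIEVER_CONFIG, indent=2))
print([d["id"] for d in run_retriever(RETRIEVER_CONFIG, DOCS)])
```

Since the pipeline is plain data, reordering or swapping stages is a configuration change, and the caller still sees a single entry point.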

## Next Steps

<CardGroup cols={2}>
  <Card title="Core Concepts" icon="book" href="/overview/concepts">
    Understand namespaces, buckets, collections, and retrievers in depth
  </Card>

  <Card title="Data Model" icon="sitemap" href="/platform/data-model">
    See how namespaces, buckets, collections, objects, and features fit together
  </Card>

  <Card title="Feature Extractors" icon="microchip" href="/processing/feature-extractors">
    Learn what each extractor does and how to configure it
  </Card>

  <Card title="Tutorials" icon="graduation-cap" href="/tutorials">
    Step-by-step guides for common use cases
  </Card>
</CardGroup>
