Find any scene in your video library.
Mixpeek makes your video, image, and audio archives searchable. Point it at the storage you already have; get back relevant, timestamped results with one API that learns as your team and your agents use it.
Bring vectors
MVS
Agent-native vector store on object storage. Dense, sparse, and BM25 search. From $25/mo.
Connect files
Managed
Managed indexing extracts scenes, faces, OCR, transcripts, and embeddings from any file type.
What the agent sees.
Every object you index becomes structured, searchable features: faces and objects in a frame, layout regions in a document, speakers in audio. Those are the same features an agent queries, and joins across modalities.
Person · 0.95 | Face · 0.98 | Handbag · 0.91
Video · 00:04:12
“A woman in a white shirt walks past a red storefront.”
transcript · “…meet me at the corner in five.”
Header | Chart | Body | Signature
PDF · resume.pdf
Header, body, charts, and signature detected as typed regions.
OCR · 1 header · 3 sections · 1 signature

Who spoke when, aligned to the transcript and the timeline.
audio · 2 speakers · matched at 00:01:30
One query across every modality.
Real questions rarely fit one feature. “Find the moment our CEO said guidance while the slide read Q4 outlook” needs a face, a spoken phrase, and on-screen text to line up at the same instant.
Mixpeek ties those features to the same object and timestamp, so an agent gets back the exact clip instead of three unrelated matches.
The CEO says “guidance” as the slide behind her reads “Q4 outlook.”
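
To make the idea of features lining up "at the same instant" concrete, here is a minimal sketch of a time-window intersection. The `Feature` record, its field names, and the sample timestamps are all hypothetical and chosen for illustration; this is not Mixpeek's API or data model, only one simple way such an alignment could be computed.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Feature:
    """Hypothetical feature record tied to one object's timeline."""
    kind: str     # e.g. "face", "transcript", "ocr"
    label: str    # e.g. "CEO", "guidance", "Q4 outlook"
    start: float  # seconds from the start of the video
    end: float


def co_occurring(features, wanted):
    """Return sorted (start, end) windows where every wanted (kind, label) overlaps."""
    if not wanted:
        return []
    # One candidate list per requested feature.
    groups = [[f for f in features if (f.kind, f.label) == w] for w in wanted]
    windows = []
    for combo in product(*groups):
        start = max(f.start for f in combo)
        end = min(f.end for f in combo)
        if start < end:  # all intervals in this combination overlap
            windows.append((start, end))
    return sorted(windows)


# Illustrative data only.
features = [
    Feature("face", "CEO", 120.0, 150.0),
    Feature("transcript", "guidance", 131.0, 132.0),
    Feature("ocr", "Q4 outlook", 125.0, 140.0),
    Feature("transcript", "guidance", 300.0, 301.0),  # said again, but off-slide
]
wanted = [("face", "CEO"), ("transcript", "guidance"), ("ocr", "Q4 outlook")]
clips = co_occurring(features, wanted)
print(clips)
```

Only the first mention of "guidance" survives, because the second one falls outside the windows where the face and the slide text are present.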
In production right now.
Visual search across 45k artworks
Upload any image and find visually similar paintings across 45,000+ artworks, or just describe what you're looking for. Hybrid image and text retrieval, ranked with RRF.
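
For readers unfamiliar with RRF (reciprocal rank fusion): it merges several ranked lists by giving each item a score of 1 / (k + rank) per list and summing. The sketch below shows the general technique under stated assumptions: the artwork IDs are made up, and k = 60 is a commonly used default rather than a value taken from this page. It does not describe how Mixpeek implements ranking internally.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of IDs with reciprocal rank fusion."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores, key=lambda item_id: (-scores[item_id], item_id))


# Hypothetical result lists: one from image similarity, one from a text query.
image_hits = ["art_17", "art_03", "art_42"]
text_hits = ["art_03", "art_99", "art_17"]
fused = rrf_fuse([image_hits, text_hits])
print(fused)
```

Items that rank well in both lists rise to the top, which is why RRF is a common choice for hybrid image and text retrieval: it needs only ranks, not comparable raw scores.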
Try gallery search →
Posters that learn your taste
Like or dislike movie posters and watch the grid adapt to your taste in real time. Interaction signals feed learned fusion so recommendations improve from usage.
Try movie personalization →
Face search across video
Drop in a headshot and find every clip a person appears in across 63 video ads and 2,600+ faces. Full trace for takedown evidence.
Try face search →
One install. Two paths.
Most retrieval stacks mean gluing together a vector DB, a file pipeline, and an agent layer. Mixpeek is one install with two ways in.
Bring embeddings
Plugs into your existing stack.
Connect your storage, point Mixpeek at it, and every file becomes searchable by what's inside it. No migration, no code changes.

Mux
Every Mux upload becomes searchable by face, scene, transcript, and on-screen text, with no manual tagging.
View integration →
Backblaze B2
S3-compatible extraction at 1/5th the cost. Store on B2, extract with Mixpeek, zero egress fees.
View integration →
Iconik
Every asset in your DAM becomes findable by what's inside it: scenes, faces, spoken words, on-screen text.
View integration →
Turn any file into searchable features.
Connect a bucket and these pipelines run as they are. Every one is documented and open source in the extractor cookbook.
Multimodal (Video/Audio/Image)
Video · Image · Audio · Text
Unified embeddings for video, audio, image, and text. Scene and silence chunking, Whisper transcription, thumbnails.
Universal All-in-One
Any file
One extractor for image, video, audio, and documents. Auto-detects modality and applies the right pipeline.
Image Embeddings (SigLIP)
Image · PDF
Dense 768-D image embeddings with Google SigLIP for text-to-image search in one contrastive space.
Text Embeddings (E5-Large)
Text
Multilingual dense text embeddings with E5-Large for semantic search and RAG out of the box.
Multi-File Object Embeddings (Gemini)
Any file
Embed ALL files of an object (images, PDFs, video, audio, text) into one 3072-D Gemini vector.
Face Identity (SCRFD + ArcFace)
Image · Video · PDF
Production face recognition that detects, aligns, and embeds faces to 512-D ArcFace vectors.
What we shipped lately
- Jul 5 · API · Attribute clustering now produces real groups end-to-end
  Clustering by a metadata attribute could finish with zero groups, return 400s when reading executions back, and drop document metadata from exports. All of those paths are fixed: attribute runs now attach embeddings correctly, preserve metadata through export, and read back cleanly. Verified on a 9,000-document run with every document grouped and search-within working, including hierarchical and multi-feature configurations.
- Jul 5 · Engine · Cluster labels now describe what is actually in each group
  AI-generated cluster labels are dramatically better. Attribute clusters now get semantic labels, summaries, and keywords instead of generic "Cluster N" names, and a sampling fix means labels are grounded in each group's real content rather than coming back near-identical or fabricated. A hallucination guard and representative sample selection keep labels distinct across groups.
- Jul 5 · Engine · Batches no longer get stuck in a processing state
  Closed a gap where a batch could finish all of its work but keep reporting a non-terminal status. The repair sweep now also finalizes jobs whose underlying compute records disappeared, so long-stuck batches heal automatically. One batch that had been stuck for 32 days was repaired by the new sweep on its first pass.
- Jul 5 · Studio · Starter retriever templates fixed, plus Explain Plan polish
  Every retriever starter template now creates a working retriever. Two templates carried outdated stage configurations that made their retrievers uncreatable, and template configs are now validated against the live stage registry in CI so they stay correct. Explain Plan output now lists stages in execution order with the first stage on top, and it is opt-in per run so default executions stay fast. Retriever evaluations also no longer fail intermittently, and a false feature index warning is gone.
From $25/mo. Usage-based everything.
Two products, one model: a monthly minimum that acts as a floor, with usage above it billed at the same transparent rates. MVS is priced by the vector, Managed by the object.
Bring your own embeddings and pay by the vector. Dense, sparse, and BM25 search on your own object storage. Build starts at $25/mo with up to 1M vectors; Scale ($250/mo) covers 25M.
Start with MVS
Bring raw objects and pay by the object: credits at $0.001 cover extraction, embedding, indexing, enrichment, and retrieval. Build covers 100K objects/mo; Scale ($250/mo) covers 1M.
Start with Managed
Dedicated infrastructure, self-hosted options, SSO, SLA, security reviews, and hands-on architecture support.
Talk to us
Common questions.
Do I have to move my data?
No. Mixpeek reads from your existing S3, GCS, R2, Azure, or S3-compatible bucket. Your storage stays the system of record, and nothing leaves your cloud.
How fast is retrieval?
Hybrid queries (dense, sparse, and BM25) return in well under 100ms p95, even with vectors persisted on object storage rather than held in RAM.
Do I need embeddings to start?
No. Bring your own vectors with MVS, or point Managed at raw files and it generates embeddings and features for you.
What can Managed extract?
Faces, scenes, transcripts, OCR, labels, and embeddings from video, images, audio, PDFs, and documents, all indexed at the object level.
Can I self-host?
Yes. Deploy in your own cloud (BYO-Cloud) with SOC 2 and HIPAA-ready controls, SSO, audit trails, and namespaces.
How does pricing work?
Both MVS and Managed start at $25/mo minimum. Usage counts toward the minimum — pay the greater of metered usage or the floor. MVS bills storage + queries; Managed bills in credits covering extraction, embedding, indexing, and retriever execution.
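
The "greater of metered usage or the floor" rule can be worked through as simple arithmetic. The sketch below assumes the $25 minimum and the $0.001 credit price quoted on this page; the credit counts are hypothetical, since the page does not state how many credits a given operation consumes. The function name `monthly_bill` is illustrative, not part of any Mixpeek SDK.

```python
from decimal import Decimal

CREDIT_PRICE = Decimal("0.001")  # price per credit, as quoted in the pricing copy
MONTHLY_FLOOR = Decimal("25")    # monthly minimum for the entry plan


def monthly_bill(credits_used, floor=MONTHLY_FLOOR, credit_price=CREDIT_PRICE):
    """Return the amount due: metered usage or the floor, whichever is greater."""
    metered = Decimal(credits_used) * credit_price
    return max(metered, floor)


# Hypothetical months: 10,000 credits meters to $10, 40,000 credits to $40.
light_month = monthly_bill(10_000)
heavy_month = monthly_bill(40_000)
print(light_month, heavy_month)
```

In the light month the floor applies, so the bill is $25; in the heavy month metered usage exceeds the floor, so the bill is the metered $40.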



