Former lead of MongoDB's Search Team, Ethan noticed the most common problem customers faced was building indexing and search infrastructure on their S3 buckets. Mixpeek was born.
Vector spaces aren't portable across models. Your index is stateful — that's why model upgrades are painful. Here's the versioning pattern that makes it a non-event.
Vector search has no row-level security, so apps filter results after retrieval and leak data. Here is authorized multimodal search done right: OpenFGA enforced server-side, fail-closed.
Every vector database forces you to declare dimensions and distance metrics before writing a single vector. Schema-on-write, compute pushdown, and learned indexes fix the three things they got wrong.
A 3072-dimensional embedding encodes everything about a video and distinguishes nothing. Decomposing content into named, measurable features, then placing them in a queryable hierarchy, is how multimodal search actually works at scale.
Text-only RAG pipelines miss 80% of what is in your content. A video contains faces, dialogue, on-screen text, background music, and brand logos. No single embedding captures all of that. The solution is multi-stage retrieval.
Traditional taxonomies classify one content type at a time. Multimodal taxonomies unify classification across every format using embedding similarity: the missing layer between raw AI features and structured, searchable metadata.
We compared 21 S3-compatible object storage providers across pricing, egress, features, and fine print. AWS S3 costs 15x more than the cheapest alternative for the same workload. Here's everything we found.
How we built an autonomous Kalshi trading bot using the Kalshi API and Mixpeek's video transcription, semantic search, and LLM data extraction, with no external tools required.
We are drowning in unstructured data — video, audio, images, documents, IoT — but our infrastructure still assumes everything is a row or a vector. The multimodal data warehouse is the missing layer: object decomposition, tiered storage, and multi-stage retrieval pipelines for the AI era.
We benchmarked every viable approach to multimodal document retrieval on financial tables (ViDoRe/TabFQuAD) and found a combination that hasn't been published before: ColQwen2 + MUVERA. It retains 99.4% of brute-force quality at a fraction of the cost, and obliterates OCR-based search. The Problem: Late interaction models like ColBERT and ColPali represent documents as sets of vectors, one per token or image patch. At query time, every query token finds its best-matching document token (MaxSim/
Every major IP enforcement tool finds violations after they're live. We built one that catches them before publication. Here's the architecture, the models, and what we learned.
We tested Gemini, Twelve Labs Marengo, X-CLIP, SigLIP 2, and InternVideo2 on text-to-video retrieval with graded relevance. The results surprised us.
Google's Gemini Embedding 2 embeds images, PDFs, and text together in a single API call. Here's how we integrated it into Mixpeek's feature extractor pipeline, the production numbers, and where multi-file embedding beats single-chunk approaches.
How we built query preprocessing into Mixpeek's feature_search stage — decompose a 500MB video into chunks, embed in parallel, fuse results. Zero API surface change for callers.
Sports broadcasters cut 4-8 hour editing sessions to 15 minutes using AI video analysis. Learn how to build automated highlight detection, archive search, and performance analytics pipelines for any sport.