Video Learning Hub
Master multimodal AI concepts through comprehensive tutorials, guides, and best practices from our expert team.

Building an Exploratory Multimodal Retriever with the National Gallery
Discover how to build a powerful exploratory image board using multimodal search across 120,000 images from the National Gallery. This walkthrough demonstrates combining text search, reverse image search, and document-based queries into a unified retrieval experience using hybrid search with Reciprocal Rank Fusion (RRF).
👉 Live Demo: https://mxp.co/r/npg
What you'll learn:
⚡ Building exploratory search interfaces for visual content
⚡ Combining text, image, and document reference queries
⚡ Implementing hybrid search with RRF for optimal results (sketched below)
⚡ Using Google SigLIP embeddings for image understanding
⚡ Creating multi-stage retriever pipelines with feature search
⚡ Capturing user signals for recommendation systems
⚡ Architecture patterns: Objects → Buckets → Collections → Retrievers
Real-world demo: Visual curation across 120,000 images (12GB of data) with text + image + document hybrid queries. Full source code is available in the Mixpeek showcase repository.
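To make the fusion step concrete, here is a minimal sketch of Reciprocal Rank Fusion in plain Python. It illustrates the general algorithm rather than Mixpeek's implementation; the document IDs are hypothetical, and k=60 is simply the constant commonly used in the RRF literature.

```python
# Reciprocal Rank Fusion (RRF): merge ranked result lists from
# independent retrievers (e.g. text search and reverse image search).
def rrf_fuse(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            # Each list contributes 1 / (k + rank) per document it returns, so
            # items ranked highly by multiple retrievers accumulate the most score.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

text_hits = ["img_042", "img_007", "img_913"]   # hypothetical text-query results
image_hits = ["img_913", "img_042", "img_128"]  # hypothetical reverse-image results
print(rrf_fuse([text_hits, image_hits]))        # img_042 and img_913 rise to the top
```

Because RRF uses only ranks, not raw scores, it can fuse retrievers whose score scales are incomparable (BM25 vs cosine similarity) without any normalization step.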

Web Scraper Guide
Learn how to use Mixpeek's Web Scraper to recursively crawl websites and extract multimodal content with automatic embeddings. This guide demonstrates crawling documentation sites, extracting code snippets and images, and making everything searchable with semantic embeddings.
What you'll learn:
⚡ Recursive website crawling with depth control (sketched below)
⚡ Extracting text, code blocks, and images
⚡ Multimodal embeddings (E5-Large, Jina Code, SigLIP)
⚡ JavaScript rendering for SPAs
⚡ URL filtering and structured extraction
⚡ Building searchable knowledge bases from docs
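As a rough illustration of depth-controlled recursive crawling, here is a sketch using requests and BeautifulSoup. This is not Mixpeek's Web Scraper: the start URL is a placeholder, and a production crawler would also need politeness delays, robots.txt handling, and JavaScript rendering for SPAs.

```python
# Depth-limited recursive crawl sketch (requests + BeautifulSoup).
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin, urlparse

def crawl(url: str, max_depth: int, depth: int = 0, seen: set | None = None):
    seen = seen if seen is not None else set()
    if depth > max_depth or url in seen:
        return
    seen.add(url)
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    # Count the multimodal content on this page: images and code blocks.
    print(f"{'  ' * depth}{url}: {len(soup.find_all('img'))} images, "
          f"{len(soup.find_all('pre'))} code blocks")
    host = urlparse(url).netloc
    for a in soup.find_all("a", href=True):
        link = urljoin(url, a["href"]).split("#")[0]  # resolve relative links, drop anchors
        if urlparse(link).netloc == host:             # URL filter: stay on the same site
            crawl(link, max_depth, depth + 1, seen)

crawl("https://docs.example.com", max_depth=2)        # placeholder start URL
```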

Buckets Guide
Learn how to use Mixpeek Buckets for schema-backed data ingestion with automatic validation and lineage tracking. This guide demonstrates creating buckets, defining schemas, uploading objects with multimodal blobs, and processing them through collections.
What you'll learn:
⚡ Creating buckets with JSON schema validation (sketched below)
⚡ Uploading objects with multimodal blobs (text, image, video, JSON)
⚡ Schema enforcement and blob type validation
⚡ Lineage tracking from source to documents
⚡ Integration with collections for feature extraction
⚡ Best practices for organizing multimodal data
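Here is a minimal sketch of the schema-validation idea using the jsonschema package. The schema shape and field names are hypothetical, not Mixpeek's actual bucket schema format; the point is that malformed objects are rejected at ingestion time, before they reach a collection.

```python
# Schema-backed ingestion sketch with the jsonschema package.
from jsonschema import ValidationError, validate

bucket_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "image_url": {"type": "string"},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["title", "image_url"],  # enforce required fields per object
}

obj = {"title": "Self-Portrait", "image_url": "https://example.com/a.jpg", "tags": ["oil"]}

try:
    validate(instance=obj, schema=bucket_schema)
    print("object accepted")
except ValidationError as err:
    print(f"object rejected: {err.message}")
```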

Video Understanding: From Frames to Contextual Search
Master video understanding and how it differs from basic image understanding. This video covers frame extraction techniques (sampling, keyframe detection, scene-based), video embedding models that capture temporal context, and building sophisticated semantic video search applications.
What you'll learn:
⚡ Video vs image understanding: temporal context matters
⚡ Frame extraction techniques: sampling, keyframe, scene-based (sampling sketched below)
⚡ Frame-level vs video-level embeddings
⚡ How video embeddings capture motion and actions
⚡ Scene detection with AutoShot and semantic deduplication
⚡ Vertex AI multimodal embeddings for video
⚡ Building scene-based video search pipelines
⚡ Real demo: Contextual video retrieval in Mixpeek Studio
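For a feel of the simplest strategy, here is a uniform frame-sampling sketch with OpenCV; "clip.mp4" is a placeholder path. Keyframe and scene-based extraction replace the fixed stride below with a content-aware trigger, such as a shot-boundary detector like AutoShot.

```python
# Uniform frame sampling with OpenCV: keep one frame per time window.
import cv2

def sample_frames(video_path: str, every_n_seconds: float = 1.0):
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0      # fall back if FPS is unreadable
    stride = max(1, int(fps * every_n_seconds))
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % stride == 0:                    # keep one frame per window
            frames.append(frame)
        idx += 1
    cap.release()
    return frames

frames = sample_frames("clip.mp4", every_n_seconds=2.0)
print(f"extracted {len(frames)} frames")
```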

Image Understanding: Vision Encoders & Multimodal Search
Master how computers see and search images. This video covers vision encoding models like CLIP and SigLIP, how images are converted into patches and embeddings, object detection with YOLO, and building multimodal search systems that support text-to-image, image-to-text, and image-to-image queries.
What you'll learn:
⚡ How vision transformers convert images into embeddings
⚡ Image patches and mean pooling explained
⚡ CLIP vs SigLIP embedding models
⚡ Object detection and classification with YOLO
⚡ Cross-modal search: text queries on image datasets (sketched below)
⚡ Combining text + image queries with mean pooling
⚡ Feature URIs for image extractors
⚡ Live demo: National Gallery multimodal retriever
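Here is a small cross-modal search sketch using an open-source CLIP checkpoint via sentence-transformers. It illustrates the concept rather than Mixpeek's extractors, and the image filenames are placeholders.

```python
# Text-to-image search with CLIP: text and images share one embedding space.
from PIL import Image
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("clip-ViT-B-32")

img_embs = model.encode([Image.open("painting1.jpg"), Image.open("painting2.jpg")])
query_emb = model.encode("a portrait of a woman in a red dress")

# Cosine similarity ranks images against the text query.
scores = util.cos_sim(query_emb, img_embs)[0]
best = int(scores.argmax())
print(f"best match: painting{best + 1}.jpg (score {scores[best].item():.3f})")

# A combined text + image query can be approximated by mean pooling the
# two embeddings into a single query vector in the same space.
combined = (query_emb + img_embs[0]) / 2
```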

Feature URIs: Evolving Embeddings Without Migration
Learn how to evolve embedding models without painful re-indexing. Master Feature URIs, a core abstraction for managing the lifecycle of embeddings, extractors, and indexes. Discover why vector indexes are stateful, how to A/B test embedding models safely, and how to roll forward and roll back upgrades without downtime.
What you'll learn:
⚡ Why vector indexes are inherently stateful and fragile
⚡ The 4 components of a Feature URI (sketched below)
⚡ How extractors, embedding models, versions, and inference endpoints are coupled
⚡ A/B testing embedding models without re-indexing
⚡ Rolling forward and rolling back embedding upgrades
⚡ Real examples using image collections and feature search
⚡ How Feature URIs enable hybrid search, re-ranking, and evaluation
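To illustrate the abstraction, here is a hypothetical sketch of a feature URI as a value object. The slash-separated format below is invented for illustration and is not Mixpeek's actual Feature URI syntax; the point is only that the four coupled components travel together as one identifier, pinning each index to the exact pipeline that produced it.

```python
# Hypothetical feature-URI value object: extractor + model + version + endpoint.
from dataclasses import dataclass

@dataclass(frozen=True)
class FeatureURI:
    extractor: str  # e.g. "image_extractor"
    model: str      # e.g. "siglip"
    version: str    # e.g. "v2"
    endpoint: str   # e.g. an inference endpoint name

    @classmethod
    def parse(cls, uri: str) -> "FeatureURI":
        return cls(*uri.split("/", 3))

old = FeatureURI.parse("image_extractor/clip/v1/default")
new = FeatureURI.parse("image_extractor/siglip/v2/default")

# Distinct URIs mean distinct, co-existing indexes: query both during an
# A/B test, then retire one to roll forward (or back) without re-indexing
# the surviving side.
assert old != new
```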

Hybrid Search: Best of Both Worlds
Combine keyword (BM25) and semantic (vector) search for maximum retrieval effectiveness. Learn when keyword search fails with synonyms and paraphrasing, when semantic search fails with specific IDs and acronyms, and how to use score fusion strategies to get the best of both worlds.
What you'll learn:
⚡ Trade-offs between keyword and semantic search
⚡ Precision vs recall in retrieval systems
⚡ Score fusion strategies: RRF, weighted, distribution-based (weighted fusion sketched below)
⚡ The 80/20 rule for catching edge cases
⚡ Building hybrid retrievers with feature_filter and attribute_filter
⚡ Real-world example: Developer documentation search
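Alongside the RRF sketch earlier on this page, here is a minimal weighted-fusion sketch: min-max normalize each retriever's scores onto [0, 1], then blend with a tunable alpha. The scores and document IDs are made up for illustration.

```python
# Weighted score fusion of BM25 and vector scores after min-max normalization.
def minmax(scores: dict) -> dict:
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0                          # avoid division by zero
    return {k: (v - lo) / span for k, v in scores.items()}

def weighted_fuse(bm25: dict, vector: dict, alpha: float = 0.5) -> list:
    b, v = minmax(bm25), minmax(vector)
    ids = set(b) | set(v)
    # A document missing from one retriever scores 0 on that side.
    return sorted(ids, reverse=True,
                  key=lambda d: alpha * b.get(d, 0.0) + (1 - alpha) * v.get(d, 0.0))

bm25_scores = {"doc_a": 12.4, "doc_b": 9.1, "doc_c": 3.2}      # keyword (BM25) hits
vector_scores = {"doc_b": 0.91, "doc_d": 0.88, "doc_a": 0.52}  # semantic hits
print(weighted_fuse(bm25_scores, vector_scores, alpha=0.4))    # alpha < 0.5 favors semantic
```

Normalization is what makes the blend meaningful: raw BM25 scores are unbounded while cosine similarities live in [-1, 1], so mixing them without rescaling would let one retriever dominate.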

Chunking Strategies: Breaking Documents into Searchable Pieces
Master the art of breaking large documents into searchable chunks. Learn why chunking is necessary for context windows and precision, explore fixed-size, semantic, and sentence-based strategies, and understand chunk overlap techniques that prevent information loss at boundaries.
What you'll learn:
⚡ Why chunking matters for context windows and precision
⚡ Chunking strategies: fixed-size, semantic, sentence-based, layout-based (fixed-size sketched below)
⚡ Chunk overlap as a safety net (67% → 94% accuracy improvement)
⚡ Multimodal chunking: videos, audio, images, PDFs
⚡ Building object decomposition pipelines in Mixpeek
⚡ Real-world example: 200-page legal contract analysis
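Here is a minimal fixed-size chunking sketch with overlap; the chunk and overlap sizes are illustrative, not recommendations. The overlap window keeps text that straddles a boundary retrievable from both neighboring chunks instead of being split in half.

```python
# Fixed-size chunking with a character-level overlap safety net.
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    assert 0 <= overlap < chunk_size
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap        # step back by `overlap` characters
    return chunks

doc = "The party of the first part shall indemnify... " * 200  # stand-in for a long contract
pieces = chunk_text(doc, chunk_size=500, overlap=100)
print(f"{len(pieces)} chunks, each sharing 100 characters with its neighbor")
```

Semantic and sentence-based strategies replace the fixed character stride with boundaries derived from meaning or punctuation, trading simplicity for cleaner chunk edges.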

Semantic Search Fundamentals
Move beyond keyword matching to meaning-based retrieval. This video covers the evolution from keyword search to semantic search, vector similarity algorithms (cosine similarity, dot product), encoding models, and building feature search pipelines in Mixpeek.
What you'll learn:
⚡ Keyword search vs semantic search
⚡ Cosine similarity and dot product explained (sketched below)
⚡ How encoder models capture meaning
⚡ HNSW indexes for vector search
⚡ Building retrieval pipelines with feature search
⚡ Intent-based retrieval in practice
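Here is a small NumPy sketch of the two similarity measures using toy 3-dimensional vectors (real embeddings have hundreds of dimensions). On unit-normalized vectors, cosine similarity and dot product give the same ranking, which is why many vector indexes store normalized embeddings and use the cheaper dot product internally.

```python
# Cosine similarity vs dot product on toy embedding vectors.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

query = np.array([0.2, 0.9, 0.1])
docs = np.array([[0.1, 0.8, 0.3],   # close in meaning to the query
                 [0.9, 0.1, 0.0]])  # different meaning

print([round(cosine(query, d), 3) for d in docs])  # first doc scores higher

# After unit-normalizing, a plain dot product reproduces the same ranking.
qn = query / np.linalg.norm(query)
dn = docs / np.linalg.norm(docs, axis=1, keepdims=True)
print(dn @ qn)
```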
