Name: Your Embedding Model Isn't the Problem. Your Chunking Is.
Uploaded: 2026-04-09T22:33:04Z
Duration: 60 s

Short Form

Ethan

April 9, 2026

Summary

87% vs 13% accuracy. Same model. Different chunking. A NAACL 2025 study tested 25 chunking configs across 48 embedding models — and chunking mattered as much as the model itself.

About this video

87% vs 13% accuracy. Same model. Different chunking. A NAACL 2025 study tested 25 chunking configs across 48 embedding models, and chunking mattered as much as the model itself.

Here's what's actually breaking your AI search:
1. Fixed 500-token splits that cut sentences mid-thought
2. Treating a PDF like a plain text file
3. Comparing OpenAI vs Cohere vs Jina embeddings instead of fixing the input

Here's the fix: modality-aware decomposition.
1. Video → scenes, faces, transcripts
2. PDF → tables, entities, layouts
3. Image → objects, text, colors

The embedding model gets clean context. Not sentence fragments. Not mid-paragraph cuts.

Chunking matters as much as the model. Fix the input.

Full guide → mixpeek.com/chunking-strategies

#RAG #ChunkingStrategies #AISearch #VectorSearch #LLM #MachineLearning #AI #Mixpeek
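To make the first failure mode concrete, here's a minimal Python sketch that compares fixed-size splitting with sentence-aware packing. It assumes whitespace-separated words stand in for tokens (a real pipeline would count with the embedding model's own tokenizer) and uses a naive regex sentence splitter that will mis-split abbreviations like "e.g.". The function names `fixed_token_chunks` and `sentence_aware_chunks` are hypothetical, chosen for illustration; they are not part of any Mixpeek or library API.

```python
import re


def fixed_token_chunks(text: str, size: int) -> list[str]:
    """Split into windows of `size` words, ignoring sentence boundaries.

    Assumption: whitespace-separated words approximate tokens.
    """
    tokens = text.split()
    return [" ".join(tokens[i:i + size]) for i in range(0, len(tokens), size)]


def sentence_aware_chunks(text: str, max_tokens: int) -> list[str]:
    """Pack whole sentences into chunks of at most `max_tokens` words.

    A sentence longer than `max_tokens` still becomes its own chunk,
    so no sentence is ever cut in half.
    """
    # Naive splitter (assumption): a sentence ends at . ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]
    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for sentence in sentences:
        n = len(sentence.split())
        # Close the current chunk if adding this sentence would exceed the budget.
        if current and count + n > max_tokens:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    doc = ("Refunds are issued within 30 days. Contact support to start a return. "
           "Shipping fees are not refundable.")
    print(fixed_token_chunks(doc, 8))
    print(sentence_aware_chunks(doc, 8))
```

With an 8-word budget, the fixed splitter returns `'Refunds are issued within 30 days. Contact support'`, `'to start a return. Shipping fees are not'`, and `'refundable.'`, so two chunks end mid-sentence. The sentence-aware version returns the three complete sentences as separate chunks, which is the "clean context" the embedding model should see.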