Your Embedding Model Isn't the Problem. Your Chunking Is.
Summary
87% vs. 13% accuracy. Same model. Different chunking. A NAACL 2025 study tested 25 chunking configurations across 48 embedding models, and chunking mattered as much as the model itself.
About this video
Here's what's actually breaking your AI search:
1. Fixed 500-token splits that cut sentences mid-thought
2. Treating a PDF like a plain text file
3. Comparing OpenAI vs. Cohere vs. Jina instead of fixing the input

Here's the fix: modality-aware decomposition.
1. Video → scenes, faces, transcripts
2. PDF → tables, entities, layouts
3. Image → objects, text, colors

The embedding model gets clean context. Not sentence fragments. Not mid-paragraph cuts.

Chunking matters at least as much as the model. Fix the input.

Full guide → mixpeek.com/chunking-strategies

#RAG #ChunkingStrategies #AISearch #VectorSearch #LLM #MachineLearning #AI #Mixpeek
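To make the first failure mode concrete, here is a minimal sketch of a fixed-size splitter next to a sentence-aware one. It is an illustration only, not the method from the study or from Mixpeek: the function names `fixed_chunks` and `sentence_chunks` are hypothetical, whitespace-separated words stand in for real tokenizer tokens, the regex sentence splitter is deliberately naive, and the budget is 10 tokens instead of 500 so the effect shows up on a short sample.

```python
import re


def fixed_chunks(words, size):
    """Naive splitter: cut every `size` tokens, ignoring sentence boundaries."""
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def sentence_chunks(text, max_tokens):
    """Pack whole sentences into chunks of at most `max_tokens` tokens.

    Assumption: tokens are whitespace-separated words. A single sentence
    longer than `max_tokens` is kept whole as its own oversized chunk.
    """
    # Naive sentence split: break on whitespace that follows ., ! or ?
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        # Flush the current chunk if adding this sentence would exceed the budget.
        if current and count + n > max_tokens:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


text = (
    "Chunking decides what the embedding model sees. "
    "A fixed window can cut a sentence in half. "
    "Sentence-aware packing keeps each thought intact."
)

fixed = fixed_chunks(text.split(), 10)
aware = sentence_chunks(text, 10)

for label, chunks in (("fixed", fixed), ("sentence-aware", aware)):
    print(label)
    for chunk in chunks:
        print("  |", chunk)
```

On this sample the fixed splitter ends its first chunk partway through the second sentence, while the sentence-aware splitter only ever breaks between sentences. The same idea extends to other modalities: split on the boundaries the content already has (scenes, tables, detected objects) rather than on a token count.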