Retrieval-augmented generation (RAG): a technique that improves language model output by first retrieving relevant documents from a knowledge base and including them in the model's context. Grounding generation in retrieved sources reduces hallucination and enables up-to-date, domain-specific responses, including in multimodal AI applications.
RAG combines a retrieval system with a generative language model in a two-step process. First, the user's query is used to retrieve relevant documents from a knowledge base through semantic search, keyword matching, or both. Then the retrieved documents are passed to the language model as context, and the model generates a response grounded in that information rather than in its training knowledge alone.
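The two steps can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline. It assumes a toy bag-of-words vector (the hypothetical `embed` function) in place of a real embedding model, and ranks documents by cosine similarity. The helper names `retrieve` and `build_prompt`, and the prompt template, are also hypothetical. The final call to a language model is omitted.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Hypothetical stand-in for an embedding model: a bag-of-words
    # term-frequency vector. A real system would call a dense encoder here.
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Step 1: rank documents by similarity to the query and keep the top k.
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, contexts: list[str]) -> str:
    # Step 2: place the retrieved passages in the model's context ahead of the question.
    context_block = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(contexts))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context_block}\n"
        f"Question: {query}\nAnswer:"
    )


docs = [
    "The Eiffel Tower is in Paris.",
    "Bananas are rich in potassium.",
    "Paris is the capital of France.",
]
query = "What is the capital of France?"
prompt = build_prompt(query, retrieve(query, docs, k=1))
# `prompt` would then be sent to the language model (not shown).
```

Numbering the passages in the prompt makes it straightforward to ask the model to cite which retrieved source supports each part of its answer.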
The retrieval component uses dense retrieval (embedding similarity), sparse retrieval (e.g., BM25), or a hybrid of the two. Retrieved documents are inserted into the LLM's context window alongside the query. Before indexing, a chunking strategy (fixed-size, semantic, or recursive) determines how documents are split into retrievable units. A reranking step can then reorder the retrieved passages to improve relevance. Advanced RAG adds techniques such as query rewriting, multi-hop retrieval, and self-reflection on retrieval quality.
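The sketch below illustrates three of these pieces under stated assumptions. Chunking is fixed-size with an overlap between windows. Sparse scoring uses Okapi BM25 with commonly used values k1 = 1.5 and b = 0.75, and with the non-negative IDF variant found in Lucene. The hybrid ranking uses reciprocal rank fusion (RRF) with the conventional constant k = 60. RRF is a swapped-in choice here; hybrid systems may instead combine normalized scores. All function and class names are hypothetical, and the dense ranking is a placeholder rather than the output of a real embedding model.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase word tokens; real systems often add stemming or stop-word removal.
    return re.findall(r"\w+", text.lower())


def fixed_size_chunks(tokens: list[str], size: int, overlap: int) -> list[list[str]]:
    # Fixed-size chunking: windows of `size` tokens, each sharing `overlap`
    # tokens with the previous window so text near a boundary appears in both.
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    step = size - overlap
    return [tokens[i:i + size] for i in range(0, max(len(tokens) - overlap, 1), step)]


class BM25:
    """Okapi BM25 over a corpus of pre-tokenized chunks (hypothetical helper)."""

    def __init__(self, docs: list[list[str]], k1: float = 1.5, b: float = 0.75):
        self.docs, self.k1, self.b = docs, k1, b
        self.n = len(docs)
        self.avgdl = sum(len(d) for d in docs) / self.n
        # Document frequency: number of chunks containing each term.
        self.df = Counter(term for d in docs for term in set(d))

    def idf(self, term: str) -> float:
        # Lucene-style IDF, which never goes negative for very common terms.
        n_t = self.df.get(term, 0)
        return math.log(1 + (self.n - n_t + 0.5) / (n_t + 0.5))

    def score(self, query: list[str], i: int) -> float:
        doc = self.docs[i]
        tf = Counter(doc)
        length_norm = 1 - self.b + self.b * len(doc) / self.avgdl
        total = 0.0
        for term in query:
            f = tf.get(term, 0)
            if f:
                total += self.idf(term) * f * (self.k1 + 1) / (f + self.k1 * length_norm)
        return total


def bm25_ranking(index: BM25, query: list[str]) -> list[int]:
    # Chunk ids ordered from highest to lowest BM25 score (stable for ties).
    return sorted(range(index.n), key=lambda i: index.score(query, i), reverse=True)


def reciprocal_rank_fusion(rankings: list[list[int]], k: int = 60) -> list[int]:
    # Each ranking contributes 1 / (k + rank) for every id it contains.
    scores: Counter = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return [doc_id for doc_id, _ in scores.most_common()]


corpus = ["the cat sat on the mat", "dogs chase cats in the park", "the stock market fell today"]
index = BM25([tokenize(d) for d in corpus])
sparse = bm25_ranking(index, tokenize("cat on mat"))
dense = [1, 2, 0]  # hypothetical ranking from an embedding model
fused = reciprocal_rank_fusion([sparse, dense])
```

Because RRF uses only rank positions, BM25 scores and cosine similarities never have to be put on a common scale before fusion. A reranker, if used, would then rescore the top of `fused`.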