The phenomenon where AI models produce outputs that are fluent and plausible-sounding but factually incorrect, unsupported by the input, or entirely fabricated. Hallucination is a critical challenge in multimodal AI systems that affects trust and reliability.
Hallucination occurs because generative models learn statistical patterns from their training data rather than storing verified facts. Language models predict probable next tokens, and a probable continuation is not necessarily a true one, so they can produce confident-sounding statements that are factually wrong. Multimodal models may describe objects that are not present in an image, attribute incorrect actions to video scenes, or generate plausible but fabricated details. In this sense, hallucination is a by-product of how current generative models work rather than an isolated bug that can be patched out.
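To make the point about next-token prediction concrete, here is a minimal sketch in Python. The prompt, the candidate tokens, and the scores in `toy_logits` are all invented for illustration and do not come from any real model; the only thing the sketch shows is that greedy decoding selects the most probable token, whether or not it is correct.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(score - peak) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: value / total for tok, value in exps.items()}


# Hypothetical scores for the token that follows
# "The capital of Australia is". The numbers are made up.
toy_logits = {"Sydney": 2.0, "Canberra": 1.5, "Melbourne": 0.5}

probs = softmax(toy_logits)

# Greedy decoding picks the highest-probability token.
greedy_choice = max(probs, key=probs.get)

print(greedy_choice)
print({tok: round(p, 3) for tok, p in probs.items()})
```

With these made-up scores the greedy choice is "Sydney", even though the correct answer is "Canberra": the decoding step optimizes for probability and has no separate notion of truth.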
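Two overlapping ways of classifying hallucination are common. The first distinguishes intrinsic hallucination (output that contradicts the source input) from extrinsic hallucination (output that adds information not found in the source). The second distinguishes factual hallucination (incorrect real-world facts) from faithfulness hallucination (output that does not reflect the provided or retrieved context). Detection methods include natural language inference (NLI) models, fact-checking against knowledge bases, self-consistency checks, and specialized hallucination detectors. Mitigation strategies include retrieval-augmented generation (RAG), grounding outputs in source data, constrained decoding, and reinforcement learning from human feedback (RLHF).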
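Of the detection methods listed above, a self-consistency check is the simplest to sketch. The idea is to sample several answers to the same question and measure how often they agree; low agreement suggests the model may be guessing. The function names, the `threshold` value of 0.6, and the sample answers below are assumptions chosen for illustration, and the exact-match comparison is a deliberate simplification (a practical system would more likely compare answers with an NLI model or embeddings).

```python
from collections import Counter


def normalize(answer: str) -> str:
    """Lowercase, trim, drop a trailing period, and collapse whitespace."""
    return " ".join(answer.lower().strip().rstrip(".").split())


def self_consistency(samples, threshold=0.6):
    """Return (majority answer, agreement ratio, flagged).

    `samples` is a list of answers sampled for the same prompt.
    `threshold` is an illustrative cut-off, not a recommended value.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    counts = Counter(normalize(s) for s in samples)
    top_answer, top_count = counts.most_common(1)[0]
    agreement = top_count / len(samples)
    return top_answer, agreement, agreement < threshold


# Hypothetical sampled answers; in practice these would come from
# calling the model several times with a non-zero temperature.
stable = ["Paris", "paris.", " Paris ", "Lyon"]
unstable = ["1912", "1915", "1921"]

print(self_consistency(stable))
print(self_consistency(unstable))
```

High agreement is only a weak signal: a model can be consistently wrong, so a check like this is usually combined with grounding or fact-checking against a trusted source.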