The phenomenon in which AI models produce outputs that are fluent and plausible-sounding but factually incorrect, unsupported by the input, or entirely fabricated. Hallucination is a critical challenge for multimodal AI systems because it undermines trust and reliability.
Hallucination occurs because generative models learn statistical patterns rather than grounded factual knowledge. Language models predict probable next tokens based on patterns in their training data, which can yield confident-sounding statements that are factually wrong. Multimodal models may describe objects not present in images, attribute incorrect actions to video scenes, or generate plausible but fabricated details. Hallucination is a fundamental property of current generative models, not an occasional bug.
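A minimal sketch of the underlying mechanism, assuming the Hugging Face transformers library is installed; "gpt2" is used only as an illustrative stand-in model. The point is that decoding selects the most probable continuation under the training distribution, not the most truthful one.

```python
# Minimal sketch: greedy next-token decoding optimizes for plausibility, not truth.
# Assumes the Hugging Face transformers library; "gpt2" is an illustrative choice.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The capital of Australia is"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=5, do_sample=False)

# The model prints whichever continuation is most probable under its training
# patterns -- which may be a fluent but incorrect answer (e.g. "Sydney").
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Because the training objective rewards only likelihood, nothing in this loop penalizes a fabricated completion as long as it is statistically plausible.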
Types include intrinsic hallucination (contradicting the source input), extrinsic hallucination (adding information not present in the source), factual hallucination (stating incorrect real-world facts), and faithfulness hallucination (output not supported by the provided or retrieved context). Detection methods include natural language inference (NLI) models, fact-checking against knowledge bases, self-consistency checks, and specialized hallucination detectors. Mitigation strategies include retrieval-augmented generation (RAG), grounding outputs in source documents, constrained decoding, and reinforcement learning from human feedback (RLHF). A sketch of a self-consistency check appears below.
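A minimal self-consistency sketch, assuming any sampling-enabled model call behind a hypothetical ask_model() placeholder: the same question is asked several times, and answers the model cannot reproduce consistently are flagged as likely hallucinations.

```python
# Minimal self-consistency check: sample the same question multiple times and
# flag low-agreement answers as possible hallucinations.
# ask_model is a hypothetical placeholder for any stochastic model call.
from collections import Counter
from typing import Callable, Tuple

def self_consistency_score(
    ask_model: Callable[[str], str], question: str, n_samples: int = 5
) -> Tuple[str, float]:
    """Return the most common answer and the fraction of samples that agree with it."""
    answers = [ask_model(question).strip().lower() for _ in range(n_samples)]
    answer, count = Counter(answers).most_common(1)[0]
    return answer, count / n_samples

# Usage sketch: answers with agreement well below 1.0 are less likely to be
# grounded and can be routed to fact-checking or retrieval-augmented generation.
# answer, score = self_consistency_score(ask_model, "Who wrote Middlemarch?")
# if score < 0.6:
#     ...  # escalate to a knowledge-base lookup or NLI-based verification
```

This style of check trades extra inference cost for a model-agnostic hallucination signal; NLI-based detectors instead compare a generated claim directly against the retrieved context.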