The process of analyzing and interpreting user queries to determine their intent, entities, and context before executing a search. Query understanding improves multimodal retrieval by bridging the gap between what users type and what they actually need.
Query understanding processes a raw user query through multiple stages: query classification (determining intent type), entity recognition (extracting key entities), query expansion (adding related terms), spell correction, and query rewriting. The enriched query representation is then used to formulate optimized search requests against keyword indices, vector databases, or both.
Components include intent classifiers (is this a navigational, informational, or transactional query), named entity recognizers, synonym dictionaries, and query rewriters. Modern systems use LLMs for query expansion and rewriting. Multimodal query understanding handles text, image, and audio queries by routing each to appropriate processing. Query logs and click-through data provide training signals for improving understanding over time.