Content-Based Image Retrieval (CBIR) applies computer vision techniques to search and retrieve images from large databases based on their visual content rather than on manually assigned metadata or text tags. CBIR systems analyze color, texture, shape, and semantic features to find images that match a given query.
CBIR systems extract visual feature descriptors from each image in the database during an offline indexing phase. When a user submits a query image, the same feature extraction process is applied. The system then compares the query features against the indexed features using a distance or similarity metric, returning the most similar images ranked by score. Modern CBIR systems use deep learning embeddings that capture high-level semantic content rather than low-level pixel features.
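The retrieval step described above can be sketched as follows. This is a minimal illustration, not a production system: random vectors stand in for the feature descriptors a real extractor would produce, and cosine similarity serves as the similarity metric.

```python
import numpy as np

# Toy stand-ins for the feature vectors built during offline indexing;
# in a real system these would come from a CNN, ViT, or histogram extractor.
rng = np.random.default_rng(0)
index = rng.normal(size=(100, 128))                    # 100 database images, 128-dim features
index /= np.linalg.norm(index, axis=1, keepdims=True)  # L2-normalize once at index time

def search(query_vec, index, k=5):
    """Rank database images by cosine similarity to the query; return top-k."""
    q = query_vec / np.linalg.norm(query_vec)
    scores = index @ q                  # dot product of unit vectors = cosine similarity
    top = np.argsort(-scores)[:k]      # indices of the k most similar images
    return top, scores[top]

# The query image goes through the same feature extraction as the database.
query = rng.normal(size=128)
ids, scores = search(query, index)
```

Normalizing at index time means each query costs one matrix-vector product, which is why the same trick underlies most embedding-based retrieval.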
Traditional CBIR used handcrafted features like color histograms, Gabor textures, and SIFT descriptors. Modern approaches use deep convolutional neural networks or Vision Transformers to extract dense feature vectors (typically 256-2048 dimensions). These vectors are stored in specialized approximate nearest-neighbor indices (HNSW, FAISS IVF, ScaNN) that support billion-scale retrieval in milliseconds. Relevance feedback mechanisms let users refine results interactively, and re-ranking stages improve precision beyond the initial retrieval.
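To make the handcrafted-feature approach concrete, here is a sketch of one classic descriptor mentioned above, the color histogram, compared with the chi-square distance. The synthetic 8-bit RGB arrays and the bin count are illustrative choices, not prescribed by any particular system.

```python
import numpy as np

def color_histogram(img, bins=16):
    """Concatenate per-channel intensity histograms, normalized to sum to 1."""
    hists = [np.histogram(img[..., c], bins=bins, range=(0, 256))[0]
             for c in range(3)]
    h = np.concatenate(hists).astype(float)
    return h / h.sum()

def chi_square(h1, h2, eps=1e-10):
    """Chi-square distance: 0 for identical histograms, larger = less similar."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

# Synthetic images stand in for database entries and a query.
rng = np.random.default_rng(1)
a = rng.integers(0, 256, size=(64, 64, 3))
b = rng.integers(0, 256, size=(64, 64, 3))
d = chi_square(color_histogram(a), color_histogram(b))
```

Unlike the learned embeddings of modern systems, a histogram like this captures global color statistics only, which is why two visually unrelated images with similar palettes can score as close matches.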