A technique for generating compact, comparison-friendly representations of visual content that are robust to minor modifications. Visual fingerprints (perceptual hashes) allow efficient near-duplicate detection even when images have been resized, compressed, lightly cropped, or color-adjusted.
Visual fingerprinting reduces an image to a compact hash that captures its perceptual essence. The image is typically resized to a small fixed size, converted to grayscale, and transformed using a method such as the DCT (Discrete Cosine Transform) or wavelet decomposition; the hash bits are derived from the result. Two hashes are compared by Hamming distance, the number of bit positions in which they differ, so visually similar images yield a small distance even after modification. For video, fingerprints are generated per frame or per scene.
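The resize, threshold, and compare pipeline can be sketched in a few lines of Python. This is a minimal illustration only: it uses the simpler average-hash variant rather than a DCT, it assumes the image is already grayscale and supplied as rows of 0-255 integers, and the names downsample, average_hash, and hamming are hypothetical helpers rather than any library's API.

```python
from typing import List

Gray = List[List[int]]  # rows of 0-255 luminance values (assumed input format)


def downsample(pixels: Gray, size: int = 8) -> List[List[float]]:
    """Shrink a grayscale grid to size x size by averaging rectangular blocks."""
    height, width = len(pixels), len(pixels[0])
    out = []
    for r in range(size):
        r0 = r * height // size
        r1 = max((r + 1) * height // size, r0 + 1)
        row = []
        for c in range(size):
            c0 = c * width // size
            c1 = max((c + 1) * width // size, c0 + 1)
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def average_hash(pixels: Gray, size: int = 8) -> int:
    """Return a size*size-bit hash: one bit per cell, set if the cell is above the mean."""
    flat = [v for row in downsample(pixels, size) for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


# A 32x32 horizontal gradient, a brightened copy, and an inverted copy.
image = [[x * 8 for x in range(32)] for _ in range(32)]
brighter = [[min(255, v + 10) for v in row] for row in image]
inverted = [[255 - v for v in row] for row in image]

print(hamming(average_hash(image), average_hash(brighter)))  # small distance
print(hamming(average_hash(image), average_hash(inverted)))  # large distance
```

In this toy case the brightened copy keeps the same hash while the inverted copy flips every bit, which is the behavior the Hamming comparison relies on. A real implementation would decode and resize the image with an imaging library first.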
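Common algorithms include pHash (perceptual hash using DCT), dHash (difference hash, which encodes brightness gradients between adjacent pixels), and aHash (average hash, which compares each pixel to the mean brightness). pHash is generally the most robust of the three to modifications. Hash sizes are typically 64-256 bits, so millions of fingerprints fit in memory (one million 64-bit hashes occupy about 8 MB). Hamming distance thresholds determine match sensitivity: for example, 0-10 differing bits for near-exact matches and 11-20 for moderate modifications. Suitable thresholds depend on the hash length and algorithm.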
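Threshold-based matching and the memory estimate can be illustrated as follows. This is a sketch under stated assumptions: the hashes are 64-bit integers, the index is a plain in-memory dictionary scanned linearly, and find_matches and the sample hash values are made up for illustration.

```python
from typing import Dict, List, Tuple


def hamming(a: int, b: int) -> int:
    """Count the bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def find_matches(
    query: int, index: Dict[str, int], max_distance: int = 10
) -> List[Tuple[str, int]]:
    """Return (name, distance) pairs within max_distance bits, closest first."""
    hits = [(name, hamming(query, h)) for name, h in index.items()]
    hits = [(name, d) for name, d in hits if d <= max_distance]
    return sorted(hits, key=lambda item: item[1])


# Hypothetical 64-bit fingerprints.
index = {
    "original": 0x0F0F0F0F0F0F0F0F,
    "recompressed": 0x0F0F0F0F0F0F0F0E,  # differs from the original in 1 bit
    "unrelated": 0xF0F0F0F0F0F0F0F0,  # differs from the original in all 64 bits
}

matches = find_matches(0x0F0F0F0F0F0F0F0F, index, max_distance=10)
print(matches)

# Raw hash payload for one million 64-bit fingerprints, ignoring container overhead.
payload_bytes = 1_000_000 * 64 // 8
print(payload_bytes)
```

A linear scan like this is adequate for small collections. Larger indexes usually need a structure built for Hamming-space search, which is beyond this sketch.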