A technique that generates compact, robust identifiers from audio content for recognition and matching. Acoustic fingerprinting enables content identification, copyright detection, and deduplication across large audio and video collections in multimodal systems.
Acoustic fingerprinting extracts a compact summary of the spectral characteristics of an audio signal. The audio is divided into short frames, and robust features (such as spectral peaks or band energies) are extracted from each frame and hashed to form the fingerprint. This fingerprint can be matched against a database of known fingerprints to identify the content, even when the audio has been compressed, cropped, or mixed with background noise.
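The frame-and-hash pipeline described above can be illustrated with a minimal sketch. It assumes one simple scheme: each frame is split into frequency bands, and each hash bit records whether the energy difference between two adjacent bands rose or fell relative to the previous frame. The function names, the frame size, the hop, and the band count are illustrative assumptions, not the parameters of any particular product, and a real system would add windowing and perceptually spaced bands.

```python
import cmath
import math
import random


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def band_energies(frame, n_bands):
    """Sum spectral power into n_bands equal-width (linear) bands."""
    half = len(frame) // 2
    power = [abs(c) ** 2 for c in fft(frame)[:half]]
    width = half // n_bands
    return [sum(power[b * width:(b + 1) * width]) for b in range(n_bands)]


def fingerprint(samples, frame_size=256, hop=128, n_bands=17):
    """Return one (n_bands - 1)-bit integer per frame after the first.

    A bit is 1 when the energy difference between two adjacent bands
    increased compared with the previous frame (an assumed scheme).
    """
    hashes, prev = [], None
    for start in range(0, len(samples) - frame_size + 1, hop):
        bands = band_energies(samples[start:start + frame_size], n_bands)
        if prev is not None:
            bits = 0
            for b in range(n_bands - 1):
                delta = (bands[b] - bands[b + 1]) - (prev[b] - prev[b + 1])
                bits = (bits << 1) | (delta > 0)
            hashes.append(bits)
        prev = bands
    return hashes


def bit_error_rate(fp_a, fp_b, bits_per_hash=16):
    """Fraction of differing bits between two aligned fingerprints."""
    n = min(len(fp_a), len(fp_b))
    errors = sum(bin(a ^ b).count("1") for a, b in zip(fp_a, fp_b))
    return errors / (n * bits_per_hash)


# Synthetic stand-ins for audio: a broadband signal, a noisy copy, and an
# unrelated signal (all hypothetical test data, not real recordings).
rng = random.Random(0)
clean = [rng.gauss(0, 1) for _ in range(4096)]
noisy = [s + rng.gauss(0, 0.05) for s in clean]
other = [rng.gauss(0, 1) for _ in range(4096)]

fp_clean, fp_noisy, fp_other = map(fingerprint, (clean, noisy, other))
ber_noisy = bit_error_rate(fp_clean, fp_noisy)
ber_other = bit_error_rate(fp_clean, fp_other)
print(f"noisy copy: {ber_noisy:.3f}  unrelated: {ber_other:.3f}")
```

Matching is then a matter of thresholding the bit error rate: a degraded copy should differ from the original in only a small fraction of bits, while unrelated audio should differ in roughly half of them.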
Algorithms such as Chromaprint (used by AcoustID) extract chroma features and hash them into compact binary fingerprints. Shazam's algorithm instead builds constellation maps of spectral peaks and hashes pairs of nearby peaks. Fingerprints are typically 32 to 128 bits per frame and are stored in hash-based lookup tables, which allows sub-second matching against databases of millions of tracks. Robustness to distortion comes from perceptually motivated feature selection: the features chosen are those that survive compression, equalization, and noise.
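To make the hash-table lookup concrete, here is a hedged sketch of peak-pair matching in the spirit of a constellation map. It assumes peak extraction has already been done and that each track is given as a time-sorted list of (time_frame, freq_bin) peaks. The names `landmark_hashes`, `build_index`, `identify`, and `FAN_OUT`, as well as the synthetic peak data, are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict

FAN_OUT = 3  # assumed: number of later peaks paired with each anchor peak


def landmark_hashes(peaks):
    """Yield ((f1, f2, dt), anchor_time) for pairs of nearby peaks.

    peaks: list of (time_frame, freq_bin) tuples sorted by time.
    """
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1:i + 1 + FAN_OUT]:
            yield (f1, f2, t2 - t1), t1


def build_index(tracks):
    """Map each hash key to the (track_id, anchor_time) pairs that produced it."""
    index = defaultdict(list)
    for track_id, peaks in tracks.items():
        for key, t in landmark_hashes(peaks):
            index[key].append((track_id, t))
    return index


def identify(index, query_peaks):
    """Vote on (track_id, time_offset); a true match piles votes on one offset."""
    votes = Counter()
    for key, t_query in landmark_hashes(query_peaks):
        for track_id, t_track in index.get(key, ()):
            votes[(track_id, t_track - t_query)] += 1
    if not votes:
        return None
    (track_id, offset), count = votes.most_common(1)[0]
    return track_id, offset, count


# Synthetic peak lists standing in for two tracks (hypothetical data).
tracks = {
    "song_a": [(t, (37 * t) % 101) for t in range(60)],
    "song_b": [(t, (53 * t + 11) % 101) for t in range(60)],
}
index = build_index(tracks)

# A cropped excerpt of song_a starting at frame 10, re-timed to start at 0,
# with every fifth peak dropped to mimic peaks lost to noise.
excerpt = [(t - 10, f) for t, f in tracks["song_a"][10:30]]
query = [p for i, p in enumerate(excerpt) if i % 5 != 4]

result = identify(index, query)
print(result)
```

Because each hash key is looked up directly in a dictionary, query cost depends on the number of query hashes rather than the number of indexed tracks, and voting on a consistent time offset lets a cropped excerpt still be located within the full track.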