A field combining signal processing and machine learning to analyze music and extract meaningful information from it, including melody, rhythm, genre, mood, and structure. MIR powers music search, recommendation, and organization in audio-rich multimodal systems.
MIR systems analyze music audio to extract features at multiple levels: low-level acoustic features (spectral centroid, chroma, MFCC), mid-level representations (beat, tempo, key), and high-level semantic labels (genre, mood, instrument). Modern approaches use neural networks to learn hierarchical features directly from audio spectrograms, enabling tasks from beat tracking to music recommendation.
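A minimal sketch of this multi-level feature extraction using librosa; the input file name and parameter choices below are illustrative placeholders, not part of the original text.

```python
import numpy as np
import librosa

# Load audio (placeholder path) at a standard analysis sample rate.
y, sr = librosa.load("track.wav", sr=22050)

# Low-level acoustic features.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)        # timbre
chroma = librosa.feature.chroma_stft(y=y, sr=sr)          # pitch-class energy
centroid = librosa.feature.spectral_centroid(y=y, sr=sr)  # spectral "brightness"

# Mid-level representations: tempo (BPM) and beat positions.
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
beat_times = librosa.frames_to_time(beat_frames, sr=sr)

print(f"Tempo: {float(np.atleast_1d(tempo)[0]):.1f} BPM, {len(beat_times)} beats")
print(f"MFCC shape: {mfcc.shape}, chroma shape: {chroma.shape}")
```

High-level labels such as genre or mood are then typically predicted by a classifier trained on features or spectrograms like these.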
Core libraries include librosa for feature extraction, madmom for beat and tempo analysis, and Essentia for comprehensive music analysis. Neural models use architectures similar to those in audio classification (CNNs, transformers), trained on datasets such as the Million Song Dataset and MusicNet. Music embeddings can be generated with models like MERT or MusicFM for similarity-based retrieval. Common tasks include genre classification, mood detection, instrument recognition, and music transcription.
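A sketch of embedding-based similarity retrieval with MERT via Hugging Face transformers; the checkpoint name (m-a-p/MERT-v1-95M), mean-pooling choice, and file paths are assumptions drawn from the public MERT model card, not from this document.

```python
import torch
import librosa
from transformers import Wav2Vec2FeatureExtractor, AutoModel

MODEL_ID = "m-a-p/MERT-v1-95M"  # assumed public checkpoint
processor = Wav2Vec2FeatureExtractor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModel.from_pretrained(MODEL_ID, trust_remote_code=True).eval()

def embed(path: str) -> torch.Tensor:
    """Return a track-level embedding by mean-pooling frame-level features."""
    audio, _ = librosa.load(path, sr=processor.sampling_rate, mono=True)
    inputs = processor(audio, sampling_rate=processor.sampling_rate, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, frames, dim)
    return hidden.mean(dim=1).squeeze(0)            # (dim,)

# Rank candidate tracks (placeholder paths) by cosine similarity to a query.
query = embed("query.wav")
candidates = {p: embed(p) for p in ["a.wav", "b.wav", "c.wav"]}
ranked = sorted(
    candidates,
    key=lambda p: float(torch.nn.functional.cosine_similarity(query, candidates[p], dim=0)),
    reverse=True,
)
print(ranked)
```

In practice the candidate embeddings would be precomputed and stored in a vector index rather than embedded at query time.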