What sentiment categories are detected in audio analysis?

Each utterance is assigned a primary sentiment label (positive, negative, or neutral) with a confidence score from 0 to 1. It is also assigned a fine-grained emotion label: joy, anger, frustration, sadness, surprise, fear, or neutral.
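As a rough illustration of the two label sets, the sketch below models one utterance's result as a small validated record. The class and field names are hypothetical and are not the API's response schema; only the label values and the 0 to 1 confidence range come from the description above.

```python
from dataclasses import dataclass

# Label sets as listed in the answer above.
SENTIMENTS = {"positive", "negative", "neutral"}
EMOTIONS = {"joy", "anger", "frustration", "sadness", "surprise", "fear", "neutral"}


@dataclass(frozen=True)
class UtteranceSentiment:
    """Hypothetical container for one utterance (not the API's schema)."""
    text: str
    sentiment: str     # one of SENTIMENTS
    confidence: float  # confidence in the sentiment label, 0 to 1
    emotion: str       # one of EMOTIONS

    def __post_init__(self):
        # Reject values outside the documented label sets and score range.
        if self.sentiment not in SENTIMENTS:
            raise ValueError(f"unknown sentiment: {self.sentiment!r}")
        if self.emotion not in EMOTIONS:
            raise ValueError(f"unknown emotion: {self.emotion!r}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")


example = UtteranceSentiment(
    text="I've been waiting for forty minutes.",
    sentiment="negative",
    confidence=0.91,
    emotion="frustration",
)
print(example.sentiment, example.emotion)
```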

Does audio sentiment analysis use acoustic features or just the transcript?

Both. Lexical sentiment is derived from the words spoken (transcript analysis), while acoustic sentiment analyzes vocal features like pitch variation, speaking rate, energy, and voice quality. The two signals are fused with configurable weighting via the `fusion_weights` parameter. This dual approach catches sarcasm and emotional undertones that text-only analysis misses.
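The exact fusion method is not documented here. One plausible reading of "configurable weighting" is a normalized weighted average, sketched below. The dictionary shape used for `fusion_weights` and the signed score scale (-1 for negative, +1 for positive) are assumptions made for illustration.

```python
def fuse_scores(lexical, acoustic, fusion_weights):
    """Weighted average of two scores on the same scale.

    `fusion_weights` is assumed to look like {"lexical": w1, "acoustic": w2};
    the weights are normalized so they do not need to sum to 1.
    """
    w_lex = fusion_weights["lexical"]
    w_ac = fusion_weights["acoustic"]
    total = w_lex + w_ac
    if w_lex < 0 or w_ac < 0 or total <= 0:
        raise ValueError("weights must be non-negative and not both zero")
    return (w_lex * lexical + w_ac * acoustic) / total


# Sarcasm-like case: the words read as positive, the tone as negative.
lexical_score = 0.6    # hypothetical transcript-based score
acoustic_score = -0.8  # hypothetical voice-based score
fused = fuse_scores(lexical_score, acoustic_score, {"lexical": 0.4, "acoustic": 0.6})
print(round(fused, 2))
```

With more weight on the acoustic signal, the fused score comes out negative even though the transcript alone looks positive, which is the sarcasm case described above.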

Can I track sentiment over time throughout a conversation?

Yes. The response includes a time-series sentiment trajectory with scores at regular intervals (configurable via `interval_seconds`). This is useful for visualizing how sentiment evolves during a call and for locating the interval in which a conversation turns negative.
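A minimal sketch of locating that turning point is shown below. It uses plain dictionaries as stand-ins for trajectory points, with hypothetical sample values; the `time`, `label`, and `score` field names mirror those used in the code example on this page.

```python
def first_negative_turn(points):
    """Return the time of the first point labeled negative that follows
    a non-negative point, or None if sentiment never turns negative."""
    for prev, cur in zip(points, points[1:]):
        if prev["label"] != "negative" and cur["label"] == "negative":
            return cur["time"]
    return None


# Hypothetical trajectory sampled every 30 seconds.
trajectory = [
    {"time": 0, "label": "neutral", "score": 0.70},
    {"time": 30, "label": "positive", "score": 0.64},
    {"time": 60, "label": "negative", "score": 0.58},
    {"time": 90, "label": "negative", "score": 0.83},
]

turn = first_negative_turn(trajectory)
print(f"Conversation turns negative at {turn}s")
```

The result is only as precise as the sampling interval, so a smaller `interval_seconds` narrows down where the shift happened.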

How does speaker-level sentiment attribution work?

When `speaker_diarization` is enabled, each speaker receives an independent sentiment profile including their average sentiment, emotion distribution, and sentiment trend over the conversation. This allows you to compare customer versus agent sentiment side by side.
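To make the three parts of a speaker profile concrete, the sketch below computes them from a list of labeled utterances. The input shape, the signed score scale, and the trend definition (mean of the later utterances minus mean of the earlier ones) are assumptions for illustration, not the service's actual method.

```python
from collections import Counter, defaultdict
from statistics import mean


def speaker_profiles(utterances):
    """Group utterances by speaker and summarize each speaker."""
    by_speaker = defaultdict(list)
    for u in utterances:
        by_speaker[u["speaker"]].append(u)

    profiles = {}
    for speaker, items in by_speaker.items():
        scores = [u["score"] for u in items]
        counts = Counter(u["emotion"] for u in items)
        half = len(scores) // 2
        # Trend: later part minus earlier part; 0.0 with a single utterance.
        trend = mean(scores[half:]) - mean(scores[:half]) if half else 0.0
        profiles[speaker] = {
            "average": mean(scores),
            "emotions": {e: n / len(items) for e, n in counts.items()},
            "trend": trend,
        }
    return profiles


# Hypothetical utterances in conversation order.
utterances = [
    {"speaker": "agent", "score": 0.4, "emotion": "neutral"},
    {"speaker": "customer", "score": -0.6, "emotion": "frustration"},
    {"speaker": "agent", "score": 0.5, "emotion": "joy"},
    {"speaker": "customer", "score": -0.2, "emotion": "frustration"},
    {"speaker": "customer", "score": 0.3, "emotion": "neutral"},
    {"speaker": "agent", "score": 0.6, "emotion": "joy"},
]

profiles = speaker_profiles(utterances)
for speaker, profile in profiles.items():
    print(speaker, round(profile["average"], 2), round(profile["trend"], 2))
```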

Audio to Sentiment Converter

Analyze the sentiment and emotional tone of audio recordings by combining speech transcription with acoustic feature analysis. Detects positive, negative, and neutral sentiment at utterance and segment levels, with additional emotion classification for anger, joy, frustration, and more.

Max file size: 2 GB

Estimated: 2-6 min per hour of audio

5 input formats

How It Works

Upload an audio file or provide a URL to the Mixpeek API.

The audio is transcribed with speaker diarization and utterance-level segmentation.

Lexical sentiment is analyzed from the transcript text using an NLP model.

Acoustic sentiment is analyzed from vocal features including pitch, energy, speaking rate, and tone.

Lexical and acoustic scores are fused into a combined sentiment and emotion profile per segment.
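The last step produces one profile per segment. How the service forms segments is not described here; the sketch below assumes fixed-length windows and averages hypothetical utterance scores by start time, purely to illustrate the idea of rolling utterances up into segments.

```python
from collections import defaultdict
from statistics import mean


def segment_scores(utterances, interval_seconds=30):
    """Average utterance scores within fixed-length time windows."""
    buckets = defaultdict(list)
    for u in utterances:
        # Assign each utterance to the window containing its start time.
        index = int(u["start"] // interval_seconds)
        buckets[index].append(u["score"])
    return [
        {"time": index * interval_seconds, "score": mean(scores)}
        for index, scores in sorted(buckets.items())
    ]


# Hypothetical fused scores per utterance, with start times in seconds.
utterances = [
    {"start": 2.0, "score": 0.5},
    {"start": 14.5, "score": 0.3},
    {"start": 41.0, "score": -0.4},
]

segments = segment_scores(utterances)
for segment in segments:
    print(segment["time"], round(segment["score"], 2))
```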

Code Examples

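from mixpeek import Mixpeek

client = Mixpeek(api_key="YOUR_API_KEY")

# Convert a hosted audio file into a sentiment analysis result.
result = client.convert(
    source="https://example.com/support-call.mp3",
    from_format="audio",
    to_format="sentiment",
    options={
        "speaker_diarization": True,   # per-speaker sentiment profiles
        "include_emotions": True,      # fine-grained emotion labels
        "include_trajectory": True,    # time-series sentiment scores
        "interval_seconds": 30         # spacing between trajectory points
    }
)

# Overall sentiment for the whole recording.
print(f"Overall sentiment: {result.overall.label} ({result.overall.score:.2f})")
# Sentiment for each diarized speaker.
for speaker in result.speakers:
    print(f"  {speaker.id}: {speaker.sentiment.label} ({speaker.sentiment.score:.2f})")
# Sentiment at each trajectory interval.
for point in result.trajectory:
    print(f"  [{point.time}s] {point.label}: {point.score:.2f}")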

Use Cases

Analyze customer satisfaction trends across call center recordings

Monitor agent tone and empathy during support interactions

Detect escalation points in recorded disputes and complaints

Measure audience engagement and emotional response in focus group recordings

Supported Input Formats

MP3

WAV

FLAC

OGG

AAC
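A pre-upload check against the format list and the 2 GB limit might look like the sketch below. The extension-to-format mapping and the interpretation of "2 GB" as 2 GiB are assumptions; the page does not specify either, and the function name is hypothetical.

```python
from pathlib import Path

# Assumed file extensions for the five listed formats.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".flac", ".ogg", ".aac"}
# Assumes "2 GB" means 2 GiB; the page does not say which is meant.
MAX_BYTES = 2 * 1024**3


def check_upload(filename, size_bytes):
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append("file exceeds 2 GB")
    return problems


print(check_upload("support-call.MP3", 48_000_000))
print(check_upload("interview.m4a", 3 * 1024**3))
```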

Quick Info

Category: data

Max File Size: 2 GB

Est. Time: 2-6 min per hour of audio

Extractor: audio-transcriber

Try This Conversion

Get started with the Mixpeek API and convert your first file in minutes.

Frequently Asked Questions

Related Converters

Audio to Text

Transcribe audio files into text with high accuracy. Supports speaker diarization, punctuation restoration, timestamps, and over 50 languages. Handles podcasts, calls, meetings, and broadcast audio.

Audio to Summary

Generate concise summaries from audio recordings by transcribing speech and synthesizing key points. Supports meeting minutes, podcast summaries, and interview highlights with configurable length and format.

Video to Metadata

Extract comprehensive technical and semantic metadata from video files. Returns codec details, resolution, duration, frame rate, and AI-generated semantic tags including detected objects, scenes, dominant colors, and content categories.

Audio to Keywords

Extract semantically relevant keywords and key phrases from audio recordings. Transcribes speech, identifies salient terms using NLP, and ranks them by relevance and frequency. Ideal for content tagging, topic detection, and search optimization.

Ready to convert audio to sentiment?

Start using the Mixpeek Audio to Sentiment converter in minutes. Sign up for a free API key and follow the documentation to get started.
