Audio

Seamless Expressive Translation

Translate speech across languages while preserving emotional tone, pauses, and vocal style

284K runs

Note: This playground returns simulated output to showcase functionality. No input data is processed or stored on our servers. Use this demo to explore the model's capabilities before integrating it into your application.

Input

File URL string

Enter a URL to an audio file


# translation_mode string

Type of translation to perform. Default: S2ST

# source_language string

Source language (auto-detect if not specified). Default: auto

# target_language string

Required

The language to translate into. No default; a value must be supplied.

# model_variant string

SeamlessM4T model variant to use. Default: v2-large

# preserve_expressivity boolean

Whether to preserve emotional tone, pauses, and vocal style. Default: true

# duration_factor number

Controls the predicted duration and speech rate. Higher values result in slower speech. Default: 1

# vocoder string

The vocoder to use for speech synthesis. Default: vocoder_pretssel

# sampling_rate integer

Audio sampling rate in Hz. Default: 16000

# chunk_length_s number

Length of audio chunks for processing (in seconds). Default: 30

# stride_length_s number

Stride length for overlapping chunks (in seconds). Default: 5
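
To make the interaction between chunk_length_s and stride_length_s concrete, here is a minimal sketch of how overlapping windows could be laid out over a long recording. It assumes the stride is the overlap shared by two consecutive chunks, which this page does not confirm; the service may define it differently. The function name chunk_windows is hypothetical.

```python
from typing import List, Tuple


def chunk_windows(
    total_s: float,
    chunk_length_s: float = 30.0,
    stride_length_s: float = 5.0,
) -> List[Tuple[float, float]]:
    """Return (start, end) windows in seconds covering total_s of audio.

    Assumption: consecutive windows overlap by stride_length_s, so each
    new window starts chunk_length_s - stride_length_s after the last one.
    """
    if chunk_length_s <= 0:
        raise ValueError("chunk_length_s must be positive")
    if not 0 <= stride_length_s < chunk_length_s:
        raise ValueError("stride_length_s must be in [0, chunk_length_s)")

    step = chunk_length_s - stride_length_s
    windows: List[Tuple[float, float]] = []
    start = 0.0
    while start < total_s:
        end = min(start + chunk_length_s, total_s)
        windows.append((start, end))
        if end >= total_s:
            break  # the last window already reaches the end of the audio
        start += step
    return windows


if __name__ == "__main__":
    # A 70-second clip with the defaults listed on this page.
    print(chunk_windows(70.0))
```

Under this assumption, a larger stride gives the model more shared context at chunk boundaries at the cost of processing more windows.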

# normalize_audio boolean

Whether to normalize input audio levels. Default: true

# return_timestamps boolean

Whether to return word-level timestamps. Default: true

# generate_speech boolean

Whether to generate speech output (for S2ST and T2ST modes). Default: true
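
The parameters above can be collected into a single request payload. The sketch below mirrors the documented names and defaults in a Python dataclass and checks the one required field. It does not call any endpoint, because this page does not give one. The key file_url is an assumed name for the "File URL" input, and TranslationRequest and to_payload are hypothetical helpers.

```python
from dataclasses import asdict, dataclass


@dataclass
class TranslationRequest:
    # Assumed key name for the "File URL" input; the page does not state it.
    file_url: str
    # Required by the page; there is no default.
    target_language: str
    # Defaults below are copied from the parameter list on this page.
    translation_mode: str = "S2ST"
    source_language: str = "auto"
    model_variant: str = "v2-large"
    preserve_expressivity: bool = True
    duration_factor: float = 1.0
    vocoder: str = "vocoder_pretssel"
    sampling_rate: int = 16000
    chunk_length_s: float = 30.0
    stride_length_s: float = 5.0
    normalize_audio: bool = True
    return_timestamps: bool = True
    generate_speech: bool = True

    def to_payload(self) -> dict:
        """Validate the request and return it as a plain dict."""
        if not self.target_language:
            raise ValueError("target_language is required")
        if self.duration_factor <= 0:
            raise ValueError("duration_factor must be positive")
        return asdict(self)


if __name__ == "__main__":
    request = TranslationRequest(
        file_url="https://example.com/sample.wav",  # placeholder URL
        target_language="spa",
    )
    print(request.to_payload())
```

Keeping the defaults in one place makes it easy to see which fields a caller has actually overridden before sending the payload.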

Output

{
  "translated_text": "Hola, ¿cómo estás hoy?",
  "source_language": "eng",
  "target_language": "spa",
  "translation_mode": "S2ST",
  "audio_output": {
    "duration": 3.2,
    "sample_rate": 16000,
    "format": "wav"
  },
  "expressivity": {
    "pitch": "preserved",
    "pauses": "preserved",
    "tempo": "preserved",
    "emotion": "preserved",
    "prosody_confidence": 0.94,
    "expressivity_score": 0.89
  },
  "timestamps": [
    {
      "word": "Hola",
      "start": 0,
      "end": 0.5,
      "confidence": 0.98
    },
    {
      "word": "¿cómo",
      "start": 0.6,
      "end": 1.1,
      "confidence": 0.96
    },
    {
      "word": "estás",
      "start": 1.2,
      "end": 1.7,
      "confidence": 0.97
    },
    {
      "word": "hoy?",
      "start": 1.8,
      "end": 2.3,
      "confidence": 0.95
    }
  ],
  "inference_metrics": {
    "duration_factor_used": 1,
    "vocoder_used": "vocoder_pretssel",
    "model_variant": "v2-large",
    "processing_time_ms": 1250,
    "audio_quality_score": 0.92
  },
  "language_detection": {
    "detected_language": "eng",
    "confidence": 0.99,
    "alternative_languages": [
      {
        "language": "eng-US",
        "confidence": 0.85
      },
      {
        "language": "eng-GB",
        "confidence": 0.14
      }
    ]
  },
  "segments": [
    {
      "source_text": "How are you today?",
      "translated_text": "¿Cómo estás hoy?",
      "start_time": 0,
      "end_time": 2.3,
      "speaker_id": "speaker_1"
    }
  ]
}
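
A response shaped like the sample above could be consumed as follows. This is a minimal sketch that parses a trimmed copy of the sample, flags words whose confidence falls below a threshold, and measures the span covered by the word timestamps. The function summarize and the 0.96 threshold are illustrative choices, not part of the service.

```python
import json

# Trimmed copy of the sample output shown on this page.
SAMPLE = """
{
  "translated_text": "Hola, ¿cómo estás hoy?",
  "audio_output": {"duration": 3.2, "sample_rate": 16000, "format": "wav"},
  "timestamps": [
    {"word": "Hola", "start": 0, "end": 0.5, "confidence": 0.98},
    {"word": "¿cómo", "start": 0.6, "end": 1.1, "confidence": 0.96},
    {"word": "estás", "start": 1.2, "end": 1.7, "confidence": 0.97},
    {"word": "hoy?", "start": 1.8, "end": 2.3, "confidence": 0.95}
  ]
}
"""


def summarize(response: dict, min_confidence: float = 0.96) -> dict:
    """Pull out the translated text, low-confidence words, and spoken span."""
    words = response.get("timestamps", [])
    low = [w["word"] for w in words if w["confidence"] < min_confidence]
    # Span from the first word's start to the last word's end, in seconds.
    span = words[-1]["end"] - words[0]["start"] if words else 0.0
    return {
        "text": response["translated_text"],
        "low_confidence_words": low,
        "spoken_span_s": span,
    }


if __name__ == "__main__":
    print(summarize(json.loads(SAMPLE)))
```

Because return_timestamps can be turned off, the sketch treats a missing timestamps list as empty rather than failing.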