Audio

Omnilingual ASR

High-quality automatic speech recognition for 1600+ languages using Meta's multilingual ASR system

425K runs

Note: This playground provides simulated output to showcase functionality. No input data is processed or stored on our servers. Use this demo to explore the model's capabilities before integrating it into your application.

Input

File URL string

Enter a URL to an audio file

# model_card string

Model architecture to use for transcription. Default: omniASR_LLM_7B

# language_code string

ISO 639-3 language code (auto-detected if not specified; applies to LLM models). Default: auto
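
A client may want to catch malformed language codes before sending a request. The following is a minimal sketch, assuming that `auto` is the sentinel for auto-detection (as the default above suggests) and that codes are plain three-letter ISO 639-3 identifiers. The helper name `normalize_language_code` is hypothetical, and the check only verifies the shape of the code, not whether it is a registered language or one the model supports.

```python
import re

# Assumption: "auto" requests auto-detection, matching the documented default.
AUTO = "auto"

# ISO 639-3 identifiers are three lowercase letters; this checks shape only.
_ISO_639_3_SHAPE = re.compile(r"[a-z]{3}")


def normalize_language_code(code=None):
    """Return a lowercase three-letter code, or "auto" when none is given."""
    if code is None:
        return AUTO
    candidate = code.strip().lower()
    if candidate in ("", AUTO):
        return AUTO
    if not _ISO_639_3_SHAPE.fullmatch(candidate):
        raise ValueError(f"expected a three-letter ISO 639-3 code, got {code!r}")
    return candidate


print(normalize_language_code("ENG"))
print(normalize_language_code(None))
```

A two-letter ISO 639-1 code such as `en` is rejected by this check, which is intentional: the parameter is documented as ISO 639-3.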

# batch_size integer

Number of audio samples to process in parallel. Default: 1

# device string

Compute device to use for inference. Default: cuda

# dtype string

Precision for inference. Default: bfloat16

# normalize_audio boolean

Whether to normalize audio levels before processing. Default: true

# sample_rate integer

Target sample rate for audio processing. Default: 16000

# return_timestamps boolean

Whether to return word-level timestamps. Default: false

# return_confidence boolean

Whether to return confidence scores for transcription. Default: false
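
The parameters above can be collected into a single request payload. The sketch below mirrors the documented names and defaults in a dataclass and serializes them to JSON. The `file_url` field name and the `TranscriptionRequest` class are assumptions made for illustration; the page does not state the wire name of the file input or how the request is submitted, so no network call is shown.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class TranscriptionRequest:
    # Hypothetical field name for the "File URL" input.
    file_url: str
    # Defaults below are copied from the parameter list.
    model_card: str = "omniASR_LLM_7B"
    language_code: str = "auto"
    batch_size: int = 1
    device: str = "cuda"
    dtype: str = "bfloat16"
    normalize_audio: bool = True
    sample_rate: int = 16000
    return_timestamps: bool = False
    return_confidence: bool = False

    def __post_init__(self):
        # Basic sanity checks; the service may enforce different limits.
        if not self.file_url:
            raise ValueError("file_url must not be empty")
        if self.batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        if self.sample_rate <= 0:
            raise ValueError("sample_rate must be positive")

    def to_json(self):
        return json.dumps(asdict(self), indent=2)


request = TranscriptionRequest(
    file_url="https://example.com/sample.wav",  # placeholder URL
    return_timestamps=True,
)
print(request.to_json())
```

Keeping the defaults in one place means a caller only overrides what differs, here `return_timestamps`, and every other field is sent with its documented default.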

Output

{
  "transcription": "This is the transcribed text from the audio file.",
  "language": "eng",
  "language_confidence": 0.98,
  "model_used": "omniASR_LLM_7B",
  "audio_metadata": {
    "duration": 30.5,
    "sample_rate": 16000,
    "channels": 1,
    "format": "wav"
  },
  "timestamps": [
    {
      "word": "This",
      "start": 0,
      "end": 0.24,
      "confidence": 0.99
    },
    {
      "word": "is",
      "start": 0.24,
      "end": 0.36,
      "confidence": 0.98
    },
    {
      "word": "the",
      "start": 0.36,
      "end": 0.52,
      "confidence": 0.97
    }
  ],
  "inference_metrics": {
    "processing_time_ms": 1250,
    "real_time_factor": 0.041,
    "model_size": "7.8B",
    "memory_used_mb": 17500
  },
  "segments": [
    {
      "text": "This is the transcribed text from the audio file.",
      "start": 0,
      "end": 30.5,
      "confidence": 0.96
    }
  ]
}
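
To show how the fields in this sample output relate, the sketch below parses a trimmed copy of it, recomputes the real-time factor as processing time divided by audio duration, and lists words whose confidence falls below a threshold. The helper names `real_time_factor` and `low_confidence_words` are hypothetical, and the formula for `real_time_factor` is an assumption that happens to reproduce the sample value.

```python
import json

# Trimmed copy of the sample output; values are taken from the sample.
sample = """
{
  "transcription": "This is the transcribed text from the audio file.",
  "audio_metadata": {"duration": 30.5, "sample_rate": 16000},
  "timestamps": [
    {"word": "This", "start": 0, "end": 0.24, "confidence": 0.99},
    {"word": "is", "start": 0.24, "end": 0.36, "confidence": 0.98},
    {"word": "the", "start": 0.36, "end": 0.52, "confidence": 0.97}
  ],
  "inference_metrics": {"processing_time_ms": 1250, "real_time_factor": 0.041}
}
"""
result = json.loads(sample)


def real_time_factor(result):
    """Processing time in seconds divided by audio duration in seconds."""
    seconds = result["inference_metrics"]["processing_time_ms"] / 1000
    return seconds / result["audio_metadata"]["duration"]


def low_confidence_words(result, threshold):
    """Words whose confidence is below the threshold, in order of appearance."""
    # "timestamps" may be absent when return_timestamps is false.
    return [
        entry["word"]
        for entry in result.get("timestamps", [])
        if entry["confidence"] < threshold
    ]


print(round(real_time_factor(result), 3))   # 0.041
print(low_confidence_words(result, 0.985))  # ['is', 'the']
```

Here 1.25 s of processing for 30.5 s of audio gives roughly 0.041, matching the `real_time_factor` reported in the sample; a value below 1 means transcription runs faster than real time.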