Omnilingual ASR
High-quality automatic speech recognition for 1600+ languages using Meta's multilingual ASR system
Note: This playground provides simulated output to showcase functionality. No input data is processed or stored on our servers. Use this demo to explore the model's capabilities before integrating it into your application.
Input
Enter a URL to an audio file
Drag and drop an audio file here, or click to browse
Model architecture to use for transcription. Default: omniASR_LLM_7B
ISO 639-3 language code (auto-detected if not specified; applies to LLM models). Default: auto
Number of audio samples to process in parallel. Default: 1
Compute device to use for inference. Default: cuda
Precision for inference. Default: bfloat16
Whether to normalize audio levels before processing. Default: true
Target sample rate for audio processing. Default: 16000
Whether to return word-level timestamps. Default: false
Whether to return confidence scores for transcription. Default: false
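The parameters above can be collected into a single options object before a request is made. The following is a minimal sketch in Python, not the service's actual client: the field names (model, language, batch_size, and so on) are hypothetical, since this page lists only descriptions and defaults, and the validation rules are assumptions rather than documented behavior.

```python
import re
from dataclasses import asdict, dataclass


@dataclass
class TranscriptionOptions:
    # Field names are hypothetical; only the default values come from the page.
    model: str = "omniASR_LLM_7B"
    language: str = "auto"           # ISO 639-3 code, or "auto" to auto-detect
    batch_size: int = 1              # audio samples processed in parallel
    device: str = "cuda"
    dtype: str = "bfloat16"
    normalize_audio: bool = True
    sample_rate: int = 16000         # target sample rate in Hz
    return_timestamps: bool = False  # word-level timestamps
    return_confidence: bool = False

    def validate(self) -> None:
        # ISO 639-3 codes are three lowercase letters. This checks the shape
        # only, not whether the code is actually assigned to a language.
        if self.language != "auto" and not re.fullmatch(r"[a-z]{3}", self.language):
            raise ValueError(f"not an ISO 639-3 style code: {self.language!r}")
        if self.batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        if self.sample_rate <= 0:
            raise ValueError("sample_rate must be positive")


options = TranscriptionOptions(language="eng", return_timestamps=True)
options.validate()
payload = asdict(options)  # plain dict, ready to serialize as JSON
```

Fields left unset keep the defaults listed above, so a caller only needs to name the options that differ.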
Output
{
  "transcription": "This is the transcribed text from the audio file.",
  "language": "eng",
  "language_confidence": 0.98,
  "model_used": "omniASR_LLM_7B",
  "audio_metadata": {
    "duration": 30.5,
    "sample_rate": 16000,
    "channels": 1,
    "format": "wav"
  },
  "timestamps": [
    {"word": "This", "start": 0, "end": 0.24, "confidence": 0.99},
    {"word": "is", "start": 0.24, "end": 0.36, "confidence": 0.98},
    {"word": "the", "start": 0.36, "end": 0.52, "confidence": 0.97}
  ],
  "inference_metrics": {
    "processing_time_ms": 1250,
    "real_time_factor": 0.041,
    "model_size": "7.8B",
    "memory_used_mb": 17500
  },
  "segments": [
    {
      "text": "This is the transcribed text from the audio file.",
      "start": 0,
      "end": 30.5,
      "confidence": 0.96
    }
  ]
}
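The sample response can be consumed with ordinary JSON tooling. The sketch below uses a trimmed copy of that simulated output and shows two things a client might do with it: recompute the real-time factor as processing time divided by audio duration (an assumed definition, though it matches the reported 0.041), and list words whose confidence falls below a threshold. The helper names are hypothetical.

```python
import json

# Trimmed copy of the simulated sample output; values are unchanged.
sample = json.loads("""
{
  "audio_metadata": {"duration": 30.5},
  "timestamps": [
    {"word": "This", "start": 0, "end": 0.24, "confidence": 0.99},
    {"word": "is", "start": 0.24, "end": 0.36, "confidence": 0.98},
    {"word": "the", "start": 0.36, "end": 0.52, "confidence": 0.97}
  ],
  "inference_metrics": {"processing_time_ms": 1250, "real_time_factor": 0.041}
}
""")


def real_time_factor(result: dict) -> float:
    """Processing time divided by audio duration, both in seconds."""
    processing_s = result["inference_metrics"]["processing_time_ms"] / 1000
    return processing_s / result["audio_metadata"]["duration"]


def low_confidence_words(result: dict, threshold: float) -> list[str]:
    """Words whose confidence is strictly below the threshold."""
    return [
        entry["word"]
        for entry in result.get("timestamps", [])
        if entry["confidence"] < threshold
    ]


rtf = round(real_time_factor(sample), 3)      # 1.25 s / 30.5 s
flagged = low_confidence_words(sample, 0.98)  # only "the" is below 0.98
```

A real-time factor below 1 means the audio was transcribed faster than its playback length.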
