    Omnilingual ASR

    High-quality automatic speech recognition for 1600+ languages using Meta's multilingual ASR system

    Note: This playground provides simulated output to showcase functionality. No input data is processed or stored on our servers. Use this demo to explore the feature extractor's capabilities before integrating it into your application.

    Input

    Enter a URL to an audio file

    Drag and drop an audio file here, or click to browse

    Model architecture to use for transcription. Default: omniASR_LLM_7B

    ISO 639-3 language code (auto-detect if not specified, applies to LLM models). Default: auto

    Number of audio samples to process in parallel. Default: 1

    Compute device to use for inference. Default: cuda

    Precision for inference. Default: bfloat16

    Whether to normalize audio levels before processing. Default: true

    Target sample rate for audio processing. Default: 16000

    Whether to return word-level timestamps. Default: false

    Whether to return confidence scores for transcription. Default: false
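The defaults listed above can be gathered into a single options object. The following is a minimal sketch, not the extractor's actual API: the page does not show the parameter names, so `AsrOptions` and every field name in it are hypothetical, and only the default values are taken from the descriptions above.

```python
from dataclasses import dataclass, asdict


@dataclass
class AsrOptions:
    """Hypothetical container for the playground inputs.

    Field names are assumptions made for illustration; only the default
    values come from the parameter descriptions.
    """
    model: str = "omniASR_LLM_7B"
    language: str = "auto"          # ISO 639-3 code, or "auto" to auto-detect
    batch_size: int = 1             # audio samples processed in parallel
    device: str = "cuda"
    precision: str = "bfloat16"
    normalize_audio: bool = True
    sample_rate: int = 16000        # target sample rate in Hz
    return_timestamps: bool = False
    return_confidence: bool = False

    def validate(self) -> None:
        # ISO 639-3 codes are exactly three lowercase letters (e.g. "eng").
        is_iso = (
            len(self.language) == 3
            and self.language.isalpha()
            and self.language.islower()
        )
        if self.language != "auto" and not is_iso:
            raise ValueError(f"not an ISO 639-3 code: {self.language!r}")
        if self.batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        if self.sample_rate <= 0:
            raise ValueError("sample_rate must be positive")


if __name__ == "__main__":
    opts = AsrOptions(language="eng", return_timestamps=True)
    opts.validate()
    print(asdict(opts))
```

Validating the language code on the client side catches a common mistake, passing a two-letter ISO 639-1 code such as "en" where a three-letter ISO 639-3 code is expected.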

    Output

    {
      "transcription": "This is the transcribed text from the audio file.",
      "language": "eng",
      "language_confidence": 0.98,
      "model_used": "omniASR_LLM_7B",
      "audio_metadata": {
        "duration": 30.5,
        "sample_rate": 16000,
        "channels": 1,
        "format": "wav"
      },
      "timestamps": [
        {
          "word": "This",
          "start": 0,
          "end": 0.24,
          "confidence": 0.99
        },
        {
          "word": "is",
          "start": 0.24,
          "end": 0.36,
          "confidence": 0.98
        },
        {
          "word": "the",
          "start": 0.36,
          "end": 0.52,
          "confidence": 0.97
        }
      ],
      "inference_metrics": {
        "processing_time_ms": 1250,
        "real_time_factor": 0.041,
        "model_size": "7.8B",
        "memory_used_mb": 17500
      },
      "segments": [
        {
          "text": "This is the transcribed text from the audio file.",
          "start": 0,
          "end": 30.5,
          "confidence": 0.96
        }
      ]
    }
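To show how the fields in the sample output relate to each other, here is a small sketch that parses a trimmed copy of it and derives a few summary values. The key names come from the sample above; the `summarize` helper is hypothetical, and the reading of `real_time_factor` as processing time divided by audio duration is an assumption that happens to match the sample numbers.

```python
import json

# Trimmed copy of the simulated output shown above.
SAMPLE = """
{
  "transcription": "This is the transcribed text from the audio file.",
  "language": "eng",
  "audio_metadata": {"duration": 30.5, "sample_rate": 16000},
  "timestamps": [
    {"word": "This", "start": 0, "end": 0.24, "confidence": 0.99},
    {"word": "is", "start": 0.24, "end": 0.36, "confidence": 0.98},
    {"word": "the", "start": 0.36, "end": 0.52, "confidence": 0.97}
  ],
  "inference_metrics": {"processing_time_ms": 1250, "real_time_factor": 0.041}
}
"""


def summarize(result: dict) -> dict:
    """Derive summary values from a result dict (hypothetical helper)."""
    duration_s = result["audio_metadata"]["duration"]
    processing_s = result["inference_metrics"]["processing_time_ms"] / 1000

    # Word timestamps are optional (the input default is false).
    words = result.get("timestamps", [])

    # Each word should start no earlier than the previous word ended.
    ordered = all(
        prev["end"] <= cur["start"] for prev, cur in zip(words, words[1:])
    )
    mean_conf = (
        sum(w["confidence"] for w in words) / len(words) if words else None
    )
    return {
        # Assumed definition: seconds of compute per second of audio.
        "real_time_factor": round(processing_s / duration_s, 3),
        "words_ordered": ordered,
        "mean_word_confidence": mean_conf,
    }


if __name__ == "__main__":
    print(summarize(json.loads(SAMPLE)))
```

With the sample values, 1.25 s of processing over 30.5 s of audio rounds to 0.041, which agrees with the reported `real_time_factor`.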