    NeMo, Transcription, OpenMDW-1.1

    nemotron-3.5-asr-streaming-0.6b

    by nvidia

    600M multilingual streaming ASR with cache-aware FastConformer-RNNT

    4.2K downloads/month
    600M parameters
    Identifiers
    Model ID
    nvidia/nemotron-3.5-asr-streaming-0.6b
    Feature URI
    mixpeek://transcription@v1/nvidia_nemotron_35_asr_streaming_v1

    Overview

    Nemotron 3.5 ASR Streaming 0.6B is NVIDIA's multilingual streaming speech recognition model. The model card describes a 600M parameter cache-aware FastConformer-RNNT model that supports transcription across 40 language-locales and runtime chunk sizes from 80ms through 1120ms.
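
To make the chunk-size range concrete, the sketch below converts a chunk duration into a sample count and rejects values outside the 80 ms to 1120 ms range quoted above. The 16 kHz sample rate and the function name `samplesPerChunk` are assumptions for illustration; the overview does not state the input sample rate, nor whether every intermediate chunk size is accepted.

```typescript
// Range quoted in the overview above.
const MIN_CHUNK_MS = 80;
const MAX_CHUNK_MS = 1120;

// Assumption: 16 kHz mono input. Check the model card for the actual rate.
const ASSUMED_SAMPLE_RATE_HZ = 16_000;

/** Number of audio samples contained in one chunk of the given duration. */
function samplesPerChunk(
  chunkMs: number,
  sampleRateHz: number = ASSUMED_SAMPLE_RATE_HZ,
): number {
  if (chunkMs < MIN_CHUNK_MS || chunkMs > MAX_CHUNK_MS) {
    throw new RangeError(
      `chunkMs must be between ${MIN_CHUNK_MS} and ${MAX_CHUNK_MS}, got ${chunkMs}`,
    );
  }
  // Multiply before dividing to avoid floating-point drift on common values.
  return Math.round((chunkMs * sampleRateHz) / 1000);
}
```

Under the 16 kHz assumption, an 80 ms chunk holds 1280 samples and a 1120 ms chunk holds 17920.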

    On Mixpeek, Nemotron 3.5 is useful for agent tools that need low-latency spoken evidence from meetings, calls, streams, and videos. The transcript becomes searchable text, while language tags, timestamps, speakers, and source URIs stay in metadata so the agent can cite the exact evidence instead of returning an ungrounded transcript blob.
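
One way to picture the split between searchable text and citation metadata is the sketch below. The `TranscriptSegment` shape and its field names are hypothetical, chosen only to mirror the metadata listed above; they are not the Mixpeek document schema.

```typescript
// Hypothetical segment shape: text is what gets searched,
// the remaining fields are what an agent needs to cite it.
interface TranscriptSegment {
  text: string;
  language: string; // language tag, e.g. "en-US"
  startSec: number; // segment start, seconds from the beginning of the source
  endSec: number; // segment end, seconds from the beginning of the source
  speaker?: string; // optional, present only if speaker labels are available
  sourceUri: string;
}

/** Render a segment as a quotation with enough context to locate it again. */
function formatCitation(seg: TranscriptSegment): string {
  const speaker = seg.speaker ?? "unknown speaker";
  const span = `${seg.startSec.toFixed(1)}s-${seg.endSec.toFixed(1)}s`;
  return `"${seg.text}" (${speaker}, ${seg.language}, ${span}, ${seg.sourceUri})`;
}
```

An agent that returns `formatCitation(segment)` alongside its answer lets a reader jump to the exact time range in the source rather than trusting an unattributed quote.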

    Architecture

    Cache-aware FastConformer encoder with 24 layers, RNNT decoder, and language-ID prompt conditioning. The cache-aware design reuses encoder context during streaming inference, avoiding redundant overlap computation in chunked ASR.
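
The saving from reusing encoder context can be illustrated with a toy frame count. The sketch below is not NeMo code and does not model the encoder's per-layer caches; it only compares how many frames get encoded when each chunk is re-run together with its left context versus when that context is served from a cache. All function and parameter names are hypothetical.

```typescript
/** Frames encoded when every chunk is re-run together with its left context. */
function framesWithOverlap(
  totalFrames: number,
  chunkFrames: number,
  leftContextFrames: number,
): number {
  if (chunkFrames <= 0) throw new RangeError("chunkFrames must be positive");
  let encoded = 0;
  for (let start = 0; start < totalFrames; start += chunkFrames) {
    const chunkEnd = Math.min(start + chunkFrames, totalFrames);
    const contextStart = Math.max(0, start - leftContextFrames);
    // The context frames were already encoded in earlier steps,
    // but without a cache they are encoded again here.
    encoded += chunkEnd - contextStart;
  }
  return encoded;
}

/** Frames encoded when left context is read from a cache: each frame once. */
function framesWithCache(totalFrames: number): number {
  return totalFrames;
}
```

For 100 frames, 10-frame chunks, and 20 frames of left context, the overlap approach encodes 270 frames while the cached approach encodes 100.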

    Mixpeek SDK Integration

    import { Mixpeek } from "mixpeek";

    const mx = new Mixpeek({ apiKey: "API_KEY" });

    await mx.collections.ingest({
      collection_id: "streaming-audio",
      source: { url: "s3://calls/live-captures/" },
      feature_extractors: [{
        feature: "audio_transcription",
        model: "nvidia/nemotron-3.5-asr-streaming-0.6b",
        params: {
          target_lang: "auto",
          chunk_ms: 320,
          return_language_tags: true
        }
      }]
    });

    Capabilities

    • Multilingual ASR across 40 language-locales
    • Streaming transcription with configurable chunk sizes
    • Automatic language detection and language tagging
    • Punctuation and capitalization in output text
    • NeMo deployment path for production speech pipelines

    Use Cases on Mixpeek

    • Low-latency captions for live meetings and media streams
    • Searchable transcript extraction for multilingual video libraries
    • Agent retrieval over spoken evidence with language metadata
    • Audio indexing pipelines that need batch and streaming ASR options

    Performance

    Input Size: Mono audio stream or audio file
    GPU Latency: Configurable 80ms to 1120ms chunk sizes
    GPU Throughput: Batch dependent
    GPU Memory: 600M ASR deployment class

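    Chunk size controls the latency and accuracy tradeoff at runtime: smaller chunks return text sooner, while larger chunks typically give the model more audio context per step.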
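
As a rough way to reason about this tradeoff, the sketch below tabulates, for a stream of a given length, how many chunks must be processed at each chunk size and the minimum delay spent buffering one chunk. It ignores model compute and network time, and the function and field names are hypothetical.

```typescript
interface ChunkTradeoff {
  chunkMs: number;
  minBufferingDelayMs: number; // a chunk cannot be sent before it is fully captured
  encoderSteps: number; // chunks needed to cover the stream
}

/** Compare candidate chunk sizes for a stream of the given duration. */
function compareChunkSizes(durationMs: number, chunkSizesMs: number[]): ChunkTradeoff[] {
  return chunkSizesMs.map((chunkMs) => {
    if (chunkMs <= 0) throw new RangeError("chunk size must be positive");
    return {
      chunkMs,
      minBufferingDelayMs: chunkMs,
      // The final partial chunk still requires a step, hence the ceiling.
      encoderSteps: Math.ceil(durationMs / chunkMs),
    };
  });
}
```

For a 60-second stream, `compareChunkSizes(60_000, [80, 320, 1120])` reports 750, 188, and 54 steps respectively, which shows how the smallest chunk size trades many more inference steps for a shorter buffering delay.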

    Specification

    Framework: NeMo
    Organization: nvidia
    Feature: Transcription
    Output: text + timestamps
    Modalities: video, audio
    Retriever: Transcript Search
    Parameters: 600M
    License: OpenMDW-1.1
    Downloads/mo: 4.2K

    Research Paper

    Nemotron 3.5 ASR Streaming 0.6B model card

    arxiv.org

    Build a pipeline with nemotron-3.5-asr-streaming-0.6b

    Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.
