nemotron-3.5-asr-streaming-0.6b
by nvidia
600M multilingual streaming ASR with cache-aware FastConformer-RNNT
nvidia/nemotron-3.5-asr-streaming-0.6b
mixpeek://transcription@v1/nvidia_nemotron_35_asr_streaming_v1
Overview
Nemotron 3.5 ASR Streaming 0.6B is NVIDIA's multilingual streaming speech recognition model. The model card describes a 600M-parameter, cache-aware FastConformer-RNNT model that transcribes 40 language-locales and accepts runtime chunk sizes from 80 ms through 1120 ms.
On Mixpeek, Nemotron 3.5 is useful for agent tools that need low-latency spoken evidence from meetings, calls, streams, and videos. The transcript becomes searchable text, while language tags, timestamps, speakers, and source URIs stay in metadata so the agent can cite the exact evidence instead of returning an ungrounded transcript blob.
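To make the grounding idea concrete, here is a minimal sketch of what a transcript segment with its metadata might look like, and how an agent could turn it into a citation. The `TranscriptSegment` shape, the field names, and the `cite` helper are illustrative assumptions, not the Mixpeek response schema; the file name in the usage note is hypothetical too.

```typescript
// Hypothetical record shape: searchable text plus the metadata the page lists
// (language tag, timestamps, speaker, source URI). Not the actual Mixpeek schema.
interface TranscriptSegment {
  text: string;
  language: string; // language-locale tag, e.g. "en-US"
  startMs: number; // segment start, in milliseconds from the start of the source
  endMs: number; // segment end, in milliseconds
  speaker?: string; // optional speaker label
  sourceUri: string; // where the audio or video came from
}

// Render milliseconds as MM:SS, dropping the fractional second.
function formatTimestamp(ms: number): string {
  const pad = (n: number): string => (n < 10 ? "0" + n : String(n));
  const totalSeconds = Math.floor(ms / 1000);
  return pad(Math.floor(totalSeconds / 60)) + ":" + pad(totalSeconds % 60);
}

// Build a citation that points back at the exact span of evidence.
function cite(segment: TranscriptSegment): string {
  const who = segment.speaker ? segment.speaker + ": " : "";
  const span =
    formatTimestamp(segment.startMs) + "-" + formatTimestamp(segment.endMs);
  return (
    who +
    '"' + segment.text + '"' +
    " [" + segment.language + ", " + span + ", " + segment.sourceUri + "]"
  );
}
```

For example, a segment spoken by "Speaker 1" from 65000 ms to 68500 ms in a hypothetical `s3://calls/live-captures/call-001.wav` would be cited with the span `01:05-01:08` and the source URI attached, so the answer stays traceable to the recording.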
Architecture
Cache-aware FastConformer encoder with 24 layers, RNNT decoder, and language-ID prompt conditioning. The cache-aware design reuses encoder context during streaming inference, avoiding redundant overlap computation in chunked ASR.
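The saving from cache reuse can be illustrated with a toy cost model. The sketch below is not NeMo code and does not model the real encoder; it only counts how many audio frames would be encoded when every chunk is re-run together with its left context, versus when that context is served from a cache. The function names and the frame counts are assumptions chosen for illustration.

```typescript
// Toy cost model (assumption, not the NeMo implementation): count encoded frames.

// Overlap-based chunking: each chunk is encoded together with up to
// `leftContextFrames` of preceding audio, so those frames are processed again.
function framesEncodedWithOverlap(
  totalFrames: number,
  chunkFrames: number,
  leftContextFrames: number
): number {
  let encoded = 0;
  for (let start = 0; start < totalFrames; start += chunkFrames) {
    const chunkEnd = Math.min(start + chunkFrames, totalFrames);
    const contextStart = Math.max(0, start - leftContextFrames);
    encoded += chunkEnd - contextStart; // new frames plus re-encoded context
  }
  return encoded;
}

// Cache-aware chunking: the context comes from cached encoder state,
// so only the new frames of each chunk are encoded.
function framesEncodedWithCache(totalFrames: number, chunkFrames: number): number {
  let encoded = 0;
  for (let start = 0; start < totalFrames; start += chunkFrames) {
    encoded += Math.min(start + chunkFrames, totalFrames) - start;
  }
  return encoded;
}
```

With 100 frames, 10-frame chunks, and 20 frames of left context, the overlap variant encodes 270 frames while the cached variant encodes each of the 100 frames once. The real model's savings depend on its actual context sizes, which this sketch does not try to reproduce.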
Mixpeek SDK Integration
import { Mixpeek } from "mixpeek";

const mx = new Mixpeek({ apiKey: "API_KEY" });

await mx.collections.ingest({
  collection_id: "streaming-audio",
  source: { url: "s3://calls/live-captures/" },
  feature_extractors: [
    {
      feature: "audio_transcription",
      model: "nvidia/nemotron-3.5-asr-streaming-0.6b",
      params: {
        target_lang: "auto",
        chunk_ms: 320,
        return_language_tags: true
      }
    }
  ]
});
Capabilities
- Multilingual ASR across 40 language-locales
- Streaming transcription with configurable chunk sizes
- Automatic language detection and language tagging
- Punctuation and capitalization in output text
- NeMo deployment path for production speech pipelines
Use Cases on Mixpeek
Performance
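Chunk size controls the latency and accuracy tradeoff at runtime: smaller chunks return text sooner, while larger chunks give the encoder more audio context per step.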
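As a rough way to reason about this tradeoff, the sketch below computes how many chunks a recording produces at a given chunk size and the minimum buffering delay each chunk implies (a chunk cannot be processed before its audio has arrived). The `planChunks` helper and its field names are hypothetical. It checks only the 80 ms to 1120 ms range quoted from the model card; whether every value inside that range is accepted is not stated here, so treat the range check as an assumption.

```typescript
// Range quoted from the model card; whether only specific step sizes
// are valid inside this range is not covered by this sketch.
const MIN_CHUNK_MS = 80;
const MAX_CHUNK_MS = 1120;

interface ChunkPlan {
  chunkMs: number;
  chunkCount: number; // encoder steps needed to cover the audio
  minBufferDelayMs: number; // audio that must be buffered before a step can run
}

// Hypothetical helper: estimate step count and buffering delay for one chunk size.
function planChunks(durationMs: number, chunkMs: number): ChunkPlan {
  if (chunkMs < MIN_CHUNK_MS || chunkMs > MAX_CHUNK_MS) {
    throw new RangeError(
      "chunkMs must be between " + MIN_CHUNK_MS + " and " + MAX_CHUNK_MS
    );
  }
  if (durationMs < 0) {
    throw new RangeError("durationMs must be non-negative");
  }
  return {
    chunkMs: chunkMs,
    chunkCount: Math.ceil(durationMs / chunkMs),
    minBufferDelayMs: chunkMs,
  };
}
```

For one minute of audio, `planChunks(60000, 80)` yields 750 steps with at least 80 ms of buffering per step, while `planChunks(60000, 1120)` yields 54 steps with at least 1120 ms. The delay figure is a lower bound from buffering alone and excludes model compute and network time.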
Specification
Research Paper
Nemotron 3.5 ASR Streaming 0.6B model card
arxiv.org
Build a pipeline with nemotron-3.5-asr-streaming-0.6b
Add this model to a processing pipeline alongside other extractors, and combine it with retrieval stages for end-to-end search.