nemotron-3.5-asr-streaming-0.6b

by nvidia

600M multilingual streaming ASR with cache-aware FastConformer-RNNT

4.2K downloads/month

600M params


Identifiers

Model ID

nvidia/nemotron-3.5-asr-streaming-0.6b

Feature URI

mixpeek://transcription@v1/nvidia_nemotron_35_asr_streaming_v1

Overview

Nemotron 3.5 ASR Streaming 0.6B is NVIDIA's multilingual streaming speech recognition model. According to the model card, it is a 600M-parameter cache-aware FastConformer-RNNT model that transcribes 40 language-locales and supports runtime chunk sizes from 80 ms to 1120 ms.

On Mixpeek, Nemotron 3.5 is useful for agent tools that need low-latency spoken evidence from meetings, calls, streams, and videos. The transcript becomes searchable text, while language tags, timestamps, speakers, and source URIs stay in metadata so the agent can cite the exact evidence instead of returning an ungrounded transcript blob.
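To make the grounding idea concrete, here is a minimal TypeScript sketch of a transcript segment that carries its own metadata, plus a helper that turns it into a citation. The TranscriptSegment shape, its field names, the cite helper, and the example file name are all hypothetical; they are not taken from the Mixpeek SDK or its response schema.

```typescript
// Hypothetical segment shape: field names are illustrative, not Mixpeek's schema.
interface TranscriptSegment {
  text: string;      // searchable transcript text
  language: string;  // language tag, e.g. "en-US"
  startMs: number;   // segment start, in milliseconds
  endMs: number;     // segment end, in milliseconds
  speaker: string;   // speaker label
  sourceUri: string; // where the audio came from
}

// Format milliseconds as MM:SS (whole seconds, rounded down).
function formatTimestamp(ms: number): string {
  const totalSeconds = Math.floor(ms / 1000);
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  return `${String(minutes).padStart(2, "0")}:${String(seconds).padStart(2, "0")}`;
}

// Build a citation that points back to the exact span of evidence.
function cite(segment: TranscriptSegment): string {
  const range = `${formatTimestamp(segment.startMs)}-${formatTimestamp(segment.endMs)}`;
  return `"${segment.text}" (${segment.speaker}, ${segment.language}, ${range}, ${segment.sourceUri})`;
}

// Made-up example segment; the file name is hypothetical.
const example: TranscriptSegment = {
  text: "We can ship the fix on Friday.",
  language: "en-US",
  startMs: 83000,
  endMs: 86500,
  speaker: "speaker_1",
  sourceUri: "s3://calls/live-captures/example.wav",
};

console.log(cite(example));
```

Keeping the metadata next to the text means an agent can answer with the quote and its location together, rather than returning the transcript alone.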

Architecture

The model combines a 24-layer cache-aware FastConformer encoder with an RNNT decoder and language-ID prompt conditioning. The cache-aware design reuses encoder context across chunks during streaming inference, which avoids the redundant computation that overlapping chunks would otherwise require in chunked ASR.
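The saving from cache reuse can be illustrated with simple arithmetic. The TypeScript sketch below is a simplified accounting model, not the model's actual implementation: it counts how many milliseconds of audio pass through the encoder when each step re-encodes some left context, versus when context comes from cached state. The function names and the 640 ms context window are assumptions chosen for illustration.

```typescript
// Overlapped (buffered) streaming: each step re-encodes up to `contextMs`
// of already-seen audio in addition to the new chunk.
function bufferedEncoderMs(totalMs: number, chunkMs: number, contextMs: number): number {
  let processed = 0;
  for (let start = 0; start < totalMs; start += chunkMs) {
    const newAudio = Math.min(chunkMs, totalMs - start);
    const reEncodedContext = Math.min(contextMs, start); // history available so far
    processed += newAudio + reEncodedContext;
  }
  return processed;
}

// Cache-aware streaming: left context is read from cached state,
// so each chunk of audio is encoded exactly once.
function cacheAwareEncoderMs(totalMs: number, chunkMs: number): number {
  let processed = 0;
  for (let start = 0; start < totalMs; start += chunkMs) {
    processed += Math.min(chunkMs, totalMs - start);
  }
  return processed;
}

// 3.2 s of audio, 320 ms chunks, 640 ms of left context (illustrative values).
console.log(bufferedEncoderMs(3200, 320, 640)); // 8640
console.log(cacheAwareEncoderMs(3200, 320));    // 3200
```

Under these assumed values the overlapped approach pushes 2.7 times as much audio through the encoder; the real ratio depends on the context length and chunk size in use.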

Mixpeek SDK Integration

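import { Mixpeek } from "mixpeek";

// Replace "API_KEY" with your Mixpeek API key.
const mx = new Mixpeek({ apiKey: "API_KEY" });

// Ingest audio from the source location and transcribe it with Nemotron 3.5.
await mx.collections.ingest({
  collection_id: "streaming-audio",
  source: { url: "s3://calls/live-captures/" },
  feature_extractors: [{
    feature: "audio_transcription",
    model: "nvidia/nemotron-3.5-asr-streaming-0.6b",
    params: {
      target_lang: "auto",
      chunk_ms: 320,              // streaming chunk size, within the 80-1120 ms range
      return_language_tags: true  // include language tags with the transcript
    }
  }]
});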
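Because the model card gives a chunk-size range of 80 ms to 1120 ms, it may help to check chunk_ms before sending an ingest request. The following is a minimal sketch, assuming a client-side check is wanted; buildTranscriptionParams and TranscriptionParams are hypothetical helpers, not part of the Mixpeek SDK. It checks the range only, since the source does not say whether every intermediate value is accepted.

```typescript
// Range taken from the model card description above.
const MIN_CHUNK_MS = 80;
const MAX_CHUNK_MS = 1120;

// Mirrors the params object used in the ingest call; the type itself is hypothetical.
interface TranscriptionParams {
  target_lang: string;
  chunk_ms: number;
  return_language_tags: boolean;
}

// Hypothetical helper: rejects chunk sizes outside the documented range.
function buildTranscriptionParams(chunkMs: number, targetLang: string = "auto"): TranscriptionParams {
  if (!Number.isFinite(chunkMs) || chunkMs < MIN_CHUNK_MS || chunkMs > MAX_CHUNK_MS) {
    throw new RangeError(
      `chunk_ms must be between ${MIN_CHUNK_MS} and ${MAX_CHUNK_MS}, got ${chunkMs}`
    );
  }
  return { target_lang: targetLang, chunk_ms: chunkMs, return_language_tags: true };
}

// Usage: build the params once, then pass them as `params` in the extractor config.
const params = buildTranscriptionParams(320);
console.log(params);
```

Failing early on the client keeps an out-of-range chunk size from reaching the ingest call at all.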