    HF · Scene Captioning · Apache 2.0

    Qwen3.6-35B-A3B

    by Qwen

    35B MoE with only 3B active params — 73.4% SWE-bench, runs on a laptop

    310K downloads/month
    35B (3B active) params
    Identifiers
    Model ID
    Qwen/Qwen3.6-35B-A3B
    Feature URI
    mixpeek://image_extractor@v1/qwen36_35b_a3b_v1
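
The Feature URI above appears to pack three pieces of information into one string. As a minimal sketch, assuming the layout is `mixpeek://<extractor>@<version>/<feature>` (inferred from this single example, not from a published specification), it could be split like this. The `parseFeatureUri` helper and the `FeatureUri` type are hypothetical names introduced for illustration.

```typescript
// Hypothetical helper: the URI layout is inferred from the one example on
// this page and may not match Mixpeek's actual grammar.
interface FeatureUri {
  extractor: string;        // e.g. "image_extractor"
  extractorVersion: string; // e.g. "v1"
  feature: string;          // e.g. "qwen36_35b_a3b_v1"
}

function parseFeatureUri(uri: string): FeatureUri | null {
  // Assumed shape: mixpeek://<extractor>@<version>/<feature>
  const match = /^mixpeek:\/\/([^@\/]+)@([^\/]+)\/(.+)$/.exec(uri);
  if (match === null) return null;
  const [, extractor, extractorVersion, feature] = match;
  return { extractor, extractorVersion, feature };
}

const parsedUri = parseFeatureUri("mixpeek://image_extractor@v1/qwen36_35b_a3b_v1");
console.log(parsedUri);
```

Returning `null` for anything that does not match keeps the caller in charge of how to handle unexpected URIs.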

    Overview

    Qwen3.6-35B-A3B is Alibaba's hybrid Mixture-of-Experts model with 35 billion total parameters but only 3 billion active per token, delivering frontier-class reasoning and coding at laptop-deployable cost. It combines Gated DeltaNet linear attention with standard Gated Attention and sparse MoE (256 experts, 8 routed + 1 shared) to achieve 73.4% on SWE-bench Verified and 92.6% on AIME 2026.

    On Mixpeek, Qwen3.6-35B-A3B serves as a powerful reasoning backbone for agentic pipelines, complex metadata generation, and code-driven content analysis. Its 262K native context (extensible to 1M via YaRN) handles full-length documents and long video transcripts, while the 3B active parameter footprint keeps inference costs manageable.
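
To make the context figures concrete, here is a rough sketch of a pre-flight check that decides whether an input fits the native window, needs the YaRN-extended window, or is too long. It assumes "262K" means 262,144 tokens and "1M" means 1,048,576 tokens, and it uses a crude 4-characters-per-token heuristic. All names here (`estimateTokens`, `classifyContextFit`, `reservedForOutput`) are hypothetical, and a real pipeline would count tokens with the model's tokenizer.

```typescript
// Context sizes as quoted on this page; the exact token counts are an assumption.
const NATIVE_CONTEXT_TOKENS = 262_144;
const YARN_CONTEXT_TOKENS = 1_048_576;

// Crude heuristic (assumption): roughly 4 characters per token for English text.
const CHARS_PER_TOKEN = 4;

type ContextFit = "native" | "yarn" | "too_long";

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

function classifyContextFit(tokenCount: number, reservedForOutput = 4_096): ContextFit {
  // Leave room for the generated answer as well as the prompt.
  const needed = tokenCount + reservedForOutput;
  if (needed <= NATIVE_CONTEXT_TOKENS) return "native";
  if (needed <= YARN_CONTEXT_TOKENS) return "yarn";
  return "too_long";
}

// Scaling factor implied by the two window sizes above.
const yarnFactor = YARN_CONTEXT_TOKENS / NATIVE_CONTEXT_TOKENS;

console.log(classifyContextFit(estimateTokens("a".repeat(400_000))));
console.log(yarnFactor);
```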

    Architecture

    Hybrid MoE with 40 layers in a repeating pattern: 10 x (3 x (Gated DeltaNet -> MoE) -> 1 x (Gated Attention -> MoE)). 256 experts per MoE layer, 8 routed + 1 shared active. Hidden dimension 2048. 35B total, 3B active per token. 262K native context with YaRN extension to 1M.
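
The repeating pattern described above can be expanded into a flat layer list to check the arithmetic: 10 groups of (3 Gated DeltaNet + 1 Gated Attention) blocks give 40 layers, and with 8 routed + 1 shared experts active in each MoE layer, a token passes through 360 expert evaluations in total. The sketch below only enumerates the layout. The names are illustrative, and it says nothing about how the model is actually implemented.

```typescript
type LayerKind = "gated_deltanet" | "gated_attention";

// Expand "groups x (deltaNetPerGroup x DeltaNet, attentionPerGroup x Attention)".
// Every block is followed by an MoE layer, so one entry here is one MoE layer too.
function buildLayerPattern(
  groups = 10,
  deltaNetPerGroup = 3,
  attentionPerGroup = 1,
): LayerKind[] {
  const layers: LayerKind[] = [];
  for (let g = 0; g < groups; g++) {
    for (let i = 0; i < deltaNetPerGroup; i++) layers.push("gated_deltanet");
    for (let i = 0; i < attentionPerGroup; i++) layers.push("gated_attention");
  }
  return layers;
}

const layerPattern = buildLayerPattern();
const attentionLayerCount = layerPattern.filter((k) => k === "gated_attention").length;

// 8 routed + 1 shared experts are active in each MoE layer.
const activeExpertsPerLayer = 8 + 1;
const expertActivationsPerToken = layerPattern.length * activeExpertsPerLayer;

console.log(layerPattern.length);       // 40
console.log(attentionLayerCount);       // 10
console.log(expertActivationsPerToken); // 360
```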

    Mixpeek SDK Integration

    import { Mixpeek } from "mixpeek";

    const mx = new Mixpeek({ apiKey: "API_KEY" });

    await mx.collections.ingest({
      collection_id: "my-collection",
      source: { url: "https://example.com/long-document.pdf" },
      feature_extractors: [{
        name: "scene_caption",
        version: "v1",
        params: {
          model_id: "Qwen/Qwen3.6-35B-A3B",
          prompt: "Extract key topics, entities, and structured metadata"
        }
      }]
    });
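
When the same extractor configuration is reused across many sources, it may help to build the request object in one place and validate it before sending. The sketch below mirrors only the fields used in the snippet above. The `IngestRequest` and `FeatureExtractorConfig` types and the `buildCaptionIngestRequest` helper are hypothetical and are not taken from the SDK's type definitions. It does not call the Mixpeek API.

```typescript
// Illustrative types that mirror the fields in the snippet above (assumption:
// the real SDK may define these differently or accept more options).
interface FeatureExtractorConfig {
  name: string;
  version: string;
  params: { model_id: string; prompt: string };
}

interface IngestRequest {
  collection_id: string;
  source: { url: string };
  feature_extractors: FeatureExtractorConfig[];
}

function buildCaptionIngestRequest(
  collectionId: string,
  sourceUrl: string,
  prompt: string,
): IngestRequest {
  // Fail early on obviously bad input instead of waiting for the API to reject it.
  if (!/^https?:\/\//.test(sourceUrl)) {
    throw new Error(`source URL must start with http:// or https://: ${sourceUrl}`);
  }
  if (prompt.trim().length === 0) {
    throw new Error("prompt must not be empty");
  }
  return {
    collection_id: collectionId,
    source: { url: sourceUrl },
    feature_extractors: [{
      name: "scene_caption",
      version: "v1",
      params: { model_id: "Qwen/Qwen3.6-35B-A3B", prompt },
    }],
  };
}

const ingestRequest = buildCaptionIngestRequest(
  "my-collection",
  "https://example.com/long-document.pdf",
  "Extract key topics, entities, and structured metadata",
);
console.log(JSON.stringify(ingestRequest, null, 2));
```

The resulting object has the same shape as the argument passed to `mx.collections.ingest` in the snippet above.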

    Capabilities

    • 73.4% on SWE-bench Verified (code generation)
    • 92.6% on AIME 2026 (mathematical reasoning)
    • 262K native context, extensible to 1M via YaRN
    • Only 3B active parameters per token from 35B total
    • Vision capabilities included

    Use Cases on Mixpeek

    • Complex metadata generation from long documents and video transcripts using the full 262K context
    • Agentic content analysis pipelines with multi-step reasoning over extracted features
    • Code-driven structured extraction from multimodal content at low inference cost

    Benchmarks

    Dataset            | Metric    | Score | Source
    SWE-bench Verified | Pass Rate | 73.4% | Alibaba, Apr 2026 (Model Card)
    AIME 2026          | Accuracy  | 92.6% | Alibaba, Apr 2026 (Model Card)
    Terminal-Bench 2.0 | Pass Rate | 51.5% | Alibaba, Apr 2026 (Model Card)

    Performance

    Input Size: text up to 262K tokens (1M with YaRN); vision input size varies
    GPU Latency: ~35 ms/token (A100, 3B active)
    GPU Throughput: ~80 tokens/sec on 12 GB VRAM with MTP speculative decoding
    GPU Memory: ~12 GB (quantized), ~70 GB (FP16 full)
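
Two of the figures above can be sanity-checked with simple arithmetic. The sketch below assumes FP16 stores 2 bytes per parameter and counts weights only (no KV cache or activations). It also converts the per-token latency into a sequential decoding rate. The function names are illustrative.

```typescript
const TOTAL_PARAMS = 35e9;      // 35B total parameters, as stated on this page
const BYTES_PER_PARAM_FP16 = 2; // FP16 uses 16 bits = 2 bytes per parameter

// Weight memory in GB (10^9 bytes). All experts must be resident in memory,
// even though only about 3B parameters are active per token.
function weightMemoryGB(params: number, bytesPerParam: number): number {
  return (params * bytesPerParam) / 1e9;
}

// Sequential decoding rate implied by a per-token latency.
function tokensPerSecond(msPerToken: number): number {
  return 1000 / msPerToken;
}

console.log(weightMemoryGB(TOTAL_PARAMS, BYTES_PER_PARAM_FP16)); // 70
console.log(tokensPerSecond(35).toFixed(1));                     // "28.6"
```

The roughly 29 tokens/sec implied by 35 ms/token and the ~80 tokens/sec throughput figure describe different setups (plain decoding on an A100 versus MTP speculative decoding on a 12 GB card), so the two numbers are not directly comparable.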

    Specification

    Framework: HF
    Organization: Qwen
    Feature: Scene Captioning
    Output: text
    Modalities: video, image
    Retriever: Semantic Search
    Parameters: 35B (3B active)
    License: Apache 2.0
    Downloads/mo: 310K

    Research Paper

    Qwen3.6 Technical Report

    arxiv.org

    Build a pipeline with Qwen3.6-35B-A3B

    Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.
