Qwen3.6-35B-A3B

by Qwen

35B MoE with only 3B active parameters: 73.4% on SWE-bench Verified, runs on a laptop

310K downloads/month

35B (3B active) params


Identifiers

Model ID

Qwen/Qwen3.6-35B-A3B

Feature URI

mixpeek://image_extractor@v1/qwen36_35b_a3b_v1
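
The Feature URI above looks like it packs three parts into one string: an extractor name, a version, and a feature slug. The sketch below splits it with a regular expression. The field names (extractor, version, feature) and the parseFeatureUri helper are hypothetical and not part of the Mixpeek SDK; the segment meanings are inferred from this one example only.

```typescript
// Hypothetical shape for a parsed Feature URI; field names are assumptions.
interface FeatureUri {
  extractor: string;
  version: string;
  feature: string;
}

// Splits "mixpeek://<extractor>@<version>/<feature>" into its parts.
// Returns null when the string does not match that assumed layout.
function parseFeatureUri(uri: string): FeatureUri | null {
  const match = /^mixpeek:\/\/([^@\/]+)@([^\/]+)\/(.+)$/.exec(uri);
  if (match === null) return null;
  const [, extractor, version, feature] = match;
  return { extractor, version, feature };
}

const parsed = parseFeatureUri("mixpeek://image_extractor@v1/qwen36_35b_a3b_v1");
console.log(parsed);
```

Returning null rather than throwing keeps the helper easy to use when validating user-supplied identifiers.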

Overview

Qwen3.6-35B-A3B is Alibaba's hybrid Mixture-of-Experts model with 35 billion total parameters, of which only 3 billion are active per token, so its reasoning and coding performance comes at a compute cost low enough for laptop deployment. It combines Gated DeltaNet linear attention with standard Gated Attention and sparse MoE layers (256 experts, with 8 routed and 1 shared active per token), and it scores 73.4% on SWE-bench Verified and 92.6% on AIME 2026 according to the model card.

On Mixpeek, Qwen3.6-35B-A3B serves as a powerful reasoning backbone for agentic pipelines, complex metadata generation, and code-driven content analysis. Its 262K native context (extensible to 1M via YaRN) handles full-length documents and long video transcripts, while the 3B active parameter footprint keeps inference costs manageable.
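
To make the context figures concrete, here is a minimal sketch of how a pipeline might decide whether an input fits the native window, needs the YaRN-extended window, or has to be chunked. The exact limits are assumptions: "262K" is read as 262,144 tokens and "1M" as 1,000,000, since the text only gives rounded figures. The planContext and estimateTokens helpers, the 8,192-token output reserve, and the 4-characters-per-token heuristic are all illustrative and not part of any Mixpeek or Qwen API.

```typescript
// Assumed limits; the source only states the rounded figures "262K" and "1M".
const NATIVE_CONTEXT = 262_144;
const EXTENDED_CONTEXT = 1_000_000;

type ContextPlan = "native" | "yarn-extended" | "needs-chunking";

// Picks the smallest context window that holds the prompt plus an output reserve.
function planContext(promptTokens: number, reservedForOutput = 8_192): ContextPlan {
  const needed = promptTokens + reservedForOutput;
  if (needed <= NATIVE_CONTEXT) return "native";
  if (needed <= EXTENDED_CONTEXT) return "yarn-extended";
  return "needs-chunking";
}

// Rough estimate only: about 4 characters per token is a common heuristic for
// English text, not a property of this model's tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

console.log(planContext(200_000));   // "native"
console.log(planContext(300_000));   // "yarn-extended"
console.log(planContext(1_200_000)); // "needs-chunking"
```

A "yarn-extended" result only helps if the serving deployment actually has YaRN scaling enabled, so in practice that branch may need to fall back to chunking.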

Architecture

Hybrid MoE with 40 layers in a repeating pattern: 10 x (3 x (Gated DeltaNet -> MoE) -> 1 x (Gated Attention -> MoE)). 256 experts per MoE layer, 8 routed + 1 shared active. Hidden dimension 2048. 35B total, 3B active per token. 262K native context with YaRN extension to 1M.
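
The repeating pattern above can be expanded to check that it adds up to 40 layers. The sketch below only models the layer ordering and the headline ratios from the description; the type and function names are illustrative and do not come from the model's code.

```typescript
// Illustrative labels for the two layer types named in the description.
type LayerKind = "gated_deltanet+moe" | "gated_attention+moe";

// Expands 10 x (3 x (Gated DeltaNet -> MoE) -> 1 x (Gated Attention -> MoE)).
function buildLayerPattern(groups = 10, linearPerGroup = 3): LayerKind[] {
  const layers: LayerKind[] = [];
  for (let g = 0; g < groups; g++) {
    for (let i = 0; i < linearPerGroup; i++) {
      layers.push("gated_deltanet+moe");
    }
    layers.push("gated_attention+moe");
  }
  return layers;
}

const layers = buildLayerPattern();
const fullAttentionLayers = layers.filter((k) => k === "gated_attention+moe").length;

// Experts active per token in each MoE layer: 8 routed + 1 shared.
const activeExperts = 8 + 1;

// Share of total parameters used per token: 3B active of 35B total.
const activeParamFraction = 3 / 35;

console.log(layers.length);                  // 40
console.log(fullAttentionLayers);            // 10
console.log(activeExperts);                  // 9
console.log(activeParamFraction.toFixed(3)); // "0.086"
```

So one layer in four uses Gated Attention, and roughly 8.6% of the parameters are active for any given token.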

Mixpeek SDK Integration

import { Mixpeek } from "mixpeek";

const mx = new Mixpeek({ apiKey: "API_KEY" });

// Managed: create a collection over a bucket; Mixpeek runs this model's extractor
const collection = await mx.collections.create({
  namespace_id: "my-namespace",
  collection_name: "my-collection",
  source: { type: "bucket", bucket_ids: ["bkt_your_bucket"] },
  feature_extractor: {
    feature_extractor_name: "scene_caption",
    version: "v1",
    parameters: { model_id: "Qwen/Qwen3.6-35B-A3B" },
  },
});

Capabilities

73.4% on SWE-bench Verified (code generation)
92.6% on AIME 2026 (mathematical reasoning)
262K native context, extensible to 1M via YaRN
Only 3B active parameters per token from 35B total
Vision capabilities included

Use Cases on Mixpeek

Complex metadata generation from long documents and video transcripts using full 262K context

Agentic content analysis pipelines with multi-step reasoning over extracted features

Code-driven structured extraction from multimodal content at low inference cost

Benchmarks

Dataset	Metric	Score	Source
SWE-bench Verified	Pass Rate	73.4%	Alibaba, Apr 2026 — Model Card
AIME 2026	Accuracy	92.6%	Alibaba, Apr 2026 — Model Card
Terminal-Bench 2.0	Pass Rate	51.5%	Alibaba, Apr 2026 — Model Card