gemma-4-31B-it
by google
Top-3 open VLM with 256K context for dense visual document understanding
google/gemma-4-31B-it
mixpeek://image_extractor@v1/google_gemma4_31b_v1
Overview
Gemma 4 31B is Google's dense vision-language model, currently ranked #3 among open models on the Arena AI text leaderboard. Unlike the MoE variant (27B-A4B), this dense model activates all 31B parameters, delivering the highest quality at higher compute cost.
The 256K context window and built-in thinking mode make it particularly strong for complex document understanding tasks where accuracy matters more than throughput.
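The compute trade-off between the dense and MoE variants can be estimated with simple arithmetic. The sketch below uses the common rule of thumb of about 2 forward-pass FLOPs per active parameter per token, and it assumes that "A4B" in the MoE name means roughly 4B active parameters per token. Both are assumptions for illustration, not published figures.

```python
# Rule of thumb: about 2 forward-pass FLOPs per active parameter per token.
# This is an approximation that ignores the cost of attention over the context.
FLOPS_PER_PARAM = 2

def forward_flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs needed to process one token."""
    return FLOPS_PER_PARAM * active_params

dense_active = 31e9  # dense model: all 31B parameters are used for every token
moe_active = 4e9     # assumption: "A4B" in 27B-A4B means ~4B active parameters

dense_flops = forward_flops_per_token(dense_active)
moe_flops = forward_flops_per_token(moe_active)
ratio = dense_flops / moe_flops

print(f"dense: {dense_flops / 1e9:.0f} GFLOPs/token")
print(f"MoE:   {moe_flops / 1e9:.0f} GFLOPs/token")
print(f"dense / MoE compute ratio: {ratio:.2f}x")
```

Under these assumptions the dense model needs several times more compute per token, which is the cost side of the quality-versus-throughput trade-off described above.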
Architecture
Gemma 4 31B is a dense transformer with 31B parameters, all of which are active for every token. A vision encoder processes image patches, and the context window is 256K tokens. Thinking mode enables chain-of-thought reasoning for complex visual tasks.
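Because every one of the 31B parameters must be resident in memory, the weight footprint follows directly from the numeric format. The sketch below is a weights-only estimate; it excludes the KV cache, activations, and the vision encoder's runtime buffers, all of which grow with context length. The list of formats is illustrative and says nothing about which formats are actually published for this model.

```python
PARAMS = 31e9  # parameter count stated above

# Bytes per parameter for common weight formats (general facts, not Gemma-specific).
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Weights-only footprint in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9

footprints = {
    name: weight_memory_gb(PARAMS, size) for name, size in BYTES_PER_PARAM.items()
}

for name, gb in footprints.items():
    print(f"{name}: {gb:.1f} GB")
```

At bf16 this comes to 62 GB for the weights alone, so real deployments should budget additional headroom for long-context inference.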
Mixpeek SDK Integration
from mixpeek import Mixpeek

mx = Mixpeek(api_key="YOUR_KEY")

mx.ingest(
    collection_id="technical-docs",
    source="s3://diagrams/",
    extractors=[
        # Step 1: caption each image with Gemma 4 31B.
        {
            "type": "scene_caption",
            "model": "google/gemma-4-31B-it",
            "output_feature": "caption",
        },
        # Step 2: embed the caption text produced by step 1.
        {
            "type": "text_embedding",
            "model": "Qwen/Qwen3-Embedding-4B",
            "input_field": "caption",
            "output_feature": "caption_embedding",
        },
    ],
)
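The extractor list in the SDK example is a chain: the embedding step reads the `caption` field that the captioning step writes. A minimal local sanity check for that wiring might look like the sketch below. `check_extractor_chain` is a hypothetical helper written for this illustration, not part of the Mixpeek SDK, and the rule that `input_field` must match an earlier `output_feature` is inferred from the example rather than taken from the API documentation.

```python
def check_extractor_chain(extractors: list[dict]) -> list[str]:
    """Return a list of wiring problems; an empty list means the chain is consistent."""
    problems = []
    available = set()  # output features produced by earlier steps
    for i, step in enumerate(extractors):
        needed = step.get("input_field")
        if needed is not None and needed not in available:
            problems.append(
                f"step {i} ({step.get('type')}) reads '{needed}' before it is produced"
            )
        available.add(step["output_feature"])
    return problems

# Same extractor configuration as in the SDK example.
extractors = [
    {"type": "scene_caption", "model": "google/gemma-4-31B-it",
     "output_feature": "caption"},
    {"type": "text_embedding", "model": "Qwen/Qwen3-Embedding-4B",
     "input_field": "caption", "output_feature": "caption_embedding"},
]

problems = check_extractor_chain(extractors)
# Reversing the order makes the embedding step run before its input exists.
reversed_problems = check_extractor_chain(list(reversed(extractors)))

print(problems)
print(reversed_problems)
```

Running a check like this before calling `ingest` catches misspelled field names locally, without depending on how the service reports such errors.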
Capabilities
- Ranked #3 among open models on the Arena AI text leaderboard
- 256K context window
- Dense architecture, well suited to fine-tuning
- Built-in reasoning mode
- Apache 2.0 license
Benchmarks
| Dataset | Metric | Score | Source |
|---|---|---|---|
| MMLU Pro | Accuracy | 85.2% | Google, May 2026 |
| AIME 2026 | Accuracy | 89.2% | Google, May 2026 |
| Arena AI Leaderboard | Elo rank | #3 among open models | Arena AI, May 2026 |
Research Paper
Gemma 4: Byte for byte, the most capable open models
arxiv.org
Build a pipeline with gemma-4-31B-it
Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.
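To make the retrieval stage concrete, the sketch below ranks stored caption embeddings against a query embedding by cosine similarity. It is a generic illustration in plain Python, not Mixpeek's retrieval implementation. The three-dimensional vectors and document names are made-up toy data standing in for real `caption_embedding` values.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query: list[float], index: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k stored vectors most similar to the query."""
    scored = [(cosine(query, vec), doc_id) for doc_id, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]

# Toy stand-ins for caption embeddings (hypothetical data).
index = {
    "wiring-diagram": [0.9, 0.1, 0.0],
    "floor-plan": [0.1, 0.9, 0.1],
    "org-chart": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]

results = top_k(query, index, k=2)
print(results)
```

A production retrieval stage would typically use an approximate nearest-neighbor index instead of scoring every stored vector, but the ranking principle is the same.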