gemma-4-E4B-it
by google
Efficient 4B multimodal VLM with Per-Layer Embeddings for on-device AI
google/gemma-4-E4B-it
mixpeek://image_extractor@v1/google_gemma4_e4b_v1
Overview
Gemma 4 E4B is Google DeepMind's efficient multimodal model that uses Per-Layer Embeddings (PLE) to achieve the representational depth of a larger model while maintaining a compact inference footprint. With 4.5 billion effective parameters, it processes text, images, and audio with a 128K token context window, making it one of the most capable small models available.
On Mixpeek, Gemma 4 E4B powers lightweight multimodal understanding tasks including scene captioning, visual question answering, and document analysis where you need strong accuracy without the compute overhead of larger models.
Architecture
Gemma 4 E4B is a decoder-only transformer with hybrid attention that interleaves local sliding-window layers with full global-attention layers. Per-Layer Embeddings (PLE) feed a secondary embedding signal into every decoder layer, giving 4.5B effective parameters from a 2.3B-active compute footprint. The final layer always uses global attention.
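The interleaving described above can be illustrated with a small layer schedule. This is a minimal sketch, not Gemma 4's actual configuration: the helper `buildAttentionSchedule`, the layer count, and the `localPerGlobal` ratio are all hypothetical values chosen for illustration. The only property taken from the text is that the final layer is always global.

```typescript
type AttentionKind = "local" | "global";

// Hypothetical helper: builds a per-layer attention schedule.
// `localPerGlobal` (sliding-window layers between global layers) is an
// illustrative assumption, not a published Gemma 4 value.
function buildAttentionSchedule(
  numLayers: number,
  localPerGlobal: number
): AttentionKind[] {
  if (!Number.isInteger(numLayers) || numLayers < 1) {
    throw new RangeError("numLayers must be a positive integer");
  }
  if (!Number.isInteger(localPerGlobal) || localPerGlobal < 1) {
    throw new RangeError("localPerGlobal must be a positive integer");
  }

  const period = localPerGlobal + 1;
  const schedule: AttentionKind[] = [];
  for (let layer = 0; layer < numLayers; layer++) {
    const closesPeriod = (layer + 1) % period === 0;
    const isFinalLayer = layer === numLayers - 1;
    // The final layer is forced to global attention, as stated in the text.
    schedule.push(closesPeriod || isFinalLayer ? "global" : "local");
  }
  return schedule;
}

// Example with made-up sizes: 8 layers, 3 local layers per global layer.
const schedule = buildAttentionSchedule(8, 3);
console.log(schedule.join(" "));
// prints "local local local global local local local global"
```

When the layer count is not a multiple of the period, the last layer is still promoted to global, so the final hidden state always attends over the full context.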
Mixpeek SDK Integration
import { Mixpeek } from "mixpeek";

const mx = new Mixpeek({ apiKey: "API_KEY" });

await mx.collections.ingest({
  collection_id: "my-collection",
  source: { url: "https://example.com/video.mp4" },
  feature_extractors: [
    {
      name: "scene_description",
      version: "v1",
      params: {
        model_id: "google/gemma-4-E4B-it"
      }
    }
  ]
});
Capabilities
- Multimodal input: text, image, and audio understanding
- 128K token context window
- Built-in thinking mode for step-by-step reasoning
- Per-Layer Embeddings for compute-efficient inference
- Fits under 1.5 GB with 2-bit quantization
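As a rough sanity check on the 2-bit footprint claim, the weight size can be estimated from the parameter count and bits per weight. This is a back-of-envelope sketch under stated assumptions: it uses the 4.5B effective parameter figure from this page, treats every weight as exactly 2 bits, and ignores quantization scales, activations, and the KV cache. The function name `quantizedWeightSizeGB` is hypothetical, and the page does not say which parameter count the 1.5 GB figure is based on.

```typescript
// Back-of-envelope weight size for a quantized model, in decimal gigabytes.
// Counts weights only; real checkpoints add per-group scales and metadata.
function quantizedWeightSizeGB(
  paramCount: number,
  bitsPerWeight: number
): number {
  const bytes = (paramCount * bitsPerWeight) / 8;
  return bytes / 1e9;
}

const effectiveParams = 4.5e9; // effective parameter count quoted on this page
const sizeGB = quantizedWeightSizeGB(effectiveParams, 2);
console.log(sizeGB.toFixed(3)); // prints "1.125"
```

Under these assumptions the raw weights come to about 1.1 GB, which leaves some headroom below 1.5 GB for quantization overhead.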
Use Cases on Mixpeek
Benchmarks
| Dataset | Metric | Score | Source |
|---|---|---|---|
| AIME 2026 | Accuracy | 42.5% | Google Gemma 4 technical report |
| MMLU Pro | Accuracy | ~55% | Gemma 4 E4B model card |
Performance
4.5B effective parameters via PLE, with only 2.3B active at runtime
Specification
Research Paper
Gemma 4 model overview
arxiv.org
Build a pipeline with gemma-4-E4B-it
Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.