InternVL3_5-8B
by OpenGVLab
A roughly 4x faster successor to InternVL3, with Cascade Reinforcement Learning and dynamic resolution
OpenGVLab/InternVL3_5-8B
mixpeek://image_extractor@v1/opengvlab_internvl35_8b_v1
Overview
InternVL 3.5 is a major upgrade over InternVL3. It adds Cascade Reinforcement Learning (Cascade RL), credited with a 16% gain in overall reasoning performance; a Visual Resolution Router that allocates resolution dynamically; and Decoupled Vision-Language Deployment, which contributes to a roughly 4x (4.05x reported) inference speedup. It reports state-of-the-art results among open-source VLMs on multimodal reasoning while fitting on a single A100.
On Mixpeek, InternVL 3.5 powers high-quality scene captioning, visual QA, and document understanding at significantly lower latency than its predecessor. The dynamic resolution router automatically allocates more pixels to complex images and fewer to simple ones.
Architecture
InternViT-300M vision encoder + Qwen3-8B language model. 8.5B total params. Cascade RL training with progressive difficulty. Visual Resolution Router dynamically selects 224-1024px resolution per image. Decoupled deployment separates vision and language inference for 4x speedup.
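As a rough illustration of why separating vision and language inference can help, the toy model below compares running both stages back to back with overlapping them as a two-stage pipeline. The function names and the stage timings are hypothetical placeholders, not measurements from the paper, and this simple model does not by itself reproduce the reported 4x figure.

```typescript
// Toy latency model. All timings are made-up placeholders, not benchmarks.
interface StageTimes {
  visionMs: number;   // time for the vision encoder to process one image
  languageMs: number; // time for the language model to answer one request
}

// Coupled: each request runs vision, then language, before the next one starts.
function coupledTotalMs(requests: number, t: StageTimes): number {
  return requests * (t.visionMs + t.languageMs);
}

// Decoupled: the two stages overlap like a pipeline, so once it is full
// each additional request costs only as much as the slower stage.
function decoupledTotalMs(requests: number, t: StageTimes): number {
  if (requests <= 0) return 0;
  const bottleneck = Math.max(t.visionMs, t.languageMs);
  return t.visionMs + t.languageMs + (requests - 1) * bottleneck;
}

const times: StageTimes = { visionMs: 40, languageMs: 60 };
console.log(coupledTotalMs(10, times));   // 1000
console.log(decoupledTotalMs(10, times)); // 640
```

In this sketch the gain comes purely from overlapping the stages; any additional savings from the Visual Resolution Router are outside its scope.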
Mixpeek SDK Integration
    import { Mixpeek } from "mixpeek";

    const mx = new Mixpeek({ apiKey: "API_KEY" });

    await mx.collections.ingest({
      collection_id: "video-library",
      source: { url: "https://example.com/presentation.mp4" },
      feature_extractors: [
        {
          feature: "scene_caption",
          model: "OpenGVLab/InternVL3_5-8B"
        }
      ]
    });
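When ingesting many files, it can be convenient to build the request payload in one place and validate the URL before sending it. The helper below is a minimal sketch: `buildCaptionIngest` and the `IngestRequest` type are hypothetical names, not part of the Mixpeek SDK, and the payload shape is simply copied from the call shown above.

```typescript
// Hypothetical helper, not part of the Mixpeek SDK. The field names mirror
// the ingest call shown on this page; nothing else is assumed about the API.
interface IngestRequest {
  collection_id: string;
  source: { url: string };
  feature_extractors: { feature: string; model: string }[];
}

const MODEL_ID = "OpenGVLab/InternVL3_5-8B";

function buildCaptionIngest(collectionId: string, url: string): IngestRequest {
  // new URL() throws a TypeError if the string is not a valid absolute URL.
  const parsed = new URL(url);
  if (parsed.protocol !== "https:" && parsed.protocol !== "http:") {
    throw new Error(`Unsupported URL scheme: ${parsed.protocol}`);
  }
  return {
    collection_id: collectionId,
    source: { url: parsed.toString() },
    feature_extractors: [{ feature: "scene_caption", model: MODEL_ID }],
  };
}

// Intended use with the client from the example above:
//   await mx.collections.ingest(
//     buildCaptionIngest("video-library", "https://example.com/presentation.mp4"));
```

Keeping the model ID in a single constant means a later model swap touches one line rather than every ingest call.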
Capabilities
- 16% better reasoning than InternVL3 via Cascade RL
- 4x faster inference via Decoupled Vision-Language Deployment
- Dynamic resolution: allocates pixels based on image complexity
- GUI interaction and embodied agency capabilities
- Thinking mode with explicit chain-of-thought reasoning
Use Cases on Mixpeek
Benchmarks
| Dataset | Metric | Score | Source |
|---|---|---|---|
| Overall reasoning (vs InternVL3) | Improvement | +16.0% | OpenGVLab, 2025 — arxiv:2508.18265 |
| Inference speed (vs InternVL3) | Speedup | 4.05x | OpenGVLab, 2025 — arxiv:2508.18265 |
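To make the speedup figure concrete, the sketch below converts a speedup factor into an expected latency and a percentage reduction. The 1000 ms baseline is an arbitrary placeholder, not a measured InternVL3 latency, and the function names are illustrative.

```typescript
// Convert a speedup factor into the latency it implies for a given baseline.
function latencyAfterSpeedup(baselineMs: number, speedup: number): number {
  if (speedup <= 0) throw new RangeError("speedup must be positive");
  return baselineMs / speedup;
}

// Percentage of the original latency that a speedup removes.
function latencyReductionPct(speedup: number): number {
  if (speedup <= 0) throw new RangeError("speedup must be positive");
  return (1 - 1 / speedup) * 100;
}

const speedup = 4.05;    // from the table above
const baselineMs = 1000; // placeholder baseline, not a measurement

console.log(latencyAfterSpeedup(baselineMs, speedup).toFixed(0)); // "247"
console.log(latencyReductionPct(speedup).toFixed(1));             // "75.3"
```

A 4.05x speedup therefore corresponds to roughly a 75% reduction in latency relative to whatever the baseline is.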
Performance
Specification
Research Paper
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
arxiv.org
Build a pipeline with InternVL3_5-8B
Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.