ParaVT-8B
by ParaVT
Agentic long-video model trained for parallel temporal tool calls
ParaVT/ParaVT-8B
mixpeek://video_extractor@v1/paravt_8b_v1
Overview
ParaVT-8B is a video-text-to-text model focused on long-video understanding through tool use. Its model card describes a parallel video tool-calling approach where the model can dispatch multiple temporal crop requests in one turn instead of walking sequentially through a video.
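The difference between sequential and parallel crop requests can be sketched as follows. This is a minimal illustration, not the model's actual tool schema: the `crop_video` tool name, the `TemporalCrop` shape, and the `runCrop` callback are all hypothetical stand-ins.

```typescript
// Hypothetical shapes: the real ParaVT tool-call schema is not given on this page.
interface TemporalCrop {
  startSec: number;
  endSec: number;
}

interface ToolCall {
  name: "crop_video"; // assumed tool name, for illustration only
  args: TemporalCrop;
}

// Build one turn that carries every crop request at once.
function buildParallelTurn(windows: TemporalCrop[]): ToolCall[] {
  return windows
    .filter((w) => w.endSec > w.startSec) // drop empty or inverted windows
    .map((w): ToolCall => ({ name: "crop_video", args: { ...w } }));
}

// Run all calls from a single turn concurrently, rather than one call per turn.
async function dispatchTurn<T>(
  turn: ToolCall[],
  runCrop: (crop: TemporalCrop) => Promise<T>,
): Promise<T[]> {
  return Promise.all(turn.map((call) => runCrop(call.args)));
}

const turn = buildParallelTurn([
  { startSec: 30, endSec: 45 },
  { startSec: 610, endSec: 640 },
  { startSec: 900, endSec: 900 }, // zero-length window, filtered out
]);
```

A sequential agent would issue these crops across separate turns; here the two valid windows travel together and can be resolved with a single `dispatchTurn` call.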
On Mixpeek, ParaVT is relevant when an agent needs to search long clips and then decide which time windows to inspect. It belongs after retrieval: use vector, transcript, or scene search to narrow the corpus, then let ParaVT reason over candidate spans and request focused temporal crops.
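One way to picture the hand-off from retrieval to ParaVT is to turn timestamped search hits into a short list of candidate spans. The sketch below assumes hypothetical `RetrievalHit` and `Span` shapes and a padding-and-merge heuristic; none of it is a Mixpeek API.

```typescript
// Hypothetical types: a retrieval hit is a timestamp plus a relevance score.
interface RetrievalHit {
  timeSec: number;
  score: number;
}

interface Span {
  startSec: number;
  endSec: number;
}

// Pad each hit into a window, clamp to the video bounds, and merge overlaps.
function hitsToCandidateSpans(
  hits: RetrievalHit[],
  padSec: number,
  durationSec: number,
  minScore = 0,
): Span[] {
  const padded = hits
    .filter((h) => h.score >= minScore)
    .map((h) => ({
      startSec: Math.max(0, h.timeSec - padSec),
      endSec: Math.min(durationSec, h.timeSec + padSec),
    }))
    .sort((a, b) => a.startSec - b.startSec);

  const merged: Span[] = [];
  for (const span of padded) {
    const last = merged[merged.length - 1];
    if (last && span.startSec <= last.endSec) {
      // Overlapping or touching windows collapse into one candidate span.
      last.endSec = Math.max(last.endSec, span.endSec);
    } else {
      merged.push({ ...span });
    }
  }
  return merged;
}

const spans = hitsToCandidateSpans(
  [
    { timeSec: 100, score: 0.9 },
    { timeSec: 108, score: 0.8 },
    { timeSec: 400, score: 0.7 },
  ],
  5, // seconds of padding on each side
  402, // video duration in seconds
);
```

With these inputs the two nearby hits merge into one span (95 to 113 seconds) and the late hit is clamped to the end of the video (395 to 402 seconds), leaving two candidate spans for the model to inspect.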
Architecture
ParaVT-8B is the final post-RL ParaVT checkpoint. It uses the Qwen3VLForConditionalGeneration architecture and is based on Qwen/Qwen3-VL-8B-Instruct. The release uses cold-start SFT followed by PARA-GRPO reinforcement learning, which targets parseable, parallel temporal tool-calling behavior.
Mixpeek SDK Integration
```typescript
import { Mixpeek } from "mixpeek";

const mx = new Mixpeek({ apiKey: "API_KEY" });

await mx.collections.ingest({
  collection_id: "long-video-agent",
  source: { url: "s3://field-video/day-shift.mp4" },
  feature_extractors: [
    { feature: "transcription", model: "openai/whisper-large-v3" },
    { feature: "visual_embeddings", model: "facebook/vjepa2-vitl-fpc64-256" },
    { feature: "scene_caption", model: "ParaVT/ParaVT-8B" }
  ]
});
```
Capabilities
- Video-text-to-text reasoning over long clips
- Parallel temporal crop tool calls
- Agentic RL training for video tool use
- Apache 2.0 license
- Drop-in Transformers and vLLM deployment pattern
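For the vLLM deployment pattern, a request might look like the sketch below. It assumes a vLLM OpenAI-compatible server is already serving the model and accepts `video_url` content parts; the base URL, video URL, prompt, and `max_tokens` value are placeholders, and the model's tool-calling loop is not shown.

```typescript
// Assumption: a vLLM OpenAI-compatible server is already running and serving
// ParaVT/ParaVT-8B. All URLs and the token limit below are placeholders.
interface ChatRequest {
  model: string;
  messages: Array<{
    role: "user";
    content: Array<
      | { type: "text"; text: string }
      | { type: "video_url"; video_url: { url: string } }
    >;
  }>;
  max_tokens: number;
}

// Build a single-turn request that pairs one video with one question.
function buildChatRequest(videoUrl: string, question: string): ChatRequest {
  return {
    model: "ParaVT/ParaVT-8B",
    messages: [
      {
        role: "user",
        content: [
          { type: "video_url", video_url: { url: videoUrl } },
          { type: "text", text: question },
        ],
      },
    ],
    max_tokens: 512,
  };
}

// Send the request to the chat completions endpoint and return the text reply.
async function askParaVT(baseUrl: string, req: ChatRequest): Promise<string> {
  const res = await fetch(`${baseUrl}/v1/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(req),
  });
  if (!res.ok) throw new Error(`vLLM request failed: ${res.status}`);
  const data = (await res.json()) as {
    choices: Array<{ message: { content: string } }>;
  };
  return data.choices[0].message.content;
}

const req = buildChatRequest(
  "https://example.com/day-shift.mp4", // placeholder video URL
  "When does the forklift enter the loading bay?",
);
```

Calling `askParaVT("http://localhost:8000", req)` would then send the request, assuming the server listens on that address.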
Use Cases on Mixpeek
Benchmarks
| Dataset | Metric | Score | Source |
|---|---|---|---|
| ParaVT release | Task focus | Long-video parallel tool calling | ParaVT model card |
| Hugging Face | Release age | Created May 2026 | HF model metadata |
Performance
ParaVT-8B is best used after retrieval has narrowed the video corpus or the candidate time spans.
Specification
Research Paper
ParaVT: Taming the Tool Prior Paradox for Parallel Tool Use in Agentic Video Reinforcement Learning
arxiv.org
Build a pipeline with ParaVT-8B
Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.