ParaVT-8B
by ParaVT
Agentic long-video model trained for parallel temporal tool calls
ParaVT/ParaVT-8B
mixpeek://video_extractor@v1/paravt_8b_v1
Overview
ParaVT-8B is a video-text-to-text model focused on long-video understanding through tool use. Its model card describes a parallel video tool-calling approach where the model can dispatch multiple temporal crop requests in one turn instead of walking sequentially through a video.
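The sketch below contrasts sequential and parallel dispatch in a minimal TypeScript tool executor. It assumes a hypothetical `cropVideo` tool, simulated with a timer instead of real video I/O, and hypothetical `CropRequest`/`CropResult` shapes. None of these names come from the ParaVT or Mixpeek APIs. The sequential runner awaits one crop at a time. The parallel runner dispatches every crop requested in a single turn at once.

```typescript
// Hypothetical shape of one temporal crop request (seconds into the video).
interface CropRequest {
  startSec: number;
  endSec: number;
}

// Hypothetical crop result; a real executor would return frames or a clip.
interface CropResult {
  request: CropRequest;
  frames: number;
}

// Stand-in for a real cropping tool: it waits 50 ms to simulate I/O and
// reports how many frames it would sample at the given frames per second.
async function cropVideo(req: CropRequest, fps = 1): Promise<CropResult> {
  await new Promise<void>((resolve) => setTimeout(resolve, 50));
  const frames = Math.max(0, Math.floor((req.endSec - req.startSec) * fps));
  return { request: req, frames };
}

// Sequential style: each crop is awaited before the next one starts.
async function runSequential(reqs: CropRequest[]): Promise<CropResult[]> {
  const out: CropResult[] = [];
  for (const r of reqs) {
    out.push(await cropVideo(r));
  }
  return out;
}

// Parallel style: all crops from one turn are dispatched together.
// Promise.all keeps results in the same order as the requests.
async function runParallel(reqs: CropRequest[]): Promise<CropResult[]> {
  return Promise.all(reqs.map((r) => cropVideo(r)));
}

// Example: three widely separated windows requested in a single turn.
const exampleTurn: CropRequest[] = [
  { startSec: 0, endSec: 30 },
  { startSec: 600, endSec: 630 },
  { startSec: 1800, endSec: 1845 },
];
// Frames per crop: 30, 30, 45.
runParallel(exampleTurn).then((results) => console.log(results.map((r) => r.frames)));
```

With three simulated 50 ms crops, the sequential runner takes roughly three times as long as the parallel one. This shows one practical benefit of batching crops into a single turn.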
On Mixpeek, ParaVT is relevant when an agent needs to search long clips and then decide which time windows to inspect. It belongs after retrieval: use vector, transcript, or scene search to narrow the corpus, then let ParaVT reason over candidate spans and request focused temporal crops.
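One way to implement the retrieval-then-inspect handoff is a small function that turns scored spans from search into a few candidate windows for the model to crop. The `SpanHit` shape, the top-k cutoff, and the padding amount below are illustrative assumptions, not Mixpeek parameters.

```typescript
// Hypothetical retrieval hit: a scored time span from vector, transcript,
// or scene search over one video.
interface SpanHit {
  startSec: number;
  endSec: number;
  score: number;
}

interface Window {
  startSec: number;
  endSec: number;
}

// Keep the top-k hits by score, pad each for context, clamp to the video
// bounds, then merge windows that overlap after padding.
function candidateWindows(
  hits: SpanHit[],
  durationSec: number,
  topK = 5,
  padSec = 10,
): Window[] {
  const padded = [...hits]
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((h) => ({
      startSec: Math.max(0, h.startSec - padSec),
      endSec: Math.min(durationSec, h.endSec + padSec),
    }))
    .sort((a, b) => a.startSec - b.startSec);

  const merged: Window[] = [];
  for (const w of padded) {
    const last = merged[merged.length - 1];
    if (last && w.startSec <= last.endSec) {
      // Overlapping after padding: extend the previous window.
      last.endSec = Math.max(last.endSec, w.endSec);
    } else {
      merged.push({ ...w });
    }
  }
  return merged;
}

const exampleHits: SpanHit[] = [
  { startSec: 100, endSec: 120, score: 0.9 },
  { startSec: 125, endSec: 140, score: 0.8 },
  { startSec: 900, endSec: 930, score: 0.7 },
  { startSec: 2000, endSec: 2010, score: 0.1 },
];
// Two windows: 90-150 s (two nearby hits merged) and 890-940 s.
console.log(candidateWindows(exampleHits, 1000, 3, 10));
```

Merging overlapping windows keeps the model from requesting near-duplicate crops of the same moment.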
Architecture
This is the final post-RL ParaVT checkpoint. It uses the Qwen3VLForConditionalGeneration architecture with Qwen/Qwen3-VL-8B-Instruct as the base model. Training has two stages: cold-start supervised fine-tuning (SFT), followed by PARA-GRPO reinforcement learning aimed at producing parseable, parallel temporal tool calls.
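As its name suggests, PARA-GRPO is a variant of Group Relative Policy Optimization (GRPO). The sketch below covers only the standard GRPO step of computing group-relative advantages, in which each rollout's reward is normalized against the other rollouts sampled for the same prompt. It does not reproduce PARA-GRPO's own modifications, which are described in the ParaVT paper.

```typescript
// Standard GRPO advantage: (reward - group mean) / (group std + eps).
// Rollouts better than their group's average get positive advantages,
// so no learned value function is required. Plain GRPO, not PARA-GRPO.
function groupRelativeAdvantages(rewards: number[], eps = 1e-6): number[] {
  const n = rewards.length;
  if (n === 0) return [];
  const mean = rewards.reduce((s, r) => s + r, 0) / n;
  // Population standard deviation; some implementations use the sample estimate.
  const variance = rewards.reduce((s, r) => s + (r - mean) ** 2, 0) / n;
  const std = Math.sqrt(variance);
  return rewards.map((r) => (r - mean) / (std + eps));
}

// Four rollouts for one prompt, two rewarded and two not: advantages ≈ [1, -1, 1, -1].
console.log(groupRelativeAdvantages([1, 0, 1, 0]));
```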
Mixpeek SDK Integration
```typescript
import { Mixpeek } from "mixpeek";

const mx = new Mixpeek({ apiKey: "API_KEY" });

// Managed: create a collection over a bucket and let Mixpeek run the extractor.
// Note: this snippet configures the transcription extractor (Whisper); the
// ParaVT-8B extractor is listed as mixpeek://video_extractor@v1/paravt_8b_v1.
const collection = await mx.collections.create({
  namespace_id: "my-namespace",
  collection_name: "my-collection",
  source: { type: "bucket", bucket_ids: ["bkt_your_bucket"] },
  feature_extractor: {
    feature_extractor_name: "transcription",
    version: "v1",
    parameters: { model_id: "openai/whisper-large-v3" },
  },
});
```
Capabilities
- Video-text-to-text reasoning over long clips
- Parallel temporal crop tool calls
- Agentic RL training for video tool use
- Apache 2.0 license
- Drop-in Transformers and vLLM deployment pattern
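Parallel tool calling means a single assistant turn can contain several tool calls, and the runtime must parse and validate each one before executing it. The minimal parser below assumes calls are emitted as Qwen-style `<tool_call>{...}</tool_call>` blocks carrying JSON with `name` and `arguments` fields. Both that format and the `crop_video` tool name in the example are assumptions, so check the model card for the actual output format. Counting malformed calls separately is one simple way to track how often output fails to parse.

```typescript
// Assumed tool-call payload: {"name": string, "arguments": object}.
interface ToolCall {
  name: string;
  arguments: Record<string, unknown>;
}

// Extract every <tool_call>...</tool_call> block from one assistant turn.
// Blocks whose body is not valid JSON with the expected fields count as malformed.
function parseToolCalls(text: string): { calls: ToolCall[]; malformed: number } {
  const calls: ToolCall[] = [];
  let malformed = 0;
  const pattern = /<tool_call>([\s\S]*?)<\/tool_call>/g;
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(text)) !== null) {
    try {
      const parsed = JSON.parse(m[1]);
      if (
        typeof parsed?.name === "string" &&
        typeof parsed.arguments === "object" &&
        parsed.arguments !== null
      ) {
        calls.push({ name: parsed.name, arguments: parsed.arguments });
      } else {
        malformed++;
      }
    } catch {
      malformed++;
    }
  }
  return { calls, malformed };
}

// Hypothetical turn with two well-formed crop calls and one truncated call.
const exampleOutput = `<tool_call>{"name":"crop_video","arguments":{"start":0,"end":30}}</tool_call>
<tool_call>{"name":"crop_video","arguments":{"start":600,"end":630}}</tool_call>
<tool_call>{"name":"crop_video","arguments":</tool_call>`;
const parsedTurn = parseToolCalls(exampleOutput);
// 2 valid calls, 1 malformed.
console.log(parsedTurn.calls.length, parsedTurn.malformed);
```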
Benchmarks
| Item | Field | Value | Source |
|---|---|---|---|
| ParaVT release | Task focus | Long-video parallel tool calling | ParaVT model card |
| Hugging Face | Release age | Created May 2026 | HF model metadata |
Performance
Best used after retrieval has narrowed the video corpus or candidate time spans
Specification
Research Paper
ParaVT: Taming the Tool Prior Paradox for Parallel Tool Use in Agentic Video Reinforcement Learning
arxiv.org
Build a pipeline with ParaVT-8B
Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.