Tempo-6B
by Vision-CAIR
Compact 6B model for hours-long video understanding via query-aware temporal compression
Vision-CAIR/Tempo-6B
mixpeek://video_extractor@v1/visioncair_tempo_6b_v1
Overview
Tempo is a 6B-parameter vision-language model purpose-built for extreme long-video understanding. While most video VLMs struggle beyond a few minutes, Tempo processes hours-long videos using Adaptive Token Allocation. This query-aware compression mechanism allocates between 0.5 and 16 visual tokens to each frame, depending on how relevant that frame's content is to the query.
Despite having only 6B parameters, Tempo scores 52.3 on LVBench (average video length 4,101 seconds, about 68 minutes) and outperforms GPT-4o and Gemini 1.5 Pro on long-video benchmarks. On Mixpeek, Tempo is well suited to meeting recordings, surveillance footage, lectures, and other long-form video where understanding temporal structure across hours of content is critical.
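The per-frame range of 0.5 to 16 tokens bounds how much visual context a video can consume. The worked example below assumes frames are sampled at 1 frame per second. This is a hypothetical rate, since the sampling rate is not stated here. Under that assumption, a video of the LVBench average length of 4,101 seconds yields between 2,050.5 and 65,616 visual tokens.

```python
# Hypothetical sampling rate: the model card does not state Tempo's frame rate.
FPS = 1.0
MIN_TOKENS_PER_FRAME = 0.5  # lower bound stated on the model card
MAX_TOKENS_PER_FRAME = 16   # upper bound stated on the model card


def visual_token_range(duration_s: float, fps: float = FPS) -> tuple[float, float]:
    """Return (min, max) total visual tokens for a video under the per-frame bounds."""
    frames = duration_s * fps
    return frames * MIN_TOKENS_PER_FRAME, frames * MAX_TOKENS_PER_FRAME


if __name__ == "__main__":
    low, high = visual_token_range(4101)  # LVBench average length in seconds
    print(f"{low:,.1f} to {high:,.0f} visual tokens")
```

The gap between the two bounds shows why per-frame allocation matters. Spending a fixed 16 tokens on every frame would need 32 times the context of the 0.5-token floor.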
Architecture
Tempo pairs a vision encoder with query-aware Adaptive Token Allocation (ATA), which compresses each video frame to between 0.5 and 16 tokens based on its relevance to the query. With 6B parameters, the model processes videos up to several hours long within a bounded context window by dynamically allocating its representation budget across time.
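One way to picture budgeted allocation across time is a proportional split with clamping. The sketch below is a hypothetical illustration, not Tempo's published ATA mechanism. It assumes each frame already has a non-negative relevance score for the query (the `relevance` list) and that there is a fixed total token `budget`. It then uses bisection to find a scale factor so that the clamped per-frame allocations in [0.5, 16] sum to the budget. A fractional value such as 0.5 can be read, for example, as one token shared by two frames.

```python
from typing import List

# Hypothetical sketch, not Tempo's actual ATA implementation.
LO, HI = 0.5, 16.0  # per-frame token bounds stated on the model card


def allocate_tokens(relevance: List[float], budget: float, iters: int = 60) -> List[float]:
    """Split `budget` tokens across frames in proportion to assumed relevance
    scores, clamping each frame to [LO, HI]; the scale factor is found by bisection."""
    n = len(relevance)
    if n == 0:
        return []
    if any(r < 0 for r in relevance):
        raise ValueError("relevance scores must be non-negative")
    if not (n * LO <= budget <= n * HI):
        raise ValueError("budget must lie between n*LO and n*HI")

    def total(scale: float) -> float:
        # Total tokens used when every frame gets clamp(scale * relevance).
        return sum(min(HI, max(LO, scale * r)) for r in relevance)

    # Grow the upper bound until the clamped total reaches the budget
    # (or stops growing because every positive score is already at HI).
    lo_s, hi_s = 0.0, 1.0
    while total(hi_s) < budget and hi_s < 1e12:
        hi_s *= 2

    # Bisection: total(scale) is non-decreasing in scale.
    for _ in range(iters):
        mid = (lo_s + hi_s) / 2
        if total(mid) < budget:
            lo_s = mid
        else:
            hi_s = mid
    return [min(HI, max(LO, hi_s * r)) for r in relevance]
```

Take `relevance=[0.1, 0.9, 0.0, 2.0]` and `budget=20.0` as an example. The irrelevant third frame stays at the 0.5-token floor, and the most relevant frame receives about 13 tokens. Bisection is used because clamping makes the total a piecewise-linear function of the scale, so there is no single closed-form proportional split.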
Mixpeek SDK Integration
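```python
from mixpeek import Mixpeek

# Authenticate with your Mixpeek API key
mixpeek = Mixpeek(api_key="YOUR_API_KEY")

# Ingest videos from an S3 bucket and caption them with Tempo-6B
mixpeek.ingest.videos(
    collection="meeting_recordings",
    source={"type": "s3", "bucket": "recordings"},
    pipeline={
        "captioning": {
            "model": "mixpeek://video_extractor@v1/visioncair_tempo_6b_v1"
        }
    },
)
```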
Capabilities
- Hours-long video understanding (videos of 4,000+ seconds)
- Query-aware temporal compression for efficient processing
- Outperforms GPT-4o on long-video benchmarks at 1/20th the size
- Temporal reasoning across scenes separated by minutes or hours
Benchmarks
| Dataset | Metric | Score | Source |
|---|---|---|---|
| LVBench | Accuracy | 52.3 | Model card |
| Video-MME (long) | Accuracy | 58.7 | Model card |
| MLVU | Score | 67.4 | Model card |
Research Paper
Model paper or technical report
arxiv.org
Build a pipeline with Tempo-6B
Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.