Perception-LM-3B
by facebook
Meta Perception Language Model checkpoint for detailed image and video understanding
facebook/Perception-LM-3B
mixpeek://image_extractor@v1/facebook_perception_lm_3b_v1
Overview
Perception-LM-3B is part of Meta's PerceptionLM release for open, reproducible visual understanding research. The linked paper describes a transparent Perception Language Model stack for detailed image and video understanding, including human-labeled and synthetic data and a PLM-VideoBench evaluation for temporal perception.
On Mixpeek, Perception-LM-3B is useful when teams want a research-friendly VLM for building searchable descriptions of images and video clips. Its license is research-only, so it should be treated as an evaluation and prototyping model rather than a default commercial production choice.
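The research-only restriction can be enforced in code so the model is not selected for a commercial deployment by accident. The sketch below is illustrative only: `ModelEntry`, `Environment`, and `isAllowed` are hypothetical names and are not part of the Mixpeek SDK.

```typescript
// Hypothetical registry entry; the field names are illustrative only.
interface ModelEntry {
  id: string;
  license: "research-only" | "commercial";
}

// Hypothetical deployment environments for this sketch.
type Environment = "evaluation" | "production";

const perceptionLm3b: ModelEntry = {
  id: "facebook/Perception-LM-3B",
  license: "research-only",
};

// Returns true when the model may be used in the given environment:
// any model is allowed for evaluation, only commercial ones in production.
function isAllowed(model: ModelEntry, env: Environment): boolean {
  return env === "evaluation" || model.license === "commercial";
}

console.log(isAllowed(perceptionLm3b, "evaluation")); // true
console.log(isAllowed(perceptionLm3b, "production")); // false
```

A check like this would typically run when a pipeline configuration is loaded, before any ingestion starts.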
Architecture
Perception-LM-3B is an autoregressive vision-language model from the PerceptionLM family. It combines a Perception Encoder visual backbone with a language decoder. The family is released at 1B, 3B, and 8B scales for detailed visual understanding experiments.
Mixpeek SDK Integration
import { Mixpeek } from "mixpeek";

const mx = new Mixpeek({ apiKey: "API_KEY" });

await mx.collections.ingest({
  collection_id: "vlm-evals",
  source: { url: "s3://benchmarks/video-clips/" },
  feature_extractors: [
    {
      feature: "scene_caption",
      model: "facebook/Perception-LM-3B",
      params: {
        sample_rate: "1fps",
        caption_detail: "dense"
      }
    }
  ]
});
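When the same captioning setup is compared across the 1B, 3B, and 8B scales, it can help to build the request body in one place. The following is a minimal sketch under stated assumptions: the `CaptionExtractor` and `IngestRequest` types and the `buildCaptionRequest` helper are hypothetical and only mirror the fields of the ingest call above, and the 1B and 8B model IDs are assumed to follow the same naming pattern as the 3B checkpoint.

```typescript
// Hypothetical types mirroring the fields of the ingest call above;
// they are not taken from the Mixpeek SDK's type definitions.
interface CaptionExtractor {
  feature: string;
  model: string;
  params: { sample_rate: string; caption_detail: string };
}

interface IngestRequest {
  collection_id: string;
  source: { url: string };
  feature_extractors: CaptionExtractor[];
}

// The three scales named in the Architecture section. The 1B and 8B IDs
// are assumed to follow the same pattern as "facebook/Perception-LM-3B".
type PlmScale = "1B" | "3B" | "8B";

// Builds a plain request object; it does not call any API.
function buildCaptionRequest(
  collectionId: string,
  sourceUrl: string,
  scale: PlmScale = "3B",
): IngestRequest {
  return {
    collection_id: collectionId,
    source: { url: sourceUrl },
    feature_extractors: [
      {
        feature: "scene_caption",
        model: `facebook/Perception-LM-${scale}`,
        params: { sample_rate: "1fps", caption_detail: "dense" },
      },
    ],
  };
}

const request = buildCaptionRequest("vlm-evals", "s3://benchmarks/video-clips/");
console.log(JSON.stringify(request, null, 2));
```

The resulting object has the same shape as the argument passed to `mx.collections.ingest` above, so swapping the scale changes only the `model` field.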
Capabilities
- Detailed image and video understanding
- Visual question answering over frames and clips
- Temporal video perception research via PLM-VideoBench
- Transparent data and training recipe for reproducible VLM evaluation
- Useful baseline for comparing closed and open visual reasoning models
Use Cases on Mixpeek
Benchmarks
| Dataset | Aspect | Description | Source |
|---|---|---|---|
| PLM-VideoBench | Coverage | Introduced for temporal video understanding | PerceptionLM paper |
| Visual understanding tasks | Scope | Image and video understanding | HuggingFace paper page |
Performance
The research license requires access approval and restricts the model to noncommercial use.
Specification
Research Paper
PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding
arxiv.org
Build a pipeline with Perception-LM-3B
Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.