sam2.1-hiera-large
by facebook
Unified promptable segmentation for images and video with streaming memory
facebook/sam2.1-hiera-large
mixpeek://image_extractor@v1/facebook_sam2_large_v1
Overview
SAM 2 extends SAM to video with a streaming memory architecture for real-time processing. On images it is 6x faster than SAM and more accurate, and it is the first foundation model to segment and track objects across video frames from prompts.
On Mixpeek, SAM 2 enables video-native segmentation: track objects across frames, segment specific items at any point in a video, and extract per-object features over time.
Architecture
Hiera image encoder with streaming memory for temporal context. SAM 2.1 Large: 224.4M params, 39.5 FPS on A100. Memory attention modules propagate masks across frames without re-computing the full image encoder.
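To make the streaming-memory idea concrete, here is a toy sketch of a bounded memory that is updated frame by frame. It is not the SAM 2 implementation. The names `FrameMemory`, `MemoryBank`, `maxRecent`, and `streamFrames` are hypothetical, and the first-in-first-out eviction policy is an assumption made for illustration; this page does not specify how memories are retained or dropped.

```typescript
// Toy illustration of a bounded streaming memory; NOT the SAM 2 implementation.
// All names here are hypothetical, and FIFO eviction is an assumption.

interface FrameMemory {
  frameIndex: number;
  // Stand-in for the per-frame memory features a real model would store.
  features: number[];
}

class MemoryBank {
  private recent: FrameMemory[] = [];
  private readonly maxRecent: number;

  constructor(maxRecent: number) {
    if (maxRecent < 1) throw new Error("maxRecent must be at least 1");
    this.maxRecent = maxRecent;
  }

  // Add the newest frame's memory, evicting the oldest once the bank is full.
  push(memory: FrameMemory): void {
    this.recent.push(memory);
    if (this.recent.length > this.maxRecent) {
      this.recent.shift();
    }
  }

  // Memories the next frame may attend to, oldest first.
  context(): readonly FrameMemory[] {
    return this.recent;
  }
}

// Process frames strictly in order and record which earlier frames
// were available as context when each frame was handled.
function streamFrames(frameCount: number, maxRecent: number): number[][] {
  const bank = new MemoryBank(maxRecent);
  const contextPerFrame: number[][] = [];
  for (let i = 0; i < frameCount; i++) {
    contextPerFrame.push(bank.context().map((m) => m.frameIndex));
    bank.push({ frameIndex: i, features: [i] });
  }
  return contextPerFrame;
}
```

The point of the sketch is that per-frame cost stays bounded: each frame consults at most `maxRecent` stored memories rather than the whole video, which is what allows frames to be processed as they arrive.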
Mixpeek SDK Integration
import { Mixpeek } from "mixpeek";

const mx = new Mixpeek({ apiKey: "API_KEY" });

await mx.collections.ingest({
  collection_id: "my-collection",
  source: { url: "https://example.com/video.mp4" },
  feature_extractors: [
    {
      name: "segmentation",
      version: "v1",
      params: { model_id: "facebook/sam2.1-hiera-large" }
    }
  ]
});
Capabilities
- Video object segmentation and tracking
- 6x faster than SAM on images
- Streaming memory architecture for real-time video
- Multi-object tracking with mask propagation
- Image segmentation with improved accuracy
Use Cases on Mixpeek
Benchmarks
| Dataset | Metric | Score | Source |
|---|---|---|---|
| SA-V (video seg.) | J&F | 79.5 | Ravi et al., 2024 — Table 1 |
| DAVIS 2017 (val) | J&F | 82.0 | Ravi et al., 2024 — Table 2 |
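For readers unfamiliar with the metric, J&F is the mean of region similarity J (the Jaccard index, or intersection-over-union, between predicted and ground-truth masks) and contour accuracy F (a boundary F-measure). The sketch below computes J for a pair of flattened binary masks and combines it with an F value supplied by the caller. The function names are hypothetical, the boundary computation for F is omitted, and the benchmark toolkits additionally average over objects and frames, so treat this as an illustration of the definition rather than the official evaluation code.

```typescript
// Region similarity J: intersection-over-union of two binary masks,
// given as flattened arrays where any non-zero value means "foreground".
function jaccard(pred: Uint8Array, truth: Uint8Array): number {
  if (pred.length !== truth.length) {
    throw new Error("mask sizes differ");
  }
  let intersection = 0;
  let union = 0;
  for (let i = 0; i < pred.length; i++) {
    const p = pred[i] !== 0;
    const t = truth[i] !== 0;
    if (p && t) intersection++;
    if (p || t) union++;
  }
  // Convention assumed here: two empty masks count as a perfect match.
  return union === 0 ? 1 : intersection / union;
}

// J&F is the arithmetic mean of J and F. F is passed in because the
// boundary F-measure is not implemented in this sketch.
function jAndF(j: number, f: number): number {
  return (j + f) / 2;
}

// Example: masks overlap on 1 pixel out of 3 covered by either mask.
const exampleJ = jaccard(Uint8Array.from([1, 1, 0, 0]), Uint8Array.from([1, 0, 1, 0]));
```

Both functions return values in the range 0 to 1; the table above reports scores on a 0 to 100 scale.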
Performance
Streaming architecture — processes video frames sequentially with memory
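As a rough planning aid, the 39.5 FPS figure quoted above for an A100 can be turned into a per-frame time budget and a processing-time estimate. This is back-of-the-envelope arithmetic only: the helper names are hypothetical, and the estimate assumes steady-state throughput at the quoted rate while ignoring video decoding, I/O, and any Mixpeek pipeline overhead, none of which this page quantifies.

```typescript
// Throughput quoted on this page for SAM 2.1 Large on an A100.
const REPORTED_FPS = 39.5;

// Average time budget per frame, in milliseconds.
function msPerFrame(fps: number): number {
  if (fps <= 0) throw new Error("fps must be positive");
  return 1000 / fps;
}

// Estimated seconds of model time for a clip, assuming every frame is
// processed at a constant modelFps (an idealized assumption).
function estimateProcessingSeconds(
  durationSeconds: number,
  videoFps: number,
  modelFps: number = REPORTED_FPS
): number {
  if (modelFps <= 0) throw new Error("modelFps must be positive");
  const frames = Math.ceil(durationSeconds * videoFps);
  return frames / modelFps;
}

// A 60-second clip at 30 frames per second is 1800 frames.
const budgetMs = msPerFrame(REPORTED_FPS);
const clipSeconds = estimateProcessingSeconds(60, 30);
```

Under these assumptions the budget works out to about 25.3 ms per frame, and the 1800-frame clip takes roughly 45.6 seconds of model time, which is faster than the clip's own 60-second duration.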
Specification
Research Paper
SAM 2: Segment Anything in Images and Videos
arxiv.org
Build a pipeline with sam2.1-hiera-large
Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.