sam-vit-huge
by facebook
Promptable foundation model for image segmentation
facebook/sam-vit-huge

Overview
SAM (Segment Anything Model) is Meta's foundation model for image segmentation. Given prompts such as points, boxes, or masks, it produces high-quality object masks. It was trained on SA-1B, the largest segmentation dataset released to date, with over 1 billion masks across 11 million images.
On Mixpeek, SAM powers pixel-level object segmentation for precise content understanding, enabling mask-based filtering and region-specific feature extraction.
Architecture
A ViT-H image encoder (632M parameters) paired with a lightweight prompt encoder and mask decoder. The decoder produces 256x256 low-resolution masks that are then refined to full resolution. Supported prompt types: points, boxes, and masks.
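The low-res-to-full-resolution step above can be sketched as a simple mask resize. The snippet below uses nearest-neighbor sampling purely for illustration; SAM's actual refinement is learned inside the decoder, so this is a simplified stand-in, not the model's method:

```typescript
// Sketch: upsample a low-res binary mask (e.g. 256x256) to the target
// image resolution with nearest-neighbor sampling. Illustrative only --
// SAM refines masks with learned layers, not plain nearest-neighbor.
function upsampleMask(mask: number[][], outH: number, outW: number): number[][] {
  const inH = mask.length;
  const inW = mask[0].length;
  const out: number[][] = [];
  for (let y = 0; y < outH; y++) {
    // Map each output row/column back to its nearest source cell.
    const srcY = Math.min(inH - 1, Math.floor((y * inH) / outH));
    const row: number[] = [];
    for (let x = 0; x < outW; x++) {
      const srcX = Math.min(inW - 1, Math.floor((x * inW) / outW));
      row.push(mask[srcY][srcX]);
    }
    out.push(row);
  }
  return out;
}
```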
Mixpeek SDK Integration
```typescript
import { Mixpeek } from "mixpeek";

const mx = new Mixpeek({ apiKey: "API_KEY" });

await mx.collections.ingest({
  collection_id: "my-collection",
  source: { url: "https://example.com/image.jpg" },
  feature_extractors: [
    {
      name: "segmentation",
      version: "v1",
      params: { model_id: "facebook/sam-vit-huge" }
    }
  ]
});
```
Capabilities
- Promptable segmentation with points, boxes, or masks
- Automatic mask generation for everything in an image
- Zero-shot transfer competitive with supervised models
- Trained on 1 billion masks (SA-1B dataset)
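The prompt types listed above can be modeled as a small discriminated union. The type and function names below are hypothetical, introduced only to illustrate the shapes involved; they are not part of the Mixpeek SDK or any SAM API:

```typescript
// Hypothetical prompt shapes for promptable segmentation (illustrative,
// not a published API). label 1 marks foreground, 0 marks background.
type PointPrompt = { kind: "point"; x: number; y: number; label: 0 | 1 };
type BoxPrompt = { kind: "box"; x0: number; y0: number; x1: number; y1: number };
type MaskPrompt = { kind: "mask"; mask: number[][] }; // low-res prior mask
type SamPrompt = PointPrompt | BoxPrompt | MaskPrompt;

// Normalize pixel coordinates to [0, 1] relative to the image size --
// a common preprocessing step before handing prompts to a model.
function normalizePrompt(p: SamPrompt, width: number, height: number): SamPrompt {
  switch (p.kind) {
    case "point":
      return { ...p, x: p.x / width, y: p.y / height };
    case "box":
      return { ...p, x0: p.x0 / width, y0: p.y0 / height, x1: p.x1 / width, y1: p.y1 / height };
    case "mask":
      return p; // mask prompts are already spatial grids
  }
}
```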
Use Cases on Mixpeek
Benchmarks
| Dataset | Metric | Score | Source |
|---|---|---|---|
| SA-1B (segmentation) | mIoU | 79.3 | Kirillov et al., 2023 — Table 1 |
| COCO (instance seg.) | AP | 46.5 | Kirillov et al., 2023 — Table 7 |
Performance
The image encoder runs once per image; the lightweight mask decoder then runs per prompt (~6 ms), so additional prompts on the same image are nearly free.
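Because the expensive encode happens once while each prompt only pays the decoder cost, per-mask latency drops as prompts are batched against the same image. A rough cost model (the 450 ms encode time is an assumed placeholder you would measure yourself; only the ~6 ms decoder figure comes from this page):

```typescript
// Rough latency model: one encoder pass per image, one decoder pass per
// prompt. encoderMs is a hypothetical figure to be measured on real
// hardware; the 6 ms decoder default is the figure quoted on this page.
function totalLatencyMs(numPrompts: number, encoderMs: number, decoderMs: number = 6): number {
  return encoderMs + numPrompts * decoderMs;
}

// With an assumed 450 ms encode, 100 prompts cost 450 + 600 = 1050 ms,
// i.e. ~10.5 ms per mask -- far cheaper than re-encoding per prompt.
```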
Specification
Research Paper
Segment Anything
arxiv.org

Build a pipeline with sam-vit-huge
Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.