    PyTorch · Segmentation · Apache 2.0

    sam3

    by facebook

    Concept-level segmentation with open-vocabulary detection and video tracking

    420K downloads/month
    848M parameters
    Identifiers
    Model ID: facebook/sam3
    Feature URI: mixpeek://image_extractor@v1/facebook_sam3_v1
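The Feature URI above appears to follow a `mixpeek://<extractor>@<version>/<feature>` layout. That layout is inferred from this single example, not from documented Mixpeek behavior. A minimal sketch of splitting it into parts, where the `FeatureUri` interface and `parseFeatureUri` function are hypothetical names introduced for illustration:

```typescript
// Hypothetical shape for a parsed feature URI. The field names are
// illustrative and are not taken from the Mixpeek SDK.
interface FeatureUri {
  extractor: string;        // e.g. "image_extractor"
  extractorVersion: string; // e.g. "v1"
  feature: string;          // e.g. "facebook_sam3_v1"
}

// Parse "mixpeek://<extractor>@<version>/<feature>" (assumed layout).
// Returns null when the input does not match that layout.
function parseFeatureUri(uri: string): FeatureUri | null {
  const match = /^mixpeek:\/\/([^@\/]+)@([^\/]+)\/(.+)$/.exec(uri);
  if (match === null) return null;
  const extractor = match[1];
  const extractorVersion = match[2];
  const feature = match[3];
  if (extractor === undefined || extractorVersion === undefined || feature === undefined) {
    return null;
  }
  return { extractor, extractorVersion, feature };
}
```

Calling `parseFeatureUri("mixpeek://image_extractor@v1/facebook_sam3_v1")` yields the extractor name, its version, and the feature name as separate fields. Any string with a different scheme yields `null`.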

    Overview

    SAM 3 is Meta's unified foundation model for concept-level segmentation. It detects, segments, and tracks objects using open-vocabulary text prompts or visual exemplars, handling 270K+ unique concepts. It bridges the gap between detection and segmentation in a single model.

    On Mixpeek, SAM 3 enables concept-driven content analysis: specify any concept in text, and SAM 3 finds, segments, and tracks every instance across images and video.

    Architecture

    SAM 3 uses a decoupled detector-tracker architecture in which both components share a vision encoder, for 848M parameters in total. A presence token helps the model discriminate between closely related prompts. The model was trained on 4M+ automatically annotated concepts.
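To illustrate how a global presence signal can help separate closely related prompts, here is a minimal sketch that gates per-candidate scores by a single presence probability. The multiplicative combination, the `Candidate` type, and `gateByPresence` are assumptions made for illustration. This page does not describe how SAM 3 combines these signals internally.

```typescript
// Hypothetical detection candidate with a local match score in [0, 1].
interface Candidate {
  id: number;
  localScore: number;
}

// Multiply each candidate's local score by a global presence probability
// (is the prompted concept in the image at all?) and keep the candidates
// whose combined score reaches the threshold.
function gateByPresence(
  candidates: Candidate[],
  presence: number,
  threshold: number
): { id: number; score: number }[] {
  return candidates
    .map((c) => ({ id: c.id, score: c.localScore * presence }))
    .filter((c) => c.score >= threshold);
}
```

Under this scheme, a low presence probability suppresses every candidate at once, even ones that look locally plausible for a similar but different concept.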

    Mixpeek SDK Integration

    import { Mixpeek } from "mixpeek";

    // Replace "API_KEY" with your Mixpeek API key.
    const mx = new Mixpeek({ apiKey: "API_KEY" });

    // Ingest a video and run SAM 3 segmentation on it.
    await mx.collections.ingest({
      collection_id: "my-collection",
      source: { url: "https://example.com/video.mp4" },
      feature_extractors: [{
        name: "segmentation",
        version: "v1",
        params: { model_id: "facebook/sam3" }
      }]
    });

    Capabilities

    • Open-vocabulary detection + segmentation (270K+ concepts)
    • Video tracking with mask propagation
    • Text and visual exemplar prompts
    • Concept-level exhaustive segmentation
    • Outperforms OWLv2, DINO-X, Gemini 2.5 on benchmarks

    Use Cases on Mixpeek

    • Exhaustive concept detection across large video libraries
    • Brand and logo tracking in video content
    • Open-vocabulary content moderation at scale
    • Concept-driven video analytics and tagging
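For library-scale use cases, one request per video can be prepared ahead of time. The sketch below mirrors the payload shape from the SDK example on this page. The `IngestRequest` type and the `buildSam3Requests` helper are hypothetical and are not part of the Mixpeek SDK.

```typescript
// Payload shape copied from the SDK example on this page.
interface IngestRequest {
  collection_id: string;
  source: { url: string };
  feature_extractors: {
    name: string;
    version: string;
    params: { model_id: string };
  }[];
}

// Hypothetical helper: build one SAM 3 ingest payload per source URL.
function buildSam3Requests(collectionId: string, urls: string[]): IngestRequest[] {
  return urls.map((url) => ({
    collection_id: collectionId,
    source: { url },
    feature_extractors: [
      { name: "segmentation", version: "v1", params: { model_id: "facebook/sam3" } },
    ],
  }));
}
```

Each resulting object could then be passed to `mx.collections.ingest(...)` as in the SDK example, assuming the client accepts the same shape for every source.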

    Benchmarks

    Dataset | Metric | Score | Source
    SA-V (video seg.) | J&F | 83.2 | SAM 3 model card
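J&F is conventionally the mean of region similarity J (the intersection-over-union of predicted and ground-truth masks) and contour accuracy F (a boundary F-measure), usually reported as a percentage. A minimal sketch of the J term and the final average follows. It assumes masks are flat arrays of 0/1 values and takes F as an input instead of computing it. The function names are illustrative.

```typescript
// Region similarity J: intersection-over-union of two binary masks,
// given as flat arrays of 0/1 values with equal length.
function regionSimilarity(pred: number[], truth: number[]): number {
  if (pred.length !== truth.length) throw new Error("mask sizes differ");
  let intersection = 0;
  let union = 0;
  for (let i = 0; i < pred.length; i++) {
    const p = pred[i] === 1;
    const t = truth[i] === 1;
    if (p && t) intersection++;
    if (p || t) union++;
  }
  // Two empty masks are treated as a perfect match by convention.
  return union === 0 ? 1 : intersection / union;
}

// J&F: arithmetic mean of J and the contour accuracy F.
function jAndF(j: number, f: number): number {
  return (j + f) / 2;
}
```

Benchmark figures average these per-frame values over objects and sequences, a step this sketch leaves out.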

    Performance

    Input Size: 1024×1024 px
    GPU Latency: ~15 ms / frame (A100)
    GPU Throughput: ~66 frames/sec (A100)
    GPU Memory: ~3.0 GB
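The latency and throughput figures are two views of the same number: 1000 ms divided by ~15 ms per frame gives roughly 66 frames per second. The sketch below turns that into a rough processing-time estimate for a clip. It assumes every frame is processed one at a time, with no batching, decoding, or I/O overhead. The function names are illustrative.

```typescript
// Convert per-frame latency in milliseconds into frames per second.
function framesPerSecond(latencyMs: number): number {
  return 1000 / latencyMs;
}

// Estimate GPU seconds needed to process a clip frame by frame,
// ignoring batching, decoding, and I/O overhead.
function estimateProcessingSeconds(
  durationSec: number,
  videoFps: number,
  latencyMs: number
): number {
  const totalFrames = Math.ceil(durationSec * videoFps);
  return (totalFrames * latencyMs) / 1000;
}
```

For example, a 60-second clip at 30 fps has 1,800 frames, so at 15 ms per frame the estimate is 27 seconds of GPU time.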

    Specification

    Framework: PyTorch
    Organization: facebook
    Feature: Segmentation
    Output: mask + label
    Modalities: video, image
    Retriever: Mask Filter
    Parameters: 848M
    License: Apache 2.0
    Downloads/mo: 420K

    Research Paper

    SAM 3: Segment Anything with Concepts

    arxiv.org

    Build a pipeline with sam3

    Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.
