
    LocateAnything-3B

    by nvidia

    Open-vocabulary visual grounding for locating arbitrary objects in images

    132K downloads/month
    1,831 likes
    3.8B params
    Identifiers
    Model ID: nvidia/LocateAnything-3B
    Feature URI: mixpeek://image_extractor@v1/nvidia_locateanything_3b_v1

    Overview

    LocateAnything 3B is an NVIDIA vision-language model for open-vocabulary localization. Instead of predicting only a fixed detector label set, it uses a text prompt to identify and localize the requested visual target.

    On Mixpeek, LocateAnything is useful when an agent needs structured evidence from images or frames but the target classes are not known when the pipeline is built. The agent can ask for objects, UI components, safety conditions, or domain-specific items and store the resulting boxes as searchable metadata.
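As a rough illustration of what "storing the resulting boxes as searchable metadata" could look like, the sketch below collapses the detections for one image into a flat record. The type names, field names, and the `[x, y, width, height]` pixel convention are assumptions made for this example; they are not Mixpeek's or the model's documented schema.

```typescript
// Hypothetical shapes for illustration only (not a documented Mixpeek schema).
type BBox = [number, number, number, number]; // assumed [x, y, width, height] in pixels

interface Detection {
  label: string; // the text the box was grounded to, e.g. "hard hat"
  bbox: BBox;
}

interface ImageMetadata {
  imageId: string;
  labels: string[]; // unique, lowercased, sorted labels for simple keyword filtering
  objectCount: number;
  objects: Detection[];
}

// Collapse the raw detections for one image into a flat, searchable record.
function toMetadata(imageId: string, detections: Detection[]): ImageMetadata {
  const labels = Array.from(
    new Set(detections.map((d) => d.label.toLowerCase()))
  ).sort();
  return { imageId, labels, objectCount: detections.length, objects: detections };
}

const record = toMetadata("site-042.jpg", [
  { label: "Hard Hat", bbox: [120, 40, 60, 50] },
  { label: "ladder", bbox: [300, 10, 80, 400] },
  { label: "hard hat", bbox: [420, 55, 58, 48] },
]);

console.log(record.labels); // ["hard hat", "ladder"]
console.log(record.objectCount); // 3
```

Normalizing labels to lowercase is one possible design choice: because prompts are free text, the same target may otherwise be stored under several spellings.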

    Architecture

    LocateAnything-3B is a 3B-class vision-language model (3.8B parameters) exposed as an image-text-to-text Transformers checkpoint. It accepts an image plus a text grounding prompt and returns localization-oriented outputs, which Mixpeek surfaces as bounding boxes with labels.

    Mixpeek SDK Integration

    import { Mixpeek } from "mixpeek";

    // Replace "API_KEY" with your Mixpeek API key.
    const mx = new Mixpeek({ apiKey: "API_KEY" });

    // Ingest images from the bucket and run LocateAnything-3B on each one.
    await mx.collections.ingest({
      collection_id: "inspection-images",
      source: { url: "s3://field-inspections/" },
      feature_extractors: [{
        feature: "object_detection",
        model: "nvidia/LocateAnything-3B"
      }]
    });
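If the same extractor is applied to several collections, the request object can be built once by a small helper. This is only a sketch: `buildIngestRequest` and the interfaces are hypothetical names introduced here, and the payload shape is copied from the snippet above rather than from a full API reference. The block does not call the SDK, so it runs on its own.

```typescript
// Payload shape mirrored from the ingest snippet above; field names beyond
// those shown there are not assumed.
interface FeatureExtractor {
  feature: string;
  model: string;
}

interface IngestRequest {
  collection_id: string;
  source: { url: string };
  feature_extractors: FeatureExtractor[];
}

const LOCATE_ANYTHING: FeatureExtractor = {
  feature: "object_detection",
  model: "nvidia/LocateAnything-3B",
};

// Hypothetical helper: builds the ingest payload for one collection.
function buildIngestRequest(collectionId: string, sourceUrl: string): IngestRequest {
  if (collectionId.trim() === "") {
    throw new Error("collection_id must not be empty");
  }
  return {
    collection_id: collectionId,
    source: { url: sourceUrl },
    // Copy the extractor so callers cannot mutate the shared constant.
    feature_extractors: [{ ...LOCATE_ANYTHING }],
  };
}

const request = buildIngestRequest("inspection-images", "s3://field-inspections/");
console.log(JSON.stringify(request, null, 2));
```

With the client from the snippet above, the result would be passed as `await mx.collections.ingest(request)`.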

    Capabilities

    • Open-vocabulary object localization
    • Promptable image grounding
    • Useful for long-tail object classes
    • Transforms visual observations into structured metadata

    Use Cases on Mixpeek

    • Agent inspection of images where labels are decided at query time
    • Locating brand assets, UI controls, products, or safety equipment
    • Frame-level grounding before crop embedding or visual QA
    • Long-tail visual search beyond COCO-style categories
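For the "grounding before crop embedding" case, a returned box usually has to be turned into a crop rectangle before the region is passed to an embedding model. The sketch below pads a box and clamps it to the image bounds. It assumes boxes arrive as `[x, y, width, height]` in pixels, which this page does not confirm; `paddedCrop` and `padRatio` are hypothetical names for this example.

```typescript
type BBox = [number, number, number, number]; // assumed [x, y, width, height] in pixels

interface Crop {
  left: number;
  top: number;
  width: number;
  height: number;
}

// Expand a box by padRatio on each side, then clamp it to the image bounds
// so the crop never reaches outside the frame.
function paddedCrop(
  bbox: BBox,
  imageWidth: number,
  imageHeight: number,
  padRatio = 0.1
): Crop {
  const [x, y, w, h] = bbox;
  const padX = Math.round(w * padRatio);
  const padY = Math.round(h * padRatio);
  const left = Math.max(0, Math.floor(x - padX));
  const top = Math.max(0, Math.floor(y - padY));
  const right = Math.min(imageWidth, Math.ceil(x + w + padX));
  const bottom = Math.min(imageHeight, Math.ceil(y + h + padY));
  return { left, top, width: right - left, height: bottom - top };
}

// A box well inside a 640x480 frame keeps its full padding.
const inner = paddedCrop([100, 50, 200, 100], 640, 480);
console.log(inner); // { left: 80, top: 40, width: 240, height: 120 }

// A box near the bottom-right corner is clamped to the frame.
const edge = paddedCrop([600, 440, 60, 60], 640, 480);
console.log(edge); // { left: 594, top: 434, width: 46, height: 46 }
```

A little padding keeps some context around the object, which can help downstream embedding or visual QA; the 10% default is an arbitrary starting point, not a recommended value.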

    Specification

    Framework: HF
    Organization: nvidia
    Feature: Object Detection
    Output: bbox + label
    Modalities: video, image
    Retriever: Object Filter
    Parameters: 3.8B
    License: other
    Downloads/mo: 132K
    Likes: 1,831

    Research Paper

    LocateAnything 3B (arxiv.org)

    Build a pipeline with LocateAnything-3B

    Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.
