    HF | Object Detection | Apache 2.0

    rf-detr-large

    by Roboflow

    Real-time detection transformer with DINOv2-style visual features

    210 downloads/month
    Large params
    Identifiers
    Model ID
    Roboflow/rf-detr-large
    Feature URI
    mixpeek://image_extractor@v1/roboflow_rf_detr_large_v1

    Overview

    RF-DETR Large is a real-time detection transformer from Roboflow. It combines a ViT backbone, multi-scale feature fusion, and a deformable DETR-style decoder to produce object boxes without anchor heuristics.

    On Mixpeek, RF-DETR Large serves as an open (Apache 2.0) object detector for pipelines that need high-quality bounding boxes before retrieval, filtering, or agent inspection.

    Architecture

    RF-DETR Large is an end-to-end detection transformer with a DINOv2-with-registers style ViT backbone, RF-DETR windowed attention, a multi-scale projector, a deformable cross-attention decoder, and DETR-style object queries. The model is trained on COCO 2017.
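To make the "boxes without anchor heuristics" idea concrete: DETR-family decoders commonly have each object query regress a box directly as normalized center coordinates plus width and height, rather than as offsets from predefined anchors. The sketch below assumes RF-DETR follows that convention and shows how such a box could be mapped to pixel corners. The `NormalizedBox` and `PixelBox` types and the `toPixelBox` function are hypothetical names for illustration, not part of the Mixpeek SDK or the RF-DETR codebase.

```typescript
// Hypothetical types for illustration only (not a Mixpeek or RF-DETR API).
// Assumes the DETR-family convention: boxes as normalized (cx, cy, w, h) in [0, 1].
interface NormalizedBox {
  cx: number; // box center x, as a fraction of image width
  cy: number; // box center y, as a fraction of image height
  w: number;  // box width, as a fraction of image width
  h: number;  // box height, as a fraction of image height
}

interface PixelBox {
  x1: number;
  y1: number;
  x2: number;
  y2: number;
}

// Convert a normalized center-format box to pixel corner coordinates,
// clamping to the image bounds.
function toPixelBox(
  box: NormalizedBox,
  imageWidth: number,
  imageHeight: number
): PixelBox {
  const clamp = (value: number, max: number): number =>
    Math.min(Math.max(value, 0), max);
  return {
    x1: clamp((box.cx - box.w / 2) * imageWidth, imageWidth),
    y1: clamp((box.cy - box.h / 2) * imageHeight, imageHeight),
    x2: clamp((box.cx + box.w / 2) * imageWidth, imageWidth),
    y2: clamp((box.cy + box.h / 2) * imageHeight, imageHeight),
  };
}
```

Because each query predicts its box directly, no anchor sizes or aspect ratios have to be tuned per dataset; the only post-processing in this sketch is the coordinate conversion.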

    Mixpeek SDK Integration

    import { Mixpeek } from "mixpeek";

    // Replace "API_KEY" with your Mixpeek API key.
    const mx = new Mixpeek({ apiKey: "API_KEY" });

    // Ingest one frame and run RF-DETR Large object detection on it.
    await mx.collections.ingest({
      collection_id: "camera-frames",
      source: { url: "https://example.com/frame.jpg" },
      feature_extractors: [{
        feature: "object_detection",
        model: "Roboflow/rf-detr-large"
      }]
    });

    Capabilities

    • Object detection over the COCO 2017 label space
    • Transformer-based boxes without anchor design
    • Multi-scale feature fusion for small and large objects
    • Apache 2.0 license

    Use Cases on Mixpeek

    • Filter video moments by detected people, vehicles, products, or equipment
    • Agent perception over camera frames with structured object metadata
    • Retail shelf and warehouse object indexing
    • Detection-first pipelines before crop-level embedding or captioning
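As a rough illustration of the first use case, the sketch below selects the video frames that contain a wanted label. The `Detection` and `FrameDetections` shapes are assumptions made for this example: the page only states that the output is "bbox + label", so the `score` field and the exact field names are hypothetical and may not match what Mixpeek returns.

```typescript
// Hypothetical result shapes for illustration; the real Mixpeek output may differ.
interface Detection {
  label: string;                           // e.g. a COCO 2017 class name
  score: number;                           // assumed confidence in [0, 1]
  bbox: [number, number, number, number];  // assumed [x1, y1, x2, y2] in pixels
}

interface FrameDetections {
  frameIndex: number;
  detections: Detection[];
}

// Return the indices of frames with at least one detection whose label is
// in `wanted` and whose score meets `minScore`.
function framesWithLabel(
  frames: FrameDetections[],
  wanted: ReadonlySet<string>,
  minScore: number = 0.5
): number[] {
  return frames
    .filter((frame) =>
      frame.detections.some(
        (d) => wanted.has(d.label) && d.score >= minScore
      )
    )
    .map((frame) => frame.frameIndex);
}
```

The same predicate could be used to pick boxes for cropping before a crop-level embedding or captioning stage.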

    Performance

    Input Size: Image or video frame
    GPU Latency: ~12 ms / frame (A100, batch dependent)
    GPU Throughput: ~80 frames/sec (A100, batch dependent)
    GPU Memory: ~2 GB
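The latency and throughput figures above can be turned into a back-of-the-envelope processing estimate. The sketch below is only arithmetic on the quoted numbers; the function names are hypothetical, and actual performance depends on batch size, resolution, and hardware.

```typescript
// Frames per second implied by a per-frame latency in milliseconds.
function framesPerSecond(latencyMs: number): number {
  return 1000 / latencyMs;
}

// Estimated seconds of GPU time to process a clip at a given detector throughput.
function estimateProcessingSeconds(
  clipSeconds: number,
  clipFps: number,
  detectorFps: number
): number {
  const totalFrames = clipSeconds * clipFps;
  return totalFrames / detectorFps;
}

// ~12 ms per frame implies roughly 83 frames/sec, in line with the quoted ~80.
const impliedFps = framesPerSecond(12);

// A 60-second clip at 30 fps is 1800 frames; at ~80 frames/sec that is 22.5 s.
const clipEstimate = estimateProcessingSeconds(60, 30, 80);
```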

    Use detection output as structured metadata for filters and joins

    Specification

    Framework: HF
    Organization: Roboflow
    Feature: Object Detection
    Output: bbox + label
    Modalities: video, image
    Retriever: Object Filter
    Parameters: Large
    License: Apache 2.0
    Downloads/mo: 210

    Research Paper

    RF-DETR: Neural Architecture Search for Real-Time Detection Transformers

    arxiv.org

    Build a pipeline with rf-detr-large

    Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.
