    HF · Object Detection · apache-2.0

    yolos-tiny

    by hustvl

    You Only Look at One Sequence, ViT-based real-time object detection

    107K downloads/month
    280 likes
    6.5M params
    Identifiers
    Model ID: hustvl/yolos-tiny
    Feature URI: mixpeek://image_extractor@v1/hustvl_yolos_tiny_v1

    Overview

    YOLOS adapts the Vision Transformer (ViT) architecture for object detection by simply appending detection tokens to the input sequence. It demonstrates that a pure transformer can perform object detection without any convolutional components.
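To make "appending detection tokens to the input sequence" concrete, the sketch below counts the tokens the transformer would see for one image. The 16-pixel patch size is an assumption (it is the usual ViT setting and is not stated on this page); the 100 detection tokens and the 512×864 input are taken from the Architecture and Performance sections. The function name is illustrative, and the count ignores any extra class token an implementation may keep.

```typescript
// Count the tokens in a YOLOS-style input sequence: image patches plus detection tokens.
// ASSUMPTION: square 16x16 patches; this page does not state the patch size.
function sequenceLength(
  heightPx: number,
  widthPx: number,
  patchSize = 16,
  detectionTokens = 100, // value listed in the Architecture section
): { patches: number; total: number } {
  if (heightPx % patchSize !== 0 || widthPx % patchSize !== 0) {
    throw new Error("image dimensions must be multiples of the patch size");
  }
  // Each non-overlapping patch becomes one token.
  const patches = (heightPx / patchSize) * (widthPx / patchSize);
  return { patches, total: patches + detectionTokens };
}

const seq = sequenceLength(512, 864);
console.log(seq); // { patches: 1728, total: 1828 }
```

Under these assumptions the detection tokens are a small fraction of the sequence, so most of the attention cost comes from the image patches.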

    On Mixpeek, YOLOS Tiny provides a lightweight, fast alternative to DETR for object detection tasks where speed is prioritized over maximum accuracy.

    Architecture

    Vision Transformer (ViT-Tiny) with 12 layers. Appends 100 learnable detection tokens to the image patch sequence. Uses bipartite matching loss like DETR.
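Bipartite matching pairs each ground-truth object with exactly one prediction so that the total matching cost is minimal; predictions left unmatched are treated as "no object". The following is a minimal sketch of that idea, not the training code: it uses an exhaustive search in place of the Hungarian algorithm, and an L1 box distance in place of the full cost (DETR's cost also includes class and generalized-IoU terms). All names are illustrative.

```typescript
type Box = [number, number, number, number]; // (cx, cy, w, h), normalized to [0, 1]

// L1 distance between two boxes; a simplified stand-in for the full matching cost.
function l1(a: Box, b: Box): number {
  return a.reduce((sum, v, i) => sum + Math.abs(v - b[i]), 0);
}

// Exhaustively search for the one-to-one assignment of targets to predictions
// with the lowest total cost. Returns, for each target, the index of its prediction.
// Only viable for tiny inputs; real implementations use the Hungarian algorithm.
function matchTargets(preds: Box[], targets: Box[]): number[] {
  if (targets.length > preds.length) {
    throw new Error("need at least as many predictions as targets");
  }
  let best: number[] = [];
  let bestCost = Infinity;
  const used = new Array<boolean>(preds.length).fill(false);
  const current: number[] = [];

  const search = (t: number, cost: number): void => {
    if (cost >= bestCost) return; // prune branches that cannot improve
    if (t === targets.length) {
      bestCost = cost;
      best = [...current];
      return;
    }
    for (let p = 0; p < preds.length; p++) {
      if (used[p]) continue;
      used[p] = true;
      current.push(p);
      search(t + 1, cost + l1(preds[p], targets[t]));
      current.pop();
      used[p] = false;
    }
  };

  search(0, 0);
  return best;
}

const preds: Box[] = [
  [0.5, 0.5, 0.2, 0.2],
  [0.1, 0.1, 0.1, 0.1],
  [0.8, 0.8, 0.3, 0.3],
];
const targets: Box[] = [
  [0.82, 0.79, 0.3, 0.3],
  [0.12, 0.1, 0.1, 0.12],
];
console.log(matchTargets(preds, targets)); // [ 2, 1 ]
```

In this toy example prediction 0 is left unmatched, which is the role the surplus detection tokens play when an image contains fewer objects than tokens.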

    Mixpeek SDK Integration

    import { Mixpeek } from "mixpeek";

    // Replace "API_KEY" with your own key.
    const mx = new Mixpeek({ apiKey: "API_KEY" });

    // Ingest a video and run object detection with YOLOS Tiny.
    await mx.collections.ingest({
      collection_id: "my-collection",
      source: { url: "https://example.com/video.mp4" },
      feature_extractors: [{
        name: "object_detection",
        version: "v1",
        params: {
          model_id: "hustvl/yolos-tiny"
        }
      }]
    });

    Capabilities

    • Lightweight ViT-based object detection
    • Fast inference suitable for real-time processing
    • COCO object categories
    • Pure transformer architecture (no CNN backbone)
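DETR-style detectors, including the Hugging Face YOLOS implementation, predict boxes as normalized (center x, center y, width, height) values that are converted to pixel corners in post-processing. How Mixpeek returns boxes is not stated on this page, so the sketch below is only an illustration of that conversion; the `PixelBox` type and function name are hypothetical.

```typescript
// Hypothetical output shape: corner coordinates in pixels.
interface PixelBox {
  xMin: number;
  yMin: number;
  xMax: number;
  yMax: number;
}

// Convert a normalized (cx, cy, w, h) box to pixel corners, clamped to the image.
function toPixelBox(
  box: [number, number, number, number],
  imageWidth: number,
  imageHeight: number,
): PixelBox {
  const [cx, cy, w, h] = box;
  const clamp = (v: number, max: number): number => Math.min(Math.max(v, 0), max);
  return {
    xMin: clamp((cx - w / 2) * imageWidth, imageWidth),
    yMin: clamp((cy - h / 2) * imageHeight, imageHeight),
    xMax: clamp((cx + w / 2) * imageWidth, imageWidth),
    yMax: clamp((cy + h / 2) * imageHeight, imageHeight),
  };
}

// A box centered in an 864x512 frame, a quarter of the width and half the height.
console.log(toPixelBox([0.5, 0.5, 0.25, 0.5], 864, 512));
// { xMin: 324, yMin: 128, xMax: 540, yMax: 384 }
```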

    Use Cases on Mixpeek

    • Real-time video analysis where low latency is critical
    • Edge deployment scenarios with limited compute
    • High-throughput batch processing of large video archives

    Benchmarks

    Dataset       Metric    Score  Source
    COCO val2017  AP (box)  30.4   Fang et al., 2021, Table 1
    COCO val2017  AP50      48.6   Fang et al., 2021, Table 1
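Both metrics rest on intersection over union (IoU): AP50 counts a detection as correct when its IoU with a ground-truth box is at least 0.5, while COCO's box AP averages over IoU thresholds from 0.50 to 0.95. The sketch below shows the IoU computation only, not the full COCO evaluation; the type and function names are illustrative.

```typescript
type Corners = { xMin: number; yMin: number; xMax: number; yMax: number };

// Intersection over union of two axis-aligned boxes given as pixel corners.
function iou(a: Corners, b: Corners): number {
  const interW = Math.max(0, Math.min(a.xMax, b.xMax) - Math.max(a.xMin, b.xMin));
  const interH = Math.max(0, Math.min(a.yMax, b.yMax) - Math.max(a.yMin, b.yMin));
  const inter = interW * interH;
  const areaA = (a.xMax - a.xMin) * (a.yMax - a.yMin);
  const areaB = (b.xMax - b.xMin) * (b.yMax - b.yMin);
  const union = areaA + areaB - inter;
  return union > 0 ? inter / union : 0;
}

const truth: Corners = { xMin: 0, yMin: 0, xMax: 100, yMax: 100 };
const close: Corners = { xMin: 20, yMin: 0, xMax: 120, yMax: 100 };
const loose: Corners = { xMin: 50, yMin: 0, xMax: 150, yMax: 100 };

console.log(iou(truth, close) >= 0.5); // true  (IoU = 8000 / 12000)
console.log(iou(truth, loose) >= 0.5); // false (IoU = 5000 / 15000)
```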

    Performance

    Input Size: 512×864 px
    GPU Latency: ~6 ms / image (A100)
    CPU Latency: ~55 ms / image
    GPU Throughput: ~165 images/sec (A100)
    GPU Memory: ~0.4 GB

    6.5M params — optimized for edge and high-throughput scenarios
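The latency and throughput figures above are consistent with each other (1000 ms / 6 ms ≈ 167 images/sec), and they allow a rough capacity estimate for archive processing. The sketch below is back-of-the-envelope arithmetic only: the frame sampling rate is a hypothetical parameter, and video decoding, I/O, and batching effects are ignored.

```typescript
// Images per second implied by a per-image latency in milliseconds.
function throughputFromLatency(latencyMs: number): number {
  return 1000 / latencyMs;
}

// Rough GPU hours needed to run detection over a video archive.
// sampledFps is a hypothetical setting: how many frames per second of video are analyzed.
function estimateGpuHours(
  videoHours: number,
  sampledFps: number,
  imagesPerSecond = 165, // A100 throughput listed above
): number {
  const frames = videoHours * 3600 * sampledFps;
  return frames / imagesPerSecond / 3600;
}

console.log(throughputFromLatency(6).toFixed(1)); // prints 166.7
console.log(estimateGpuHours(1000, 1).toFixed(2)); // prints 6.06
```

Under these assumptions, 1,000 hours of video sampled at one frame per second would need roughly six A100 hours of pure inference time.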

    Specification

    Framework: HF
    Organization: hustvl
    Feature: Object Detection
    Output: bbox + label
    Modalities: video, image
    Retriever: Object Filter
    Parameters: 6.5M
    License: apache-2.0
    Downloads/mo: 107K
    Likes: 280

    Research Paper

    You Only Look at One Sequence

    arxiv.org

    Build a pipeline with yolos-tiny

    Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.
