NEWManaged multimodal retrieval.Explore platform →
    Models/Quality & Anomaly/nvidia/Cosmos-Embed1-448p-anomaly-detection
    HFAnomaly DetectionNVIDIA Open Model License

    Cosmos-Embed1-448p-anomaly-detection

    by nvidia

    Video anomaly embeddings for physical-AI monitoring and event retrieval

    10Kdl/month
    10likes
    1.2Bparams
    Identifiers
    Model ID
    nvidia/Cosmos-Embed1-448p-anomaly-detection
    Feature URI
    mixpeek://video_extractor@v1/nvidia_cosmos_embed1_anomaly_v1

    Overview

    Cosmos Embed1 Anomaly Detection is a NVIDIA video embedding model tuned for finding unusual events in physical-world video. The HuggingFace model card tags it for video-text retrieval, video embeddings, physical AI, and anomaly detection, making it a strong fit for cameras, robotics, warehouse footage, and field operations where an agent needs to notice abnormal visual behavior.

    On Mixpeek, Cosmos Embed1 can turn surveillance or operations video into searchable anomaly features. Agents can retrieve clips that look unsafe, off-process, or visually unusual, then combine those results with object detection, transcript search, or human review workflows.

    Architecture

    QFormer-based video-text embedder with an EVA-ViT-G visual backbone. The 448p anomaly variant processes 8 sampled video frames into 768-dimensional normalized embeddings, aligns video and text via contrastive training, and is LoRA-fine-tuned on the Vad-Reasoning anomaly dataset.

    Mixpeek SDK Integration

    import { Mixpeek } from "mixpeek";
    const mx = new Mixpeek({ apiKey: "API_KEY" });
    await mx.collections.ingest({
    collection_id: "operations-video",
    source: { url: "https://example.com/warehouse-camera.mp4" },
    feature_extractors: [{
    name: "anomaly_detection",
    version: "v1",
    params: {
    model_id: "nvidia/Cosmos-Embed1-448p-anomaly-detection"
    }
    }]
    });

    Capabilities

    • Video anomaly scoring for physical-world footage
    • Video-text retrieval over abnormal events
    • Fine-tuned Cosmos Embed1 backbone
    • Works as an agent perception signal for camera and robotics workflows
    • Pairs naturally with object detection and scene captioning

    Use Cases on Mixpeek

    Warehouse safety agents that find near misses, blocked aisles, or unusual motion
    Robotics monitoring systems that flag unexpected scene changes
    Manufacturing QA review over camera feeds and process videos
    Security workflows that prioritize abnormal clips before human review

    Benchmarks

    DatasetMetricScoreSource
    Vad-Reasoning test setTop-1 Hit Rate46.44%NVIDIA model card, 2026
    Vad-Reasoning test setTop-5 Hit Rate83.71%NVIDIA model card, 2026
    Kinetics-400 validationTop-1 Accuracy85.18%NVIDIA model card, 2026

    Performance

    Input Size448×448 px, 8 sampled frames
    Embedding Dim768
    GPU LatencyInput dependent
    GPU ThroughputBatch dependent
    GPU MemoryBF16 tested on A100/H100

    HF safetensors metadata reports 1.2B parameters; latency depends on clip sampling and batching.

    Specification

    FrameworkHF
    Organizationnvidia
    FeatureAnomaly Detection
    Outputanomaly score + map
    Modalitiesvideo, image
    RetrieverAnomaly Filter
    Parameters1.2B
    LicenseNVIDIA Open Model License
    Downloads/mo10K
    Likes10

    Research Paper

    Cosmos Predict1: World Foundation Model Platform for Physical AI

    arxiv.org

    Build a pipeline with Cosmos-Embed1-448p-anomaly-detection

    Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.

    Open Studio