Cosmos-Embed1-448p-anomaly-detection
by nvidia
Video anomaly embeddings for physical-AI monitoring and event retrieval
nvidia/Cosmos-Embed1-448p-anomaly-detectionmixpeek://video_extractor@v1/nvidia_cosmos_embed1_anomaly_v1Overview
Cosmos Embed1 Anomaly Detection is a NVIDIA video embedding model tuned for finding unusual events in physical-world video. The HuggingFace model card tags it for video-text retrieval, video embeddings, physical AI, and anomaly detection, making it a strong fit for cameras, robotics, warehouse footage, and field operations where an agent needs to notice abnormal visual behavior.
On Mixpeek, Cosmos Embed1 can turn surveillance or operations video into searchable anomaly features. Agents can retrieve clips that look unsafe, off-process, or visually unusual, then combine those results with object detection, transcript search, or human review workflows.
Architecture
QFormer-based video-text embedder with an EVA-ViT-G visual backbone. The 448p anomaly variant processes 8 sampled video frames into 768-dimensional normalized embeddings, aligns video and text via contrastive training, and is LoRA-fine-tuned on the Vad-Reasoning anomaly dataset.
Mixpeek SDK Integration
import { Mixpeek } from "mixpeek";const mx = new Mixpeek({ apiKey: "API_KEY" });await mx.collections.ingest({collection_id: "operations-video",source: { url: "https://example.com/warehouse-camera.mp4" },feature_extractors: [{name: "anomaly_detection",version: "v1",params: {model_id: "nvidia/Cosmos-Embed1-448p-anomaly-detection"}}]});
Capabilities
- Video anomaly scoring for physical-world footage
- Video-text retrieval over abnormal events
- Fine-tuned Cosmos Embed1 backbone
- Works as an agent perception signal for camera and robotics workflows
- Pairs naturally with object detection and scene captioning
Use Cases on Mixpeek
Benchmarks
| Dataset | Metric | Score | Source |
|---|---|---|---|
| Vad-Reasoning test set | Top-1 Hit Rate | 46.44% | NVIDIA model card, 2026 |
| Vad-Reasoning test set | Top-5 Hit Rate | 83.71% | NVIDIA model card, 2026 |
| Kinetics-400 validation | Top-1 Accuracy | 85.18% | NVIDIA model card, 2026 |
Performance
HF safetensors metadata reports 1.2B parameters; latency depends on clip sampling and batching.
Specification
Research Paper
Cosmos Predict1: World Foundation Model Platform for Physical AI
arxiv.orgBuild a pipeline with Cosmos-Embed1-448p-anomaly-detection
Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.
Open Studio