Depth-Anything-V2-Large
by depth-anything
Foundation model for monocular depth estimation with synthetic-to-real training
depth-anything/Depth-Anything-V2-Large
mixpeek://image_extractor@v1/depth_anything_v2_large_v1
Overview
Depth Anything V2 Large is a 335M-parameter monocular depth estimation model that produces dense per-pixel depth maps from single images. Built on a DINOv2-Large encoder with a DPT decoder, it is trained via a teacher-student paradigm: a giant ViT-G teacher learns from 595K synthetic images, then supervises student models on 62M pseudo-labeled real images to bridge the synthetic-to-real domain gap.
On Mixpeek, Depth Anything V2 extracts depth maps from video frames and images, enabling spatial-aware retrieval such as finding scenes with specific depth compositions, foreground/background separation, or 3D layout understanding.
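As a rough illustration of the foreground/background separation mentioned above, the sketch below thresholds a normalized depth map. It is not part of the Mixpeek SDK: the function names, the 0.5 threshold, and the toy depth values are all hypothetical, and it assumes that larger values mean closer to the camera (an inverse-depth convention), which should be checked against the actual extractor output.

```python
def normalize_depth(depth):
    """Rescale a 2D relative depth map (list of rows) to the range [0, 1]."""
    flat = [v for row in depth for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        # A constant map carries no depth ordering; return all zeros.
        return [[0.0 for _ in row] for row in depth]
    return [[(v - lo) / span for v in row] for row in depth]


def foreground_mask(depth, threshold=0.5):
    """Mark pixels whose normalized value exceeds the threshold.

    Assumption: larger values mean closer to the camera.
    """
    norm = normalize_depth(depth)
    return [[v > threshold for v in row] for row in norm]


def foreground_fraction(mask):
    """Share of pixels flagged as foreground."""
    total = sum(len(row) for row in mask)
    return sum(v for row in mask for v in row) / total


# Toy 3x4 depth map with a near object in the middle (made-up values).
depth = [
    [0.1, 0.2, 0.2, 0.1],
    [0.2, 3.0, 3.1, 0.2],
    [0.2, 3.2, 3.0, 0.1],
]
mask = foreground_mask(depth)
print(foreground_fraction(mask))
```

A per-frame statistic such as this foreground fraction is one possible way to describe a scene's depth composition; a fixed threshold is a simplification, and real footage may need a per-image or learned cutoff.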
Architecture
The encoder is DINOv2-Large (ViT-L) with 24 transformer layers, feeding a DPT (Dense Prediction Transformer) decoder. Intermediate DINOv2 features are fused at multiple scales for dense depth prediction. Training follows a teacher-student scheme in which a ViT-G teacher is trained on synthetic data.
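To make the encoder-to-decoder hand-off more concrete, the sketch below computes the patch-token grid a ViT encoder produces and picks four layers to tap for multi-scale fusion. DINOv2 uses 14x14 patches; the 518x518 input size and the evenly spaced tap layers are assumptions for illustration only and may not match the released model configuration.

```python
PATCH_SIZE = 14       # DINOv2 ViT patch size
ENCODER_LAYERS = 24   # ViT-L depth, as stated above


def token_grid(height, width, patch=PATCH_SIZE):
    """Return the (rows, cols) grid of patch tokens for an input image."""
    if height % patch or width % patch:
        raise ValueError("input size must be a multiple of the patch size")
    return height // patch, width // patch


def evenly_spaced_taps(num_layers, num_taps=4):
    """Hypothetical choice of 0-indexed layers whose features feed the decoder.

    Evenly spaced for illustration; the released model may tap other layers.
    """
    step = num_layers // num_taps
    return [step * (i + 1) - 1 for i in range(num_taps)]


rows, cols = token_grid(518, 518)   # 518 is an assumed input resolution
taps = evenly_spaced_taps(ENCODER_LAYERS)
print(rows, cols, rows * cols, taps)
```

Each tapped layer yields one token grid of the same shape; a DPT-style decoder reassembles these into feature maps at different resolutions before fusing them into the final depth map.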
Mixpeek SDK Integration
from mixpeek import Mixpeek

mx = Mixpeek(api_key="YOUR_KEY")
mx.ingest(
    collection_id="video-scenes",
    source="s3://footage/",
    extractors=[
        {
            "type": "depth_estimation",
            "model": "depth-anything/Depth-Anything-V2-Large",
            "output_feature": "depth_map",
        }
    ],
)
Capabilities
- Dense per-pixel relative depth estimation
- 10x faster than diffusion-based depth models
- Robust across indoor, outdoor, and synthetic scenes
- Fine-grained boundary preservation
- Metric depth variant available for absolute scale
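The difference between relative and metric depth in the list above can be illustrated with a small sketch. A relative prediction is only meaningful up to an unknown scale and shift, so comparing it with reference values typically involves fitting those two numbers first. The closed-form least-squares fit below is a generic technique, not the model's or Mixpeek's own code; `fit_scale_shift` and the sample values are hypothetical, and both inputs are assumed to live in the same space (for example, both as inverse depth).

```python
def fit_scale_shift(pred, target):
    """Least-squares scale s and shift t minimizing sum((s * p + t - g) ** 2)."""
    n = len(pred)
    if n != len(target) or n < 2:
        raise ValueError("need two equally sized sequences with at least 2 values")
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    var_p = sum((p - mean_p) ** 2 for p in pred)
    if var_p == 0:
        raise ValueError("prediction is constant; scale is undefined")
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    scale = cov / var_p
    shift = mean_t - scale * mean_p
    return scale, shift


# Made-up relative predictions and reference values for three pixels.
pred = [0.0, 0.5, 1.0]
target = [2.0, 3.0, 4.0]
scale, shift = fit_scale_shift(pred, target)
aligned = [scale * p + shift for p in pred]
print(scale, shift, aligned)
```

A metric variant removes the need for this step because its outputs are already in absolute units.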
Benchmarks
| Dataset | Metric | Score | Source |
|---|---|---|---|
| NYUv2 | AbsRel | 0.043 | Yang et al., 2024 — Depth Anything V2 paper |
| KITTI | AbsRel | 0.044 | Yang et al., 2024 — Depth Anything V2 paper |
| Sintel | AbsRel | 0.280 | Yang et al., 2024 — Depth Anything V2 paper |
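For readers unfamiliar with the metric in the table, AbsRel (absolute relative error) is the mean of |prediction - ground truth| / ground truth over valid pixels, so lower is better. The sketch below shows the standard definition on flattened pixel lists; the function name, the validity rule (ground truth greater than zero), and the sample values are illustrative assumptions and do not reproduce the paper's evaluation protocol.

```python
def abs_rel(pred, gt):
    """Mean absolute relative error over pixels with valid ground truth."""
    # Pixels with non-positive ground truth are treated as invalid and skipped.
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth pixels")
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


# Made-up depths in meters; the last pixel has no ground truth (0.0).
pred = [1.1, 2.0, 2.7, 5.0]
gt = [1.0, 2.0, 3.0, 0.0]
print(round(abs_rel(pred, gt), 4))
```

An AbsRel of 0.043 therefore means predictions deviate from the ground truth by about 4.3% on average.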
Research Paper
Depth Anything V2
arxiv.org
Build a pipeline with Depth-Anything-V2-Large
Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.