DepthPro

by Apple

Zero-shot metric monocular depth estimation with sharp boundaries in under a second

520K downloads/month

~350M params

Identifiers

Model ID

apple/DepthPro

Feature URI

mixpeek://image_extractor@v1/apple_depthpro_v1

Overview

DepthPro is Apple's foundation model for zero-shot metric monocular depth estimation, producing 2.25-megapixel depth maps (1536x1536) in 0.3 seconds on a V100 GPU. Unlike relative depth models, DepthPro predicts absolute metric depth without requiring camera intrinsics, and includes a built-in focal length estimator. Its multi-scale ViT architecture with a shared DINOv2 encoder and DPT-like fusion stage preserves sharp object boundaries.
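Because the depth is metric and a focal length is estimated alongside it, a depth map can be lifted into a 3D point cloud with the standard pinhole camera model. The following is a minimal sketch, not part of the DepthPro or Mixpeek APIs: `backProject` and `Point3D` are hypothetical names, the focal length is assumed to be expressed in pixels, and the principal point is assumed to sit at the image center.

```typescript
// Hypothetical helper for illustration; not a DepthPro or Mixpeek API.
type Point3D = { x: number; y: number; z: number };

function backProject(
  depth: Float32Array, // row-major metric depth in meters, length = width * height
  width: number,
  height: number,
  focalPx: number, // focal length in pixels (assumed unit)
): Point3D[] {
  if (depth.length !== width * height) {
    throw new Error("depth length does not match width * height");
  }
  // Assumption: principal point at the image center.
  const cx = (width - 1) / 2;
  const cy = (height - 1) / 2;
  const points: Point3D[] = [];
  for (let v = 0; v < height; v++) {
    for (let u = 0; u < width; u++) {
      const z = depth[v * width + u];
      // Skip invalid or non-positive depth values.
      if (!Number.isFinite(z) || z <= 0) continue;
      // Pinhole model: X = (u - cx) * Z / f, Y = (v - cy) * Z / f.
      points.push({
        x: ((u - cx) * z) / focalPx,
        y: ((v - cy) * z) / focalPx,
        z,
      });
    }
  }
  return points;
}
```

With a relative depth model the same computation would only recover the scene up to an unknown scale, which is why absolute metric output matters for reconstruction.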

On Mixpeek, DepthPro enables metric-accurate spatial understanding of images and video frames, powering use cases like 3D scene reconstruction, spatial filtering in retrieval, and depth-aware content organization.

Architecture

DepthPro is a multi-scale Vision Transformer with a shared DINOv2 encoder that processes image patches at multiple resolutions. A DPT-like fusion stage merges and upsamples the encoder features for dense prediction, and a separate head estimates focal length. The model outputs a 1536x1536 metric depth map with absolute scale.
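To make the multi-scale patch idea concrete, the sketch below counts how many fixed-size windows cover a 1536x1536 input at several downsampled scales. The 384-pixel patch size, the three scales, and the strides are assumptions chosen for illustration; this page does not state them, and the function names are hypothetical rather than taken from the model code.

```typescript
// Illustrative sketch only: patch size, scales, and strides are assumptions,
// not values stated on this page or read from the model code.
function windowStarts(size: number, patch: number, stride: number): number[] {
  if (patch > size || stride <= 0) {
    throw new Error("invalid window parameters");
  }
  const starts: number[] = [];
  // Slide a window of width `patch` across `size` pixels in steps of `stride`.
  for (let s = 0; s + patch <= size; s += stride) {
    starts.push(s);
  }
  return starts;
}

interface ScaleSpec {
  scale: number; // downsampling factor applied to the full-resolution image
  stride: number; // window step in pixels at that scale
}

// Returns the number of square patches produced at each scale.
function countPatches(fullSize: number, patch: number, specs: ScaleSpec[]): number[] {
  return specs.map(({ scale, stride }) => {
    const size = Math.round(fullSize * scale);
    const perAxis = windowStarts(size, patch, stride).length;
    return perAxis * perAxis;
  });
}

const patchCounts = countPatches(1536, 384, [
  { scale: 1, stride: 288 },
  { scale: 0.5, stride: 192 },
  { scale: 0.25, stride: 384 },
]);
console.log(patchCounts);
```

Under these assumed values the grid has 25, 9, and 1 windows at the three scales. Overlapping windows at the finer scales are one way to avoid visible seams when the per-patch features are merged, and a shared encoder means every window passes through the same weights.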

Mixpeek SDK Integration

import { Mixpeek } from "mixpeek";

const mx = new Mixpeek({ apiKey: "API_KEY" });

// Managed: create a collection over a bucket; Mixpeek runs this model's extractor
const collection = await mx.collections.create({
  namespace_id: "my-namespace",
  collection_name: "my-collection",
  source: { type: "bucket", bucket_ids: ["bkt_your_bucket"] },
  feature_extractor: {
    feature_extractor_name: "depth_estimation",
    version: "v1",
    parameters: { model_id: "apple/DepthPro" },
  },
});

Capabilities

Zero-shot metric depth (absolute scale, no camera intrinsics needed)
2.25-megapixel output (1536x1536) in 0.3s on a V100 GPU
Sharp boundary preservation via multi-scale architecture
Built-in focal length estimation from a single image
State-of-the-art boundary accuracy, as reported in the Depth Pro paper

Use Cases on Mixpeek

3D scene reconstruction from single images or video frames

Depth-aware retrieval and spatial filtering in media pipelines

Augmented reality content creation with metric-accurate depth
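As a rough illustration of depth-aware filtering, a metric depth map can be reduced to a few scalar statistics that are cheap to store and filter on. The sketch below is an assumption about how such a summary might look, not a description of what Mixpeek stores: `depthSummary`, `medianM`, and `nearFraction` are hypothetical names.

```typescript
// Hypothetical summary for illustration; not a Mixpeek API or stored schema.
interface DepthSummary {
  medianM: number; // median depth in meters over valid pixels
  nearFraction: number; // share of valid pixels closer than the threshold
}

function depthSummary(depth: Float32Array, nearThresholdM: number): DepthSummary {
  // Keep only finite, positive depths and sort them ascending.
  const valid = Array.from(depth)
    .filter((d) => Number.isFinite(d) && d > 0)
    .sort((a, b) => a - b);
  if (valid.length === 0) {
    return { medianM: NaN, nearFraction: 0 };
  }
  const mid = Math.floor(valid.length / 2);
  const medianM =
    valid.length % 2 === 1 ? valid[mid] : (valid[mid - 1] + valid[mid]) / 2;
  const nearCount = valid.filter((d) => d < nearThresholdM).length;
  return { medianM, nearFraction: nearCount / valid.length };
}

// Example filter: keep frames where at least half the pixels are within 2.5 m.
function isCloseUp(summary: DepthSummary): boolean {
  return summary.nearFraction >= 0.5;
}
```

Thresholds expressed in meters are only meaningful because the depth is metric; with relative depth the same filter would need per-image calibration.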

Benchmarks

Dataset	Metric	Score	Source
NYUv2	AbsRel	0.036	Bochkovskii et al., 2024: Depth Pro paper
KITTI	AbsRel	0.039	Bochkovskii et al., 2024: Depth Pro paper
Boundary F1	F1 (depth edges)	State-of-the-art	Bochkovskii et al., 2024: Depth Pro paper
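
AbsRel in the table above is the mean absolute relative error, i.e. the average of |predicted - ground truth| / ground truth over pixels with valid ground truth, so lower is better. The sketch below shows the standard definition; `absRel` is a hypothetical helper, and the masking rule (skip non-positive or non-finite ground truth) is an assumption, since evaluation protocols differ in how they mask and crop.

```typescript
// Standard AbsRel definition; the validity mask is an assumption for illustration.
function absRel(pred: ArrayLike<number>, gt: ArrayLike<number>): number {
  if (pred.length !== gt.length) {
    throw new Error("pred and gt must have the same length");
  }
  let sum = 0;
  let count = 0;
  for (let i = 0; i < gt.length; i++) {
    const g = gt[i];
    // Skip pixels without valid ground-truth depth.
    if (!Number.isFinite(g) || g <= 0) continue;
    sum += Math.abs(pred[i] - g) / g;
    count++;
  }
  if (count === 0) {
    throw new Error("no valid ground-truth pixels");
  }
  return sum / count;
}
```

For example, a prediction of 2 m against a ground truth of 4 m contributes a relative error of 0.5 for that pixel.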