+---
+library_name: transformers
+license: mit
+pipeline_tag: depth-estimation
+arxiv: "2502.19204"
+tags:
+- distill-any-depth
+- vision
+---
+# Distill Any Depth Large - Transformers Version
+## Introduction
+We present Distill-Any-Depth, a state-of-the-art (SOTA) monocular depth estimation model trained with our proposed knowledge distillation algorithms. It was introduced in the paper [Distill Any Depth: Distillation Creates a Stronger Monocular Depth Estimator](http://arxiv.org/abs/2502.19204).
+This checkpoint is compatible with the transformers library.
+An [online demo](https://huggingface.co/spaces/xingyang1/Distill-Any-Depth) is also available.
+### How to use
+Here is how to use this model to perform zero-shot depth estimation:
+```python
+from transformers import pipeline
+from PIL import Image
+import requests
+# load pipe
+pipe = pipeline(task="depth-estimation", model="xingyang1/Distill-Any-Depth-Large-hf")
+# load image
+url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
+image = Image.open(requests.get(url, stream=True).raw)
+# inference
+depth = pipe(image)["depth"]
+```
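The pipeline result can be passed on to numerical code once it is converted to an array. The snippet below is a minimal sketch of that step, assuming the `depth` entry is an 8-bit grayscale PIL image, as it is in recent transformers releases. The helper name `depth_image_to_unit_array` is hypothetical, and a small synthetic image stands in for `pipe(image)["depth"]` so that the snippet runs without downloading the model.

```python
import numpy as np
from PIL import Image


def depth_image_to_unit_array(depth_image: Image.Image) -> np.ndarray:
    """Convert an 8-bit grayscale depth image to a float32 array in [0, 1]."""
    gray = depth_image.convert("L")  # force single-channel 8-bit
    return np.asarray(gray, dtype=np.float32) / 255.0


# Stand-in for pipe(image)["depth"] (assumption: an 8-bit grayscale PIL image).
fake_depth = Image.fromarray(np.array([[0, 128], [255, 64]], dtype=np.uint8))
depth_array = depth_image_to_unit_array(fake_depth)
print(depth_array.shape, depth_array.dtype)
```

The values are relative, not metric: they only order the pixels of one image with respect to each other.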
+Alternatively, you can use the model and processor classes:
+```python
+from transformers import AutoImageProcessor, AutoModelForDepthEstimation
+import torch
+import numpy as np
+from PIL import Image
+import requests
+url = "http://images.cocodataset.org/val2017/000000039769.jpg"
+image = Image.open(requests.get(url, stream=True).raw)
+image_processor = AutoImageProcessor.from_pretrained("xingyang1/Distill-Any-Depth-Large-hf")
+model = AutoModelForDepthEstimation.from_pretrained("xingyang1/Distill-Any-Depth-Large-hf")
+# prepare image for the model
+inputs = image_processor(images=image, return_tensors="pt")
+with torch.no_grad():
+    outputs = model(**inputs)
+# interpolate to original size and visualize the prediction
+post_processed_output = image_processor.post_process_depth_estimation(
+    outputs,
+    target_sizes=[(image.height, image.width)],
+)
+predicted_depth = post_processed_output[0]["predicted_depth"]
+# min-max normalize to [0, 1], then scale to 8-bit for visualization
+depth = (predicted_depth - predicted_depth.min()) / (predicted_depth.max() - predicted_depth.min())
+depth = depth.detach().cpu().numpy() * 255
+depth = Image.fromarray(depth.astype("uint8"))
+```
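The min-max normalization in the example above divides by `max - min`, which is zero for a constant depth map. The following is a minimal sketch of a guarded version that works on NumPy arrays. The helper name `depth_to_uint8` and the `eps` threshold are assumptions made for illustration and are not part of the transformers API.

```python
import numpy as np


def depth_to_uint8(depth: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Min-max normalize a depth map and scale it to the 0-255 uint8 range."""
    depth = np.asarray(depth, dtype=np.float32)
    value_range = float(depth.max() - depth.min())
    if value_range < eps:
        # Constant depth map: return all zeros rather than dividing by zero.
        return np.zeros(depth.shape, dtype=np.uint8)
    normalized = (depth - depth.min()) / value_range
    return (normalized * 255.0).astype(np.uint8)


example = np.array([[1.0, 2.0], [3.0, 5.0]], dtype=np.float32)
scaled = depth_to_uint8(example)
flat = depth_to_uint8(np.full((2, 2), 7.0))
print(scaled)
print(flat)
```

To use it with the model output, convert the tensor first, for example `depth_to_uint8(predicted_depth.cpu().numpy())`, and pass the result to `Image.fromarray`.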
+If you find this project useful, please consider citing:
+```bibtex
+@article{he2025distill,
+  title   = {Distill Any Depth: Distillation Creates a Stronger Monocular Depth Estimator},
+  author  = {Xiankang He and Dongyan Guo and Hongji Li and Ruibo Li and Ying Cui and Chi Zhang},
+  year    = {2025},
+  journal = {arXiv preprint arXiv:2502.19204}
+}
+```
+## Model Card Author
+[Parteek Kamboj](https://huggingface.co/keetrap)