# Depth Anything for Semantic Segmentation

We use our Depth Anything pre-trained ViT-L encoder to fine-tune downstream semantic segmentation models.

## Performance

### Cityscapes

Note that our results are obtained *without* Mapillary pre-training.

| Method | Encoder | mIoU (s.s.) | mIoU (m.s.) |
|:-:|:-:|:-:|:-:|
| SegFormer | MiT-B5 | 82.4 | 84.0 |
| Mask2Former | Swin-L | 83.3 | 84.3 |
| OneFormer | Swin-L | 83.0 | 84.4 |
| OneFormer | ConvNeXt-XL | 83.6 | 84.6 |
| DDP | ConvNeXt-L | 83.2 | 83.9 |
| **Ours** | ViT-L | **84.8** | **86.2** |

### ADE20K

| Method | Encoder | mIoU |
|:-:|:-:|:-:|
| SegFormer | MiT-B5 | 51.0 |
| Mask2Former | Swin-L | 56.4 |
| UperNet | BEiT-L | 56.3 |
| ViT-Adapter | BEiT-L | 58.3 |
| OneFormer | Swin-L | 57.4 |
| OneFormer | ConvNeXt-XL | 57.4 |
| **Ours** | ViT-L | **59.4** |

## Pre-trained models

- [Cityscapes-ViT-L-mIoU-86.4](https://huggingface.co/spaces/LiheYoung/Depth-Anything/blob/main/checkpoints_semseg/cityscapes_vitl_mIoU_86.4.pth)
- [ADE20K-ViT-L-mIoU-59.4](https://huggingface.co/spaces/LiheYoung/Depth-Anything/blob/main/checkpoints_semseg/ade20k_vitl_mIoU_59.4.pth)

## Installation

Please refer to [MMSegmentation](https://github.com/open-mmlab/mmsegmentation/blob/main/docs/en/get_started.md#installation) for installation instructions.

*Do not forget to install ``mmdet`` to support ``Mask2Former``:*

```bash
pip install "mmdet>=3.0.0rc4"
```

After installation:

- move our [config/depth_anything](./config/depth_anything/) to mmseg's [config](https://github.com/open-mmlab/mmsegmentation/tree/main/configs)
- move our [dinov2.py](./dinov2.py) to mmseg's [backbones](https://github.com/open-mmlab/mmsegmentation/tree/main/mmseg/models/backbones)
- add DINOv2 to mmseg's [models/backbones/\_\_init\_\_.py](https://github.com/open-mmlab/mmsegmentation/blob/main/mmseg/models/backbones/__init__.py) (a sketch of this edit is given at the end of this page)
- download our provided [torchhub](https://github.com/LiheYoung/Depth-Anything/tree/main/torchhub) directory and put it at the root of your working directory
- download the [Depth Anything pre-trained model](https://huggingface.co/spaces/LiheYoung/Depth-Anything/blob/main/checkpoints/depth_anything_vitl14.pth) (to initialize the encoder) and put it under the ``checkpoints`` folder

For training or inference with our pre-trained models, please refer to the MMSegmentation [instructions](https://github.com/open-mmlab/mmsegmentation/blob/main/docs/en/user_guides/4_train_test.md).
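
The ``__init__.py`` step above only requires an extra import and an ``__all__`` entry. A minimal sketch, assuming the backbone class defined in ``dinov2.py`` is named ``DINOv2`` (adjust the name if the file uses a different one):

```python
# mmseg/models/backbones/__init__.py (sketch)
# Assumption: the class exported by dinov2.py is called `DINOv2`.
from .dinov2 import DINOv2
# ... keep all the existing backbone imports ...

__all__ = [
    # ... keep all the existing entries ...
    'DINOv2',
]
```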
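
If you prefer fetching the checkpoints from a script rather than the browser, here is a sketch using ``huggingface_hub``; the file paths come from the links above, while the ``local_dir`` layout is an assumption on our side:

```python
# Sketch: download the encoder and a fine-tuned segmentation checkpoint from
# the Hugging Face Space that hosts them (paths taken from the links above).
from huggingface_hub import hf_hub_download

# Depth Anything ViT-L encoder, used to initialize the segmentation backbone.
hf_hub_download(
    repo_id="LiheYoung/Depth-Anything",
    repo_type="space",
    filename="checkpoints/depth_anything_vitl14.pth",
    local_dir=".",  # lands in ./checkpoints/depth_anything_vitl14.pth
)

# Fine-tuned Cityscapes segmentation model (only needed for inference).
hf_hub_download(
    repo_id="LiheYoung/Depth-Anything",
    repo_type="space",
    filename="checkpoints_semseg/cityscapes_vitl_mIoU_86.4.pth",
    local_dir=".",
)
```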
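
As a quick sanity check after setup, the sketch below runs single-image inference with MMSegmentation's Python API (mmseg 1.x). The config filename and test image are placeholders; use a config you copied into ``configs/depth_anything`` and your own image:

```python
# Minimal inference sketch with MMSegmentation's high-level API.
from mmseg.apis import init_model, inference_model, show_result_pyplot

# Placeholders: point these at your copied config and downloaded checkpoint.
config_file = "configs/depth_anything/cityscapes_vitl.py"  # hypothetical name
checkpoint_file = "checkpoints_semseg/cityscapes_vitl_mIoU_86.4.pth"

# Build the model from the config and load the fine-tuned weights.
model = init_model(config_file, checkpoint_file, device="cuda:0")

# Run inference on one image and save the predicted segmentation overlay.
result = inference_model(model, "demo.png")
show_result_pyplot(model, "demo.png", result, show=False, out_file="demo_seg.png")
```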