---
license: apache-2.0
---
## Frozen-DETR: Enhancing DETR with Image Understanding from Frozen Foundation Models

This is the official PyTorch implementation of [Frozen-DETR]() (NeurIPS 2024).
7
+
8
+ Please see our [GitHub](https://github.com/iSEE-Laboratory/Frozen-DETR)
9
+

### 1 Introduction

Recent vision foundation models can extract universal representations and show impressive abilities in various tasks. However, their application to object detection has been largely overlooked, especially without fine-tuning them. In this work, we show that frozen foundation models can be a versatile feature enhancer, even though they are not pre-trained for object detection. Specifically, we explore directly transferring the high-level image understanding of foundation models to detectors in two ways. First, the class token in foundation models provides an in-depth understanding of the complex scene, which facilitates decoding object queries in the detector’s decoder by providing a compact context. Second, the patch tokens in foundation models can enrich the features in the detector’s encoder by providing semantic details. Utilizing frozen foundation models as plug-and-play modules, rather than as the commonly used backbone, can significantly enhance the detector’s performance while avoiding the problems caused by the architecture discrepancy between the detector’s backbone and the foundation model. With this paradigm, we boost the state-of-the-art query-based detector DINO on the COCO validation set from 49.0% AP to 51.9% AP (+2.9% AP) with one foundation model, and further to 53.8% AP (+4.8% AP) with two, after training for 12 epochs with R50 as the detector’s backbone.
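
The first integration point above (object queries attending to the frozen model's class token as a compact scene context) can be sketched roughly as follows. This is an illustrative assumption, not the released implementation: `QueryContextFusion`, the dimensions, and the residual fusion are all placeholders for the idea described in the text.

```python
import torch
import torch.nn as nn

class QueryContextFusion(nn.Module):
    """Sketch: object queries cross-attend to the frozen foundation
    model's class token, which serves as a compact scene context.
    Names and shapes are illustrative, not the released code."""
    def __init__(self, d_model=256, d_found=1024, n_heads=8):
        super().__init__()
        # Project the foundation model's dimension down to the detector's.
        self.proj = nn.Linear(d_found, d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, queries, cls_token):
        # queries:   (B, num_queries, d_model)  detector object queries
        # cls_token: (B, 1, d_found)            frozen model's class token
        ctx = self.proj(cls_token)              # (B, 1, d_model)
        out, _ = self.attn(queries, ctx, ctx)   # queries read the context
        return queries + out                    # residual fusion
```

The second integration point (patch tokens enriching encoder features) would follow the same pattern, with the projected patch tokens used as keys and values instead of the class token.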

### 2 TODO

- [ ] Release the code of other DETR models (DAB-DETR, DN-DETR, MS-DETR, HPR)
- [x] Release the code of DINO and Co-DINO

### 3 Data preparation

We expect the directory structure to be the following:

```
|--dataset
    |--coco
        |--annotations
            |--instances_train2017.json
            |--instances_val2017.json
        |--train2017
        |--val2017
    |--lvis_v1
        |--annotations
            |--lvis_v1_train.json       # for standard setting
            |--lvis_v1_train_seen.json  # for open-vocabulary setting
            |--lvis_v1_val.json
        |--train2017
        |--val2017
|--pretrained_models
    |--dinov2_vitl14_pretrain.pth
    |--lvis_base_inds.txt
    |--swin_base_patch4_window7_224_22k.pth
    |--torchvision_resnet50.pth
    |--ViT-L-14-336px.pt
```
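
A quick sanity check over the layout above can save a failed training run. This is a small stand-alone sketch, assuming the paths are relative to the project root; the `EXPECTED` list below covers only a subset of the tree and can be extended as needed.

```python
from pathlib import Path

# A subset of the paths the README expects, relative to the project
# root (an assumption; adjust `root` if your layout differs).
EXPECTED = [
    "dataset/coco/annotations/instances_train2017.json",
    "dataset/coco/annotations/instances_val2017.json",
    "dataset/coco/train2017",
    "dataset/coco/val2017",
    "pretrained_models/dinov2_vitl14_pretrain.pth",
    "pretrained_models/ViT-L-14-336px.pt",
]

def missing_paths(root="."):
    """Return the expected files/dirs not present under `root`."""
    root = Path(root)
    return [p for p in EXPECTED if not (root / p).exists()]

if __name__ == "__main__":
    for p in missing_paths():
        print("missing:", p)
```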

- `dinov2_vitl14_pretrain.pth`: [download link](https://dl.fbaipublicfiles.com/dinov2/dinov2_vitl14/dinov2_vitl14_pretrain.pth)
- `ViT-L-14-336px.pt`: [download link](https://openaipublic.azureedge.net/clip/models/3035c92b350959924f9f00213499208652fc7ea050643e8b385c2dac08641f02/ViT-L-14-336px.pt)
- `swin_base_patch4_window7_224_22k.pth`: [download link](https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_base_patch4_window7_224_22k.pth)
- `lvis_v1_train_seen.json`: [download link](https://drive.google.com/file/d/1dZQ5ytHgJPv4VgYOyjJerq4adc6GQkkd/view?usp=sharing)
51
+
52
+ ### 4 License
53
+
54
+ Frozen-DETR is released under the Apache 2.0 license. Please see the LICENSE file for more information.

### 5 Bibtex

If you find our work helpful for your research, please consider citing the following BibTeX entry.

```
@inproceedings{fu2024frozen-detr,
  title={Frozen-DETR: Enhancing DETR with Image Understanding from Frozen Foundation Models},
  author={Fu, Shenghao and Yan, Junkai and Yang, Qize and Wei, Xihan and Xie, Xiaohua and Zheng, Wei-Shi},
  booktitle={NeurIPS},
  year={2024},
}
```