hassonofer committed on
Commit 1dc4dc6 · verified · 1 Parent(s): 515ce04

Update README.md

Files changed (1): README.md +120 -3
README.md CHANGED
---
tags:
- image-classification
- birder
- pytorch
library_name: birder
license: apache-2.0
base_model:
- birder-project/hiera_abswin_base_mim
---

# Model Card for hiera_abswin_base_mim-intermediate-eu-common

A Hiera image classification model. The model was trained in three stages: masked image modeling (MIM) pretraining, then intermediate training on a large-scale dataset containing diverse bird species from around the world, and finally fine-tuning on the `eu-common` dataset.

The species list is derived from the *Collins Bird Guide* [^1].

[^1]: Svensson, L., Mullarney, K., & Zetterström, D. (2022). Collins bird guide (3rd ed.). London, England: William Collins.

## Model Details

- **Model Type:** Image classification and detection backbone
- **Model Stats:**
  - Params (M): 51.1
  - Input image size: 384 x 384
- **Dataset:** eu-common (707 classes)
  - Intermediate training involved ~6000 species from Asia, Europe, and Africa

- **Papers:**
  - Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles: <https://arxiv.org/abs/2306.00989>
  - Window Attention is Bugged: How not to Interpolate Position Embeddings: <https://arxiv.org/abs/2311.05613>

## Model Usage

### Image Classification

```python
import birder
from birder.inference.classification import infer_image

(net, model_info) = birder.load_pretrained_model("hiera_abswin_base_mim-intermediate-eu-common", inference=True)

# Get the image size the model was trained on
size = birder.get_size_from_signature(model_info.signature)

# Create an inference transform
transform = birder.classification_transform(size, model_info.rgb_stats)

image = "path/to/image.jpeg"  # or a PIL image, must be loaded in RGB format
(out, _) = infer_image(net, image, transform)
# out is a NumPy array with shape (1, 707), representing class probabilities
```
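
Because `out` holds one probability per class, readable predictions come from sorting it. The sketch below is a minimal NumPy version that assumes you already have the class names as a list ordered by class index. How you obtain that list from birder is not covered in this card and is an assumption here, as is the helper name `top_k_predictions`.

```python
import numpy as np


def top_k_predictions(probs: np.ndarray, class_names: list[str], k: int = 5) -> list[tuple[str, float]]:
    """Return the k most probable (class name, probability) pairs for one image.

    `probs` may have shape (1, num_classes) or (num_classes,), like the
    `out` array returned by `infer_image` (hypothetical helper, not part of birder).
    """
    probs = np.asarray(probs).reshape(-1)  # drop the batch dimension of size 1
    if probs.shape[0] != len(class_names):
        raise ValueError("number of probabilities and class names must match")
    k = min(k, probs.shape[0])
    # argsort is ascending: take the last k indices, then reverse for descending order
    top_idx = np.argsort(probs)[-k:][::-1]
    return [(class_names[i], float(probs[i])) for i in top_idx]


# Toy example with 4 hypothetical classes instead of the real 707
toy_probs = np.array([[0.10, 0.60, 0.05, 0.25]])
toy_names = ["species_a", "species_b", "species_c", "species_d"]
print(top_k_predictions(toy_probs, toy_names, k=2))
# [('species_b', 0.6), ('species_d', 0.25)]
```

With the real model, pass `out` in place of `toy_probs` and the 707 class names in place of `toy_names`.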

### Image Embeddings

```python
import birder
from birder.inference.classification import infer_image

(net, model_info) = birder.load_pretrained_model("hiera_abswin_base_mim-intermediate-eu-common", inference=True)

# Get the image size the model was trained on
size = birder.get_size_from_signature(model_info.signature)

# Create an inference transform
transform = birder.classification_transform(size, model_info.rgb_stats)

image = "path/to/image.jpeg"  # or a PIL image, must be loaded in RGB format
(out, embedding) = infer_image(net, image, transform, return_embedding=True)
# embedding is a NumPy array with shape (1, 768)
```
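
Embeddings are mostly used to compare images, for example to find visually similar photos. A common choice is cosine similarity between L2-normalized vectors, though this card does not prescribe a metric. The sketch below uses random arrays shaped like the (1, 768) `embedding` output as stand-ins for real model outputs. The helper name `cosine_similarity_matrix` is hypothetical.

```python
import numpy as np


def cosine_similarity_matrix(queries: np.ndarray, gallery: np.ndarray) -> np.ndarray:
    """Cosine similarity between every query row and every gallery row.

    queries: (n, d) array, gallery: (m, d) array -> returns an (n, m) array.
    """
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return q @ g.T


rng = np.random.default_rng(0)
# Stand-ins for 5 stacked embeddings, each of shape (1, 768) as returned by infer_image
gallery = rng.normal(size=(5, 768))
# A query that is a slightly perturbed copy of gallery item 2
query = gallery[2:3] + 0.01 * rng.normal(size=(1, 768))

scores = cosine_similarity_matrix(query, gallery)  # shape (1, 5)
best = int(np.argmax(scores, axis=1)[0])
print(best, scores.shape)
```

Stacking real embeddings with `np.concatenate(..., axis=0)` gives the `(m, 768)` gallery matrix this helper expects.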

### Detection Feature Map

```python
from PIL import Image
import birder

(net, model_info) = birder.load_pretrained_model("hiera_abswin_base_mim-intermediate-eu-common", inference=True)

# Get the image size the model was trained on
size = birder.get_size_from_signature(model_info.signature)

# Create an inference transform
transform = birder.classification_transform(size, model_info.rgb_stats)

image = Image.open("path/to/image.jpeg").convert("RGB")  # the model expects RGB input
features = net.detection_features(transform(image).unsqueeze(0))
# features is a dict (stage name -> torch.Tensor)
print([(k, v.size()) for k, v in features.items()])
# Output example:
# [('stage1', torch.Size([1, 96, 96, 96])),
#  ('stage2', torch.Size([1, 192, 48, 48])),
#  ('stage3', torch.Size([1, 384, 24, 24])),
#  ('stage4', torch.Size([1, 768, 12, 12]))]
```
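
The shapes in the output example follow a simple pattern. Each stage halves the spatial resolution and doubles the channel count, starting from a stride of 4 and 96 channels. The sketch below reproduces those shapes for a given input size. The stride-4 stem and the 96 base channels are read off the example output for this model, not taken from birder's source, and `expected_stage_shapes` is a hypothetical helper.

```python
def expected_stage_shapes(
    image_size: int, base_channels: int = 96, num_stages: int = 4, stem_stride: int = 4
) -> dict[str, tuple[int, int, int, int]]:
    """Predict (batch, channels, height, width) per stage for a square input, batch size 1.

    Assumes stage i (0-based) has stride stem_stride * 2**i and
    base_channels * 2**i channels, matching the example output of this model.
    """
    shapes = {}
    for i in range(num_stages):
        stride = stem_stride * 2**i
        if image_size % stride != 0:
            raise ValueError(f"image size {image_size} is not divisible by stride {stride}")
        side = image_size // stride
        shapes[f"stage{i + 1}"] = (1, base_channels * 2**i, side, side)
    return shapes


print(expected_stage_shapes(384))
# {'stage1': (1, 96, 96, 96), 'stage2': (1, 192, 48, 48), 'stage3': (1, 384, 24, 24), 'stage4': (1, 768, 12, 12)}
```

For the documented 384 x 384 input, the strides 4, 8, 16, and 32 all divide evenly. The helper raises an error for sizes where they would not.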

## Citation

```bibtex
@misc{ryali2023hierahierarchicalvisiontransformer,
  title={Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles},
  author={Chaitanya Ryali and Yuan-Ting Hu and Daniel Bolya and Chen Wei and Haoqi Fan and Po-Yao Huang and Vaibhav Aggarwal and Arkabandhu Chowdhury and Omid Poursaeed and Judy Hoffman and Jitendra Malik and Yanghao Li and Christoph Feichtenhofer},
  year={2023},
  eprint={2306.00989},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2306.00989},
}

@misc{bolya2023windowattentionbuggedinterpolate,
  title={Window Attention is Bugged: How not to Interpolate Position Embeddings},
  author={Daniel Bolya and Chaitanya Ryali and Judy Hoffman and Christoph Feichtenhofer},
  year={2023},
  eprint={2311.05613},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2311.05613},
}
```