nielsr (HF Staff) committed
Commit 71f01aa · verified · 1 Parent(s): 07f24ef

Update model card: set pipeline tag, add library name and citation


This PR fixes the pipeline tag, setting it to `image-classification`, since image classification is the main use case of this model. It also adds the `library_name` tag so that the "how to use" button appears at the top right of the model page. Finally, citation information and a star history chart are added for better discoverability.
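For reference, the card's front matter after this change reads as follows (this is exactly the metadata block applied in the diff below):

```yaml
---
datasets:
- ILSVRC/imagenet-1k
license: other
license_name: nvclv1
license_link: LICENSE
pipeline_tag: image-classification
library_name: transformers
---
```

With `library_name: transformers` set, the Hub can render a library-specific loading snippet; because MambaVision ships custom modeling code, loading still goes through `AutoModel.from_pretrained(..., trust_remote_code=True)`, as in the card's own examples.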

Files changed (1): README.md (+51, −21)
README.md CHANGED
@@ -1,13 +1,13 @@
 ---
+datasets:
+- ILSVRC/imagenet-1k
 license: other
 license_name: nvclv1
 license_link: LICENSE
-datasets:
-- ILSVRC/imagenet-1k
-pipeline_tag: image-feature-extraction
+pipeline_tag: image-classification
+library_name: transformers
 ---
 
-
 [**MambaVision: A Hybrid Mamba-Transformer Vision Backbone**](https://arxiv.org/abs/2407.08083).
 
 ## Model Overview
@@ -17,38 +17,34 @@ We have developed the first hybrid model for computer vision which leverages the
 ## Model Performance
 
 MambaVision demonstrates a strong performance by achieving a new SOTA Pareto-front in
-terms of Top-1 accuracy and throughput. 
+terms of Top-1 accuracy and throughput.
 
 <p align="center">
-<img src="https://github.com/NVlabs/MambaVision/assets/26806394/79dcf841-3966-4b77-883d-76cd5e1d4320" width=70% height=70%
+<img src="https://github.com/NVlabs/MambaVision/assets/26806394/79dcf841-3966-4b77-883d-76cd5e1d4320" width=70% height=70%
 class="center">
 </p>
 
-
 ## Model Usage
 
 It is highly recommended to install the requirements for MambaVision by running the following:
 
-
 ```Bash
 pip install mambavision
 ```
 
-For each model, we offer two variants for image classification and feature extraction that can be imported with 1 line of code. 
+For each model, we offer two variants for image classification and feature extraction that can be imported with 1 line of code.
 
 ### Image Classification
 
-In the following example, we demonstrate how MambaVision can be used for image classification. 
-
-Given the following image from [COCO dataset](https://cocodataset.org/#home) val set as an input: 
+In the following example, we demonstrate how MambaVision can be used for image classification.
 
+Given the following image from [COCO dataset](https://cocodataset.org/#home) val set as an input:
 
 <p align="center">
-<img src="https://cdn-uploads.huggingface.co/production/uploads/64414b62603214724ebd2636/4duSnqLf4lrNiAHczSmAN.jpeg" width=70% height=70%
+<img src="https://cdn-uploads.huggingface.co/production/uploads/64414b62603214724ebd2636/4duSnqLf4lrNiAHczSmAN.jpeg" width=70% height=70%
 class="center">
 </p>
 
-
 The following snippet can be used for image classification:
 
 ```Python
@@ -77,18 +73,18 @@ transform = create_transform(input_size=input_resolution,
 inputs = transform(image).unsqueeze(0).cuda()
 # model inference
 outputs = model(inputs)
-logits = outputs['logits'] 
+logits = outputs['logits']
 predicted_class_idx = logits.argmax(-1).item()
 print("Predicted class:", model.config.id2label[predicted_class_idx])
 ```
 
-The predicted label is ```brown bear, bruin, Ursus arctos.```
+The predicted label is `brown bear, bruin, Ursus arctos.`
 
 ### Feature Extraction
 
-MambaVision can also be used as a generic feature extractor. 
+MambaVision can also be used as a generic feature extractor.
 
-Specifically, we can extract the outputs of each stage of model (4 stages) as well as the final averaged-pool features that are flattened. 
+Specifically, we can extract the outputs of each stage of model (4 stages) as well as the final averaged-pool features that are flattened.
 
 The following snippet can be used for feature extraction:
 
@@ -98,7 +94,7 @@ from PIL import Image
 from timm.data.transforms_factory import create_transform
 import requests
 
-model = AutoModel.from_pretrained("nvidia/MambaVision-T2-1K", trust_remote_code=True)
+model = AutoModel.from_pretrained("nvidia/MambaVision-T-1K", trust_remote_code=True)
 
 # eval mode for inference
 model.cuda().eval()
@@ -123,7 +119,41 @@ print("Size of extracted features in stage 1:", features[0].size()) # torch.Size
 print("Size of extracted features in stage 4:", features[3].size()) # torch.Size([1, 640, 7, 7])
 ```
 
+### License:
+
+[NVIDIA Source Code License-NC](https://huggingface.co/nvidia/MambaVision-T-1K/blob/main/LICENSE)
+
+## Citation
+
+If you find MambaVision to be useful for your work, please consider citing our paper:
+
+```
+@article{hatamizadeh2024mambavision,
+  title={MambaVision: A Hybrid Mamba-Transformer Vision Backbone},
+  author={Hatamizadeh, Ali and Kautz, Jan},
+  journal={arXiv preprint arXiv:2407.08083},
+  year={2024}
+}
+```
+
+## Star History
+
+[![Stargazers repo roster for @NVlabs/MambaVision](https://bytecrank.com/nastyox/reporoster/php/stargazersSVG.php?user=NVlabs&repo=MambaVision)](https://github.com/NVlabs/MambaVision/stargazers)
+
+[![Star History Chart](https://api.star-history.com/svg?repos=NVlabs/MambaVision&type=Date)](https://star-history.com/#NVlabs/MambaVision&Date)
+
+## Licenses
+
+Copyright © 2025, NVIDIA Corporation. All rights reserved.
+
+This work is made available under the NVIDIA Source Code License-NC. Click [here](LICENSE) to view a copy of this license.
+
+The pre-trained models are shared under [CC-BY-NC-SA-4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/). If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.
+
+For license information regarding the timm repository, please refer to its [repository](https://github.com/rwightman/pytorch-image-models).
+
+For license information regarding the ImageNet dataset, please see the [ImageNet official website](https://www.image-net.org/).
 
-### License:
+## Acknowledgement
 
-[NVIDIA Source Code License-NC](https://huggingface.co/nvidia/MambaVision-T-1K/blob/main/LICENSE)
+This repository is built on top of the [timm](https://github.com/huggingface/pytorch-image-models) repository. We thank [Ross Wrightman](https://rwightman.com/) for creating and maintaining this high-quality library.
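Once merged, the effect of the new tags can be read back programmatically. A minimal sketch using `huggingface_hub` (assuming this card belongs to the `nvidia/MambaVision-T-1K` checkpoint referenced throughout the card):

```Python
from huggingface_hub import model_info

# Fetch the model's Hub metadata and confirm the updated tags are live.
info = model_info("nvidia/MambaVision-T-1K")
print(info.pipeline_tag)  # expected: image-classification
print(info.library_name)  # expected: transformers
```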