Set pipeline tag to image-classification and add code link
#1
by nielsr (HF Staff) - opened

README.md CHANGED
@@ -1,13 +1,13 @@
 ---
+datasets:
+- ILSVRC/imagenet-21k
 license: other
 license_name: nvclv1
 license_link: LICENSE
-
-
-pipeline_tag: image-feature-extraction
+pipeline_tag: image-classification
+library_name: transformers
 ---
 
-
 [**MambaVision: A Hybrid Mamba-Transformer Vision Backbone**](https://arxiv.org/abs/2407.08083).
 
 ## Model Overview
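For context on the metadata edit above: `pipeline_tag: image-classification` is what lists the checkpoint under the image-classification task on the Hub, and `library_name: transformers` tells the Hub which library to suggest for loading. A minimal loading sketch in that library, following the pattern this card's usage section already uses (the `.cuda().eval()` placement is illustrative):

```Python
from transformers import AutoModelForImageClassification

# MambaVision ships custom modeling code, hence trust_remote_code=True
model = AutoModelForImageClassification.from_pretrained(
    "nvidia/MambaVision-B-21K", trust_remote_code=True
).cuda().eval()
```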
@@ -37,7 +37,6 @@ MambaVision-B-21K is pretrained on ImageNet-21K dataset and finetuned on ImageNe
 <td>224x224</td>
 </tr>
 
-
 </table>
 
 In addition, the MambaVision models demonstrate a strong performance by achieving a new SOTA Pareto-front in
@@ -48,11 +47,11 @@ terms of Top-1 accuracy and throughput.
 class="center">
 </p>
 
-
 ## Model Usage
 
 It is highly recommended to install the requirements for MambaVision by running the following:
 
+Code: https://github.com/NVlabs/MambaVision
 
 ```Bash
 pip install mambavision
@@ -66,13 +65,11 @@ In the following example, we demonstrate how MambaVision can be used for image c
 
 Given the following image from [COCO dataset](https://cocodataset.org/#home) val set as an input:
 
-
 <p align="center">
 <img src="https://cdn-uploads.huggingface.co/production/uploads/64414b62603214724ebd2636/4duSnqLf4lrNiAHczSmAN.jpeg" width=70% height=70%
 class="center">
 </p>
 
-
 The following snippet can be used for image classification:
 
 ```Python
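The body of the classification snippet is unchanged and therefore elided from the diff. For orientation, a sketch of the pattern the card follows, assuming the transform values live on `model.config` as the next hunk shows; the output decoding is illustrative:

```Python
import requests
import torch
from PIL import Image
from timm.data.transforms_factory import create_transform
from transformers import AutoModelForImageClassification

model = AutoModelForImageClassification.from_pretrained(
    "nvidia/MambaVision-B-21K", trust_remote_code=True
).cuda().eval()

# the COCO val image referenced above
url = "https://cdn-uploads.huggingface.co/production/uploads/64414b62603214724ebd2636/4duSnqLf4lrNiAHczSmAN.jpeg"
image = Image.open(requests.get(url, stream=True).raw)

# eval-time transform built from values stored in the model config
transform = create_transform(input_size=(3, 224, 224),
                             is_training=False,
                             mean=model.config.mean,
                             std=model.config.std,
                             crop_mode=model.config.crop_mode,
                             crop_pct=model.config.crop_pct)

inputs = transform(image).unsqueeze(0).cuda()
with torch.no_grad():
    outputs = model(inputs)
logits = outputs["logits"]
print("Predicted class:", logits.argmax(-1).item())
```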
@@ -136,7 +133,7 @@ transform = create_transform(input_size=input_resolution,
 is_training=False,
 mean=model.config.mean,
 std=model.config.std,
-crop_mode=model.config.
+crop_mode=model.config.crop_mode,
 crop_pct=model.config.crop_pct)
 inputs = transform(image).unsqueeze(0).cuda()
 # model inference
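The removed line had been truncated after `model.config.`; completing the keyword as `crop_mode=model.config.crop_mode` is the sensible fix, since timm's `create_transform` expects `crop_mode` to be a crop-strategy string while `crop_pct` is a float. This hunk and the next belong to the card's feature-extraction example; a sketch of how it fits together, assuming the tuple return that the card's print statements imply (pooled features plus a list of per-stage maps):

```Python
import torch
from transformers import AutoModel

# backbone variant: returns features rather than classification logits
model = AutoModel.from_pretrained(
    "nvidia/MambaVision-B-21K", trust_remote_code=True
).cuda().eval()

inputs = torch.randn(1, 3, 224, 224).cuda()  # stand-in for a transformed image
with torch.no_grad():
    out_avg_pool, features = model(inputs)  # assumed signature, per the card's prints

print("Stage 1 features:", features[0].size())
print("Stage 4 features:", features[3].size())  # torch.Size([1, 640, 7, 7]) per the card
```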
@@ -147,7 +144,6 @@ print("Size of extracted features in stage 1:", features[0].size()) # torch.Size
 print("Size of extracted features in stage 4:", features[3].size()) # torch.Size([1, 640, 7, 7])
 ```
 
-
 ### License:
 
 [NVIDIA Source Code License-NC](https://huggingface.co/nvidia/MambaVision-B-21K/blob/main/LICENSE)