Set pipeline tag to image-classification and add code link
#1
by nielsr (HF Staff) - opened

README.md CHANGED
@@ -1,13 +1,13 @@
 ---
+datasets:
+- ILSVRC/imagenet-21k
 license: other
 license_name: nvclv1
 license_link: LICENSE
-
-
-pipeline_tag: image-feature-extraction
+pipeline_tag: image-classification
+library_name: transformers
 ---
 
-
 [**MambaVision: A Hybrid Mamba-Transformer Vision Backbone**](https://arxiv.org/abs/2407.08083).
 
 ## Model Overview
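For context on the metadata edit above: `pipeline_tag: image-classification` is what lists the checkpoint under the image-classification task on the Hub, and `library_name: transformers` tells the Hub which library to suggest for loading. A minimal loading sketch in that library, following the pattern this card's usage section already uses (the `.cuda().eval()` placement is illustrative):

```Python
from transformers import AutoModelForImageClassification

# MambaVision ships custom modeling code, hence trust_remote_code=True
model = AutoModelForImageClassification.from_pretrained(
    "nvidia/MambaVision-B-21K", trust_remote_code=True
).cuda().eval()
```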
@@ -37,7 +37,6 @@ MambaVision-B-21K is pretrained on ImageNet-21K dataset and finetuned on ImageNe
 <td>224x224</td>
 </tr>
 
-
 </table>
 
 In addition, the MambaVision models demonstrate a strong performance by achieving a new SOTA Pareto-front in
@@ -48,11 +47,11 @@ terms of Top-1 accuracy and throughput.
 class="center">
 </p>
 
-
 ## Model Usage
 
 It is highly recommended to install the requirements for MambaVision by running the following:
 
+Code: https://github.com/NVlabs/MambaVision
 
 ```Bash
 pip install mambavision
@@ -66,13 +65,11 @@ In the following example, we demonstrate how MambaVision can be used for image c
 
 Given the following image from [COCO dataset](https://cocodataset.org/#home) val set as an input:
 
-
 <p align="center">
 <img src="https://cdn-uploads.huggingface.co/production/uploads/64414b62603214724ebd2636/4duSnqLf4lrNiAHczSmAN.jpeg" width=70% height=70%
 class="center">
 </p>
 
-
 The following snippet can be used for image classification:
 
 ```Python
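The body of the classification snippet is unchanged and therefore elided from the diff. For orientation, a sketch of the pattern the card follows, assuming the transform values live on `model.config` as the next hunk shows; the output decoding is illustrative:

```Python
import requests
import torch
from PIL import Image
from timm.data.transforms_factory import create_transform
from transformers import AutoModelForImageClassification

model = AutoModelForImageClassification.from_pretrained(
    "nvidia/MambaVision-B-21K", trust_remote_code=True
).cuda().eval()

# the COCO val image referenced above
url = "https://cdn-uploads.huggingface.co/production/uploads/64414b62603214724ebd2636/4duSnqLf4lrNiAHczSmAN.jpeg"
image = Image.open(requests.get(url, stream=True).raw)

# eval-time transform built from values stored in the model config
transform = create_transform(input_size=(3, 224, 224),
                             is_training=False,
                             mean=model.config.mean,
                             std=model.config.std,
                             crop_mode=model.config.crop_mode,
                             crop_pct=model.config.crop_pct)

inputs = transform(image).unsqueeze(0).cuda()
with torch.no_grad():
    outputs = model(inputs)
logits = outputs["logits"]
print("Predicted class:", logits.argmax(-1).item())
```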
@@ -136,7 +133,7 @@ transform = create_transform(input_size=input_resolution,
 is_training=False,
 mean=model.config.mean,
 std=model.config.std,
-crop_mode=model.config.
+crop_mode=model.config.crop_mode,
 crop_pct=model.config.crop_pct)
 inputs = transform(image).unsqueeze(0).cuda()
 # model inference
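The removed line had been truncated after `model.config.`; completing the keyword as `crop_mode=model.config.crop_mode` is the sensible fix, since timm's `create_transform` expects `crop_mode` to be a crop-strategy string while `crop_pct` is a float. This hunk and the next belong to the card's feature-extraction example; a sketch of how it fits together, assuming the tuple return that the card's print statements imply (pooled features plus a list of per-stage maps):

```Python
import torch
from transformers import AutoModel

# backbone variant: returns features rather than classification logits
model = AutoModel.from_pretrained(
    "nvidia/MambaVision-B-21K", trust_remote_code=True
).cuda().eval()

inputs = torch.randn(1, 3, 224, 224).cuda()  # stand-in for a transformed image
with torch.no_grad():
    out_avg_pool, features = model(inputs)  # assumed signature, per the card's prints

print("Stage 1 features:", features[0].size())
print("Stage 4 features:", features[3].size())  # torch.Size([1, 640, 7, 7]) per the card
```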
@@ -147,7 +144,6 @@ print("Size of extracted features in stage 1:", features[0].size()) # torch.Size
 print("Size of extracted features in stage 4:", features[3].size()) # torch.Size([1, 640, 7, 7])
 ```
 
-
 ### License:
 
 [NVIDIA Source Code License-NC](https://huggingface.co/nvidia/MambaVision-B-21K/blob/main/LICENSE)