nielsr (HF Staff) committed
Commit 71f01aa · verified · 1 Parent(s): 07f24ef

Update model card: set pipeline tag, add library name and citation


This PR fixes the pipeline tag, setting it to `image-classification`, since image classification is the main use case of this model. It also adds the `library_name` tag so that the "how to use" button appears at the top right of the model page. Finally, citation information and a star history chart are added for better discoverability.
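For reference, the card's front matter after this change reads as follows (this is exactly the metadata block applied in the diff below):

```yaml
---
datasets:
- ILSVRC/imagenet-1k
license: other
license_name: nvclv1
license_link: LICENSE
pipeline_tag: image-classification
library_name: transformers
---
```

With `library_name: transformers` set, the Hub can render a library-specific loading snippet; because MambaVision ships custom modeling code, loading still goes through `AutoModel.from_pretrained(..., trust_remote_code=True)`, as in the card's own examples.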

Files changed (1): README.md (+51, −21)
README.md CHANGED
@@ -1,13 +1,13 @@
 ---
+datasets:
+- ILSVRC/imagenet-1k
 license: other
 license_name: nvclv1
 license_link: LICENSE
-datasets:
-- ILSVRC/imagenet-1k
-pipeline_tag: image-feature-extraction
+pipeline_tag: image-classification
+library_name: transformers
 ---
 
-
 [**MambaVision: A Hybrid Mamba-Transformer Vision Backbone**](https://arxiv.org/abs/2407.08083).
 
 ## Model Overview
@@ -17,38 +17,34 @@ We have developed the first hybrid model for computer vision which leverages the
 ## Model Performance
 
 MambaVision demonstrates a strong performance by achieving a new SOTA Pareto-front in
-terms of Top-1 accuracy and throughput. 
+terms of Top-1 accuracy and throughput.
 
 <p align="center">
-<img src="https://github.com/NVlabs/MambaVision/assets/26806394/79dcf841-3966-4b77-883d-76cd5e1d4320" width=70% height=70%
+<img src="https://github.com/NVlabs/MambaVision/assets/26806394/79dcf841-3966-4b77-883d-76cd5e1d4320" width=70% height=70%
 class="center">
 </p>
 
-
 ## Model Usage
 
 It is highly recommended to install the requirements for MambaVision by running the following:
 
-
 ```Bash
 pip install mambavision
 ```
 
-For each model, we offer two variants for image classification and feature extraction that can be imported with 1 line of code. 
+For each model, we offer two variants for image classification and feature extraction that can be imported with 1 line of code.
 
 ### Image Classification
 
-In the following example, we demonstrate how MambaVision can be used for image classification. 
-
-Given the following image from [COCO dataset](https://cocodataset.org/#home) val set as an input: 
+In the following example, we demonstrate how MambaVision can be used for image classification.
 
+Given the following image from [COCO dataset](https://cocodataset.org/#home) val set as an input:
 
 <p align="center">
-<img src="https://cdn-uploads.huggingface.co/production/uploads/64414b62603214724ebd2636/4duSnqLf4lrNiAHczSmAN.jpeg" width=70% height=70%
+<img src="https://cdn-uploads.huggingface.co/production/uploads/64414b62603214724ebd2636/4duSnqLf4lrNiAHczSmAN.jpeg" width=70% height=70%
 class="center">
 </p>
 
-
 The following snippet can be used for image classification:
 
 ```Python
@@ -77,18 +73,18 @@ transform = create_transform(input_size=input_resolution,
 inputs = transform(image).unsqueeze(0).cuda()
 # model inference
 outputs = model(inputs)
-logits = outputs['logits'] 
+logits = outputs['logits']
 predicted_class_idx = logits.argmax(-1).item()
 print("Predicted class:", model.config.id2label[predicted_class_idx])
 ```
 
-The predicted label is ```brown bear, bruin, Ursus arctos.```
+The predicted label is `brown bear, bruin, Ursus arctos.`
 
 ### Feature Extraction
 
-MambaVision can also be used as a generic feature extractor. 
+MambaVision can also be used as a generic feature extractor.
 
-Specifically, we can extract the outputs of each stage of model (4 stages) as well as the final averaged-pool features that are flattened. 
+Specifically, we can extract the outputs of each stage of model (4 stages) as well as the final averaged-pool features that are flattened.
 
 The following snippet can be used for feature extraction:
 
@@ -98,7 +94,7 @@ from PIL import Image
 from timm.data.transforms_factory import create_transform
 import requests
 
-model = AutoModel.from_pretrained("nvidia/MambaVision-T2-1K", trust_remote_code=True)
+model = AutoModel.from_pretrained("nvidia/MambaVision-T-1K", trust_remote_code=True)
 
 # eval mode for inference
 model.cuda().eval()
@@ -123,7 +119,41 @@ print("Size of extracted features in stage 1:", features[0].size()) # torch.Size
 print("Size of extracted features in stage 4:", features[3].size()) # torch.Size([1, 640, 7, 7])
 ```
 
+### License:
+
+[NVIDIA Source Code License-NC](https://huggingface.co/nvidia/MambaVision-T-1K/blob/main/LICENSE)
+
+## Citation
+
+If you find MambaVision to be useful for your work, please consider citing our paper:
+
+```
+@article{hatamizadeh2024mambavision,
+  title={MambaVision: A Hybrid Mamba-Transformer Vision Backbone},
+  author={Hatamizadeh, Ali and Kautz, Jan},
+  journal={arXiv preprint arXiv:2407.08083},
+  year={2024}
+}
+```
+
+## Star History
+
+[![Stargazers repo roster for @NVlabs/MambaVision](https://bytecrank.com/nastyox/reporoster/php/stargazersSVG.php?user=NVlabs&repo=MambaVision)](https://github.com/NVlabs/MambaVision/stargazers)
+
+[![Star History Chart](https://api.star-history.com/svg?repos=NVlabs/MambaVision&type=Date)](https://star-history.com/#NVlabs/MambaVision&Date)
+
+## Licenses
+
+Copyright © 2025, NVIDIA Corporation. All rights reserved.
+
+This work is made available under the NVIDIA Source Code License-NC. Click [here](LICENSE) to view a copy of this license.
+
+The pre-trained models are shared under [CC-BY-NC-SA-4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/). If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.
+
+For license information regarding the timm repository, please refer to its [repository](https://github.com/rwightman/pytorch-image-models).
+
+For license information regarding the ImageNet dataset, please see the [ImageNet official website](https://www.image-net.org/).
 
-### License:
+## Acknowledgement
 
-[NVIDIA Source Code License-NC](https://huggingface.co/nvidia/MambaVision-T-1K/blob/main/LICENSE)
+This repository is built on top of the [timm](https://github.com/huggingface/pytorch-image-models) repository. We thank [Ross Wrightman](https://rwightman.com/) for creating and maintaining this high-quality library.
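Once merged, the effect of the new tags can be read back programmatically. A minimal sketch using `huggingface_hub` (assuming this card belongs to the `nvidia/MambaVision-T-1K` checkpoint referenced throughout the card):

```Python
from huggingface_hub import model_info

# Fetch the model's Hub metadata and confirm the updated tags are live.
info = model_info("nvidia/MambaVision-T-1K")
print(info.pipeline_tag)  # expected: image-classification
print(info.library_name)  # expected: transformers
```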