Update pipeline tag, add library name, and link to code
This PR updates the `pipeline_tag` to `image-classification` to better reflect the model's primary use case, adds `library_name: transformers` since the Hugging Face Transformers library is used in the provided examples, and links the official code repository so users can find it more easily: https://github.com/NVlabs/MambaVision.
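For context, here is a minimal sketch of what the new `pipeline_tag` and `library_name` imply for users loading the model with Transformers. It is not part of this PR: the checkpoint name `nvidia/MambaVision-T-1K`, the need for `trust_remote_code=True`, and the `logits` output key are assumptions based on typical MambaVision model cards rather than anything stated in this change.

```python
import torch
from transformers import AutoModelForImageClassification

# Hypothetical checkpoint name: the PR does not point to a specific MambaVision variant.
model = AutoModelForImageClassification.from_pretrained(
    "nvidia/MambaVision-T-1K",
    trust_remote_code=True,  # MambaVision ships custom modeling code on the Hub
)
model.eval()

# Dummy 224x224 RGB batch; real usage would apply the checkpoint's ImageNet preprocessing.
pixel_values = torch.randn(1, 3, 224, 224)

with torch.no_grad():
    outputs = model(pixel_values)

# Assumption: the remote code returns a standard classifier output with a `logits` field.
predicted_class = outputs["logits"].argmax(dim=-1).item()
print(f"Predicted ImageNet-1K class index: {predicted_class}")
```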
README.md
CHANGED
@@ -1,15 +1,17 @@
 ---
+datasets:
+- ILSVRC/imagenet-1k
 license: other
 license_name: nvclv1
 license_link: LICENSE
-
-
-pipeline_tag: image-feature-extraction
+pipeline_tag: image-classification
+library_name: transformers
 ---
 
-
 [**MambaVision: A Hybrid Mamba-Transformer Vision Backbone**](https://arxiv.org/abs/2407.08083).
 
+Code: https://github.com/NVlabs/MambaVision
+
 ## Model Overview
 
 We have developed the first hybrid model for computer vision which leverages the strengths of Mamba and Transformers. Specifically, our core contribution includes redesigning the Mamba formulation to enhance its capability for efficient modeling of visual features. In addition, we conducted a comprehensive ablation study on the feasibility of integrating Vision Transformers (ViT) with Mamba. Our results demonstrate that equipping the Mamba architecture with several self-attention blocks at the final layers greatly improves the modeling capacity to capture long-range spatial dependencies. Based on our findings, we introduce a family of MambaVision models with a hierarchical architecture to meet various design criteria.