---
license: mit
pipeline_tag: image-classification
library_name: transformers
tags:
- PyTorch
- Mamba
- SSM
---


# VMamba: Visual State Space Model

VMamba is a bidirectional state-space model fine-tuned on the ImageNet dataset. It was introduced in the paper
[VMamba: Visual State Space Model](https://arxiv.org/pdf/2401.10166) and first released in [this repo](https://github.com/MzeroMiko/VMamba/tree/main).


Disclaimer: This is not the official implementation; for that, please refer to the [official repo](https://github.com/MzeroMiko/VMamba/tree/main).
This is a work in progress by me, Saurabhchand Bhati, to add a VMamba backbone for image and audio classification tasks.


## How to Get Started with the Model

Use the code below to get started with the model.

```python
import torch
from PIL import Image
import torchvision.transforms as T
from transformers import AutoConfig, AutoModelForImageClassification

# trust_remote_code=True is required because the model code lives in the Hub repo
config = AutoConfig.from_pretrained('saurabhati/VMamba_ImageNet_82.6', trust_remote_code=True)
vmamba_model = AutoModelForImageClassification.from_pretrained('saurabhati/VMamba_ImageNet_82.6', trust_remote_code=True)

preprocess = T.Compose([
    T.Resize(224, interpolation=T.InterpolationMode.BICUBIC),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(
        mean=[0.4850, 0.4560, 0.4060],
        std=[0.2290, 0.2240, 0.2250]
    )])

# Convert to RGB so grayscale or RGBA files still yield a 3-channel tensor
input_image = Image.open('/data/sls/scratch/sbhati/data/Imagenet/train/n02009912/n02009912_16160.JPEG').convert('RGB')
input_image = preprocess(input_image)

with torch.no_grad():
    logits = vmamba_model(input_image.unsqueeze(0)).logits

# Map the highest-scoring class index back to its label
predicted_label = vmamba_model.config.id2label[logits.argmax(-1).item()]
print(predicted_label)  # the example image above gives: crane
```
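
To inspect more than the single best class, the logits can be turned into probabilities and ranked. The following is a minimal sketch, not part of the original repo: `top_k_labels` is a hypothetical helper, and the toy labels and logits are made up for illustration. It uses only the standard library and expects a plain list of logits.

```python
import math


def top_k_labels(logits, id2label, k=5):
    """Return the k most likely (label, probability) pairs, best first.

    Hypothetical helper: `logits` is a flat list of floats and
    `id2label` maps integer class indices to label strings.
    """
    # Numerically stable softmax: subtract the max logit before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class indices by probability, highest first.
    ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Toy example with three made-up classes (labels and logits are illustrative only).
toy_id2label = {0: "crane", 1: "heron", 2: "stork"}
toy_top = top_k_labels([2.0, 1.0, 0.1], toy_id2label, k=2)
for label, prob in toy_top:
    print(f"{label}: {prob:.3f}")
```

With the model loaded as above, a call such as `top_k_labels(logits[0].tolist(), vmamba_model.config.id2label)` should give the five highest-scoring ImageNet labels, assuming `id2label` is keyed by integer class index as is usual for `transformers` configs.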

## Citation

```bibtex
@article{liu2024vmamba,
  title={VMamba: Visual State Space Model},
  author={Liu, Yue and Tian, Yunjie and Zhao, Yuzhong and Yu, Hongtian and Xie, Lingxi and Wang, Yaowei and Ye, Qixiang and Liu, Yunfan},
  journal={arXiv preprint arXiv:2401.10166},
  year={2024}
}
```