---
license: mit
language:
- en
- zh
base_model:
- Qwen/Qwen2.5-VL-3B-Instruct
pipeline_tag: image-text-to-text
library_name: transformers
tags:
- Qwen2.5-VL
- Qwen2.5-VL-3B-Instruct
- Int8
- VLM
---

# Qwen2.5-VL-3B-Instruct

This version of Qwen2.5-VL-3B-Instruct has been converted to run on the Axera NPU using **w8a16** quantization.

Compatible with Pulsar2 version: 3.4

## Conversion tool links

If you are interested in model conversion, you can try exporting an axmodel from the original repo:
https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct

[Pulsar2 docs: how to convert an LLM from Hugging Face to axmodel](https://pulsar2-docs.readthedocs.io/en/latest/appendix/build_llm.html)

[AXera NPU HOST LLM Runtime](https://github.com/AXERA-TECH/Qwen2.5-VL-3B-Instruct.axera) 
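
For reference, here is a minimal sketch of a Pulsar2 conversion command. The subcommand is `pulsar2 llm_build`; the flag values below are assumptions chosen to match the runtime settings shown in the logs further down (prefill 320, max token length 1023), so treat the Pulsar2 documentation linked above as authoritative.

```
# Sketch only: flag values are assumptions, see the Pulsar2 docs for details.
pulsar2 llm_build \
  --input_path Qwen/Qwen2.5-VL-3B-Instruct \
  --output_path ./qwen2.5-vl-3b-ax650 \
  --kv_cache_len 1023 \
  --prefill_len 320 \
  --chip AX650
```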


## Supported Platforms

- AX650
  - AX650N DEMO Board
  - [M4N-Dock(爱芯派Pro)](https://wiki.sipeed.com/hardware/zh/maixIV/m4ndock/m4ndock.html)
  - [M.2 Accelerator card](https://axcl-docs.readthedocs.io/zh-cn/latest/doc_guide_hardware.html)

**Image Process**
| Chip | Input size | Image num | Image encoder time | TTFT (320 tokens) | Decode speed (w8a16) | DDR | Flash |
|--|--|--|--|--|--|--|--|
| AX650 | 448×448 | 1 | 780 ms | 420 ms | 6.2 tokens/s | 4.3 GiB | 4.6 GiB |

**Video Process**
| Chip | Input size | Image num | Image encoder time | TTFT (512 tokens) | Decode speed (w8a16) | DDR | Flash |
|--|--|--|--|--|--|--|--|
| AX650 | 308×308 | 8 | 1400 ms | 5400 ms | 6.1 tokens/s | 4.4 GiB | 4.7 GiB |
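
As a rough latency estimate from the image table above: encoding one 448×448 image takes about 780 ms, the 320-token prefill about 420 ms, and decoding then runs at about 6.2 tokens/s, so a 100-token answer should arrive in roughly 0.78 s + 0.42 s + 100 / 6.2 s ≈ 17.3 s end to end.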


## How to use

Download all files from this repository to the device.
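
For example, with the Hugging Face CLI (the repo ID below is an assumption, substitute this repository's actual ID; `git clone` with Git LFS also works):

```
# Assumes huggingface_hub is installed: pip install -U "huggingface_hub[cli]"
huggingface-cli download AXERA-TECH/Qwen2.5-VL-3B-Instruct --local-dir ./qwen2.5-vl-3b
```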

**If you are using an AX650 board**
```
root@ax650:/mnt/qtang/llm-test/qwen2.5-vl-3b# tree -L 2
.
├── image
│   └── ssd_car.jpg
├── main
├── python
│   ├── cv_resize.py
│   ├── infer_image.py
│   ├── infer_text.py
│   ├── infer_video.py
│   ├── preprocess.py
│   └── utils.py
├── qwen2_5-vl-3b-image-ax650
│   ├── Qwen2.5-VL-3B-Instruct_vision_nchw448.axmodel
│   ├── model.embed_tokens.weight.bfloat16.bin
│   ├── qwen2_5_vl_p320_l0_together.axmodel
......
│   ├── qwen2_5_vl_p320_l9_together.axmodel
│   └── qwen2_5_vl_post.axmodel
├── qwen2_5-vl-3b-video-ax650
│   ├── Qwen2.5-VL-3B-Instruct_vision_nhwc.axmodel
│   ├── model.embed_tokens.weight.bfloat16.bin
│   ├── qwen2_5_vl_p512_l0_together.axmodel
......
│   ├── qwen2_5_vl_p512_l9_together.axmodel
│   └── qwen2_5_vl_post.axmodel
├── qwen2_5-vl-tokenizer
│   ├── chat_template.json
│   ├── config.json
│   ├── generation_config.json
│   ├── merges.txt
│   ├── model.safetensors.index.json
│   ├── preprocessor_config.json
│   ├── tokenizer.json
│   ├── tokenizer_config.json
│   └── vocab.json
├── qwen2_tokenizer_image_448.py
├── qwen2_tokenizer_video_308.py
├── run_qwen2_5_vl_image.sh
├── run_qwen2_5_vl_video.sh
└── video
    ├── frame_0075.jpg
......
    └── frame_0089.jpg

```

#### Install transformers

```
pip install transformers==4.41.1
```
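
A quick check that the pinned version is the one Python picks up:

```
python -c "import transformers; print(transformers.__version__)"
# expected output: 4.41.1
```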

#### Start the Tokenizer service

**If you are using the image demo**
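
The tokenizer runs as a separate process that the on-device runtime connects to. A plausible way to start it for the image demo, assuming the script accepts a `--port` flag (check `python qwen2_tokenizer_image_448.py --help`):

```
# Port value is an assumption; match whatever the runtime expects.
python qwen2_tokenizer_image_448.py --port 12345
```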

- input text

```
描述下图片
```

- input image

![](./image/ssd_car.jpg)

```
root@ax650:/mnt/qtang/llm-test/qwen2.5-vl-3b# ./run_qwen2_5_vl_image.sh
[I][                            Init][ 129]: LLM init start
bos_id: -1, eos_id: 151645
  2% | █                                 |   1 /  40 [0.01s<0.24s, 166.67 count/s] tokenizer init ok
[I][                            Init][  26]: LLaMaEmbedSelector use mmap
100% | ████████████████████████████████ |  40 /  40 [38.23s<38.23s, 1.05 count/s] init vpm axmodel ok,remain_cmm(7600 MB)
[I][                            Init][ 277]: max_token_len : 1023
[I][                            Init][ 282]: kv_cache_size : 256, kv_cache_num: 1023
[I][                            Init][ 290]: prefill_token_num : 320
[I][                            Init][ 292]: vpm_height : 1024,vpm_width : 392
[I][                            Init][ 301]: LLM init ok
Type "q" to exit, Ctrl+c to stop current running

prompt >> who are you?
image >>
[I][                             Run][ 638]: ttft: 2854.47 ms
I am a large language model created by Alibaba Cloud. I am called Qwen.

[N][                             Run][ 779]: hit eos,avg 6.05 token/s

prompt >> 描述下图片
image >> image/ssd_car.jpg
[I][                          Encode][ 416]: image encode time : 795.614014 ms, size : 524288
[I][                             Run][ 638]: ttft: 2856.88 ms
这张图片展示了一条繁忙的城市街道。前景中,一名女子站在人行道上,她穿着黑色外套,面带微笑。她旁边是一辆红色的双层巴士,巴士上有一个广告,
上面写着“THINGS GET MORE EXITING WHEN YOU SAY ‘YES’”。巴士的车牌号是“L15”。巴士旁边停着一辆黑色的小型货车。背景中可以看到一些商店和行人,
街道两旁的建筑物是现代的玻璃幕墙建筑。整体氛围显得繁忙而充满活力。

[N][                             Run][ 779]: hit eos,avg 5.96 token/s
```

**If you are using the video demo**
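
For the video demo, start the matching 308×308 tokenizer script first (same `--port` assumption as in the image demo):

```
python qwen2_tokenizer_video_308.py --port 12345
```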

```
root@ax650:/mnt/qtang/llm-test/qwen2.5-vl-3b# ./run_qwen2_5_vl_video.sh
[I][                            Init][ 129]: LLM init start
bos_id: -1, eos_id: 151645
  2% | █                                 |   1 /  40 [0.00s<0.12s, 333.33 count/s] tokenizer init ok
[I][                            Init][  26]: LLaMaEmbedSelector use mmap
100% | ████████████████████████████████ |  40 /  40 [40.05s<40.05s, 1.00 count/s] init vpm axmodel ok,remain_cmm(7680 MB)
[I][                            Init][ 277]: max_token_len : 1023
[I][                            Init][ 282]: kv_cache_size : 256, kv_cache_num: 1023
[I][                            Init][ 290]: prefill_token_num : 512
[I][                            Init][ 292]: vpm_height : 484,vpm_width : 392
[I][                            Init][ 301]: LLM init ok
Type "q" to exit, Ctrl+c to stop current running

prompt >> 描述这个视频
image >> video
video/frame_0075.jpg
video/frame_0077.jpg
video/frame_0079.jpg
video/frame_0081.jpg
video/frame_0083.jpg
video/frame_0085.jpg
video/frame_0087.jpg
video/frame_0089.jpg
[I][                          Encode][ 416]: image encode time : 1488.392944 ms, size : 991232
[I][                             Run][ 638]: ttft: 5487.22 ms
视频显示的是一个城市街道的场景。时间戳显示为2月26日,地点是xxx。视频中,一名穿着深色外套和牛仔裤的男子正在推着一个行李箱。
突然,他似乎被什么东西绊倒,随后他摔倒在地。背景中可以看到一个广告牌,上面有一个绿色的图案,旁边停着一辆电动车。街道两旁有建筑物和树木,天气看起来有些阴沉。

[N][                             Run][ 779]: hit eos,avg 5.94 token/s
```
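
The video demo reads a directory of pre-extracted JPEG frames such as `video/frame_0075.jpg`. One generic way to produce frames like these from a clip with ffmpeg (file name and sampling rate are illustrative assumptions):

```
# Sample 2 frames per second and scale to the 308x308 input size used above.
ffmpeg -i input.mp4 -vf "fps=2,scale=308:308" video/frame_%04d.jpg
```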

#### Inference with M.2 Accelerator card
What is the M.2 accelerator card? This demo will be shown on a Raspberry Pi 5 host.

TODO