---
license: mit
language:
- en
- zh
base_model:
- Qwen/Qwen2.5-VL-3B-Instruct
pipeline_tag: image-text-to-text
library_name: transformers
tags:
- Qwen2.5-VL
- Qwen2.5-VL-3B-Instruct
- Int8
- VLM
---
# Qwen2.5-VL-3B-Instruct
This version of Qwen2.5-VL-3B-Instruct has been converted to run on the Axera NPU using w8a16 quantization.
Compatible with Pulsar2 version: 3.4
## Convert tools links

If you are interested in model conversion, you can try exporting the axmodel yourself starting from the original repo: https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct

Pulsar2 Link, How to Convert LLM from Huggingface to axmodel
## Support Platform
- AX650
- AX650N DEMO Board
- M4N-Dock(็ฑ่ฏๆดพPro)
- M.2 Accelerator card
## Image Process

| Chip | Input size | Image num | Image encoder | TTFT (320 tokens) | Decode (w8a16) | DDR | Flash |
|---|---|---|---|---|---|---|---|
| AX650 | 448*448 | 1 | 780 ms | 420 ms | 6.2 tokens/s | 4.3 GiB | 4.6 GiB |
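As a quick sanity check on these numbers, end-to-end reply latency can be estimated as TTFT plus output length divided by decode speed. The 128-token reply length below is a hypothetical example, not a measurement from the table:

```python
# Rough latency estimate from the table above (AX650, w8a16).
ttft_ms = 420        # time-to-first-token for a 320-token prefill
decode_tps = 6.2     # decode speed in tokens/s
out_tokens = 128     # hypothetical reply length

total_s = ttft_ms / 1000 + out_tokens / decode_tps
print(f"~{total_s:.1f} s for a {out_tokens}-token reply")  # ~21.1 s for a 128-token reply
```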
## Video Process

| Chip | Input size | Image num | Image encoder | TTFT (512 tokens) | Decode (w8a16) | DDR | Flash |
|---|---|---|---|---|---|---|---|
| AX650 | 308*308 | 8 | 1400 ms | 5400 ms | 6.1 tokens/s | 4.4 GiB | 4.7 GiB |
## How to use

Download all files from this repository to the device.

### If you are using an AX650 board
```
root@ax650:/mnt/qtang/llm-test/qwen2.5-vl-3b# tree -L 2
.
├── image
│   └── ssd_car.jpg
├── main
├── python
│   ├── cv_resize.py
│   ├── infer_image.py
│   ├── infer_text.py
│   ├── infer_video.py
│   ├── preprocess.py
│   └── utils.py
├── qwen2_5-vl-3b-image-ax650
│   ├── Qwen2.5-VL-3B-Instruct_vision_nchw448.axmodel
│   ├── model.embed_tokens.weight.bfloat16.bin
│   ├── qwen2_5_vl_p320_l0_together.axmodel
......
│   ├── qwen2_5_vl_p320_l9_together.axmodel
│   └── qwen2_5_vl_post.axmodel
├── qwen2_5-vl-3b-video-ax650
│   ├── Qwen2.5-VL-3B-Instruct_vision_nhwc.axmodel
│   ├── model.embed_tokens.weight.bfloat16.bin
│   ├── qwen2_5_vl_p512_l0_together.axmodel
......
│   ├── qwen2_5_vl_p512_l9_together.axmodel
│   └── qwen2_5_vl_post.axmodel
├── qwen2_5-vl-tokenizer
│   ├── chat_template.json
│   ├── config.json
│   ├── generation_config.json
│   ├── merges.txt
│   ├── model.safetensors.index.json
│   ├── preprocessor_config.json
│   ├── tokenizer.json
│   ├── tokenizer_config.json
│   └── vocab.json
├── qwen2_tokenizer_image_448.py
├── qwen2_tokenizer_video_308.py
├── run_qwen2_5_vl_image.sh
├── run_qwen2_5_vl_video.sh
└── video
    ├── frame_0075.jpg
......
    └── frame_0089.jpg
```
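The `python/` directory holds the host-side helpers; the actual resize logic lives in `cv_resize.py` and `preprocess.py`. As a rough, standalone illustration of scaling an image to the 448x448 vision-encoder input (a nearest-neighbor sketch in pure Python; this is an assumption for illustration, not the repo's code):

```python
# Nearest-neighbor resize to the 448x448 vision-encoder input.
# Illustrative only: the real pipeline (cv_resize.py / preprocess.py)
# may use different interpolation and normalization.
def resize_nearest(img, out_h=448, out_w=448):
    """img: list of rows, each row a list of pixel values of any type."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]

tiny = [[(y, x) for x in range(2)] for y in range(2)]  # 2x2 dummy image
big = resize_nearest(tiny)
print(len(big), len(big[0]))  # 448 448
```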
### Install transformers

```
pip install transformers==4.41.1
```
### Start the Tokenizer service
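The scripts `qwen2_tokenizer_image_448.py` and `qwen2_tokenizer_video_308.py` appear to provide tokenization to the on-device runtime. A minimal sketch of that pattern is below; the HTTP endpoint, port, JSON schema, and the toy whitespace "tokenizer" are all assumptions made for illustration, while the real scripts use the Qwen2.5 tokenizer from `transformers`:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def toy_encode(text):
    # Stand-in for the real Qwen2.5 tokenizer (AutoTokenizer from
    # transformers); a whitespace split keeps the sketch dependency-free.
    return [hash(word) % 151645 for word in text.split()]

class TokenizerHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps({"input_ids": toy_encode(payload.get("text", ""))})
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body.encode())

def serve(port=12345):
    # Blocks; run in a separate shell before launching the demo.
    HTTPServer(("0.0.0.0", port), TokenizerHandler).serve_forever()
```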
### If you are using image input

- Input text: ๆ่ฟฐไธๅพ็ (Describe this image)
- Input image: image/ssd_car.jpg
```
root@ax650:/mnt/qtang/llm-test/qwen2.5-vl-3b# ./run_qwen2_5_vl_image.sh
[I][ Init][ 129]: LLM init start
bos_id: -1, eos_id: 151645
  2% | █                                |   1 /  40 [0.01s<0.24s, 166.67 count/s] tokenizer init ok
[I][ Init][ 26]: LLaMaEmbedSelector use mmap
100% | ████████████████████████████████ |  40 /  40 [38.23s<38.23s, 1.05 count/s] init vpm axmodel ok,remain_cmm(7600 MB)
[I][ Init][ 277]: max_token_len : 1023
[I][ Init][ 282]: kv_cache_size : 256, kv_cache_num: 1023
[I][ Init][ 290]: prefill_token_num : 320
[I][ Init][ 292]: vpm_height : 1024,vpm_width : 392
[I][ Init][ 301]: LLM init ok
Type "q" to exit, Ctrl+c to stop current running

prompt >> who are you?
image >>
[I][ Run][ 638]: ttft: 2854.47 ms
I am a large language model created by Alibaba Cloud. I am called Qwen.
[N][ Run][ 779]: hit eos,avg 6.05 token/s

prompt >> ๆ่ฟฐไธๅพ็
image >> image/ssd_car.jpg
[I][ Encode][ 416]: image encode time : 795.614014 ms, size : 524288
[I][ Run][ 638]: ttft: 2856.88 ms
่ฟๅผ ๅพ็ๅฑ็คบไบไธๆก็นๅฟ็ๅๅธ่ก้ใๅๆฏไธญ๏ผไธๅๅฅณๅญ็ซๅจไบบ่ก้ไธ๏ผๅฅน็ฉฟ็้ป่ฒๅคๅฅ๏ผ้ขๅธฆๅพฎ็ฌใๅฅนๆ่พนๆฏไธ่พ็บข่ฒ็ๅๅฑๅทดๅฃซ๏ผๅทดๅฃซไธๆไธไธชๅนฟๅ๏ผ
ไธ้ขๅ็โTHINGS GET MORE EXITING WHEN YOU SAY โYESโโใๅทดๅฃซ็่ฝฆ็ๅทๆฏโL15โใๅทดๅฃซๆ่พนๅ็ไธ่พ้ป่ฒ็ๅฐๅ่ดง่ฝฆใ่ๆฏไธญๅฏไปฅ็ๅฐไธไบๅๅบๅ่กไบบ๏ผ
่ก้ไธคๆ็ๅปบ็ญ็ฉๆฏ็ฐไปฃ็็ป็ๅนๅขๅปบ็ญใๆดไฝๆฐๅดๆพๅพ็นๅฟ่ๅๆปกๆดปๅใ
[N][ Run][ 779]: hit eos,avg 5.96 token/s
```

(Translation of the Chinese answer: This picture shows a busy city street. In the foreground, a woman stands on the sidewalk, wearing a black jacket and smiling. Beside her is a red double-decker bus carrying an advertisement that reads "THINGS GET MORE EXITING WHEN YOU SAY 'YES'"; the bus's license plate is "L15". Next to the bus is a small black van. In the background, some shops and pedestrians are visible, and the buildings on both sides of the street are modern glass-curtain-wall structures. The overall atmosphere feels busy and full of life.)
### If you are using video input
```
root@ax650:/mnt/qtang/llm-test/qwen2.5-vl-3b# ./run_qwen2_5_vl_video.sh
[I][ Init][ 129]: LLM init start
bos_id: -1, eos_id: 151645
  2% | █                                |   1 /  40 [0.00s<0.12s, 333.33 count/s] tokenizer init ok
[I][ Init][ 26]: LLaMaEmbedSelector use mmap
100% | ████████████████████████████████ |  40 /  40 [40.05s<40.05s, 1.00 count/s] init vpm axmodel ok,remain_cmm(7680 MB)
[I][ Init][ 277]: max_token_len : 1023
[I][ Init][ 282]: kv_cache_size : 256, kv_cache_num: 1023
[I][ Init][ 290]: prefill_token_num : 512
[I][ Init][ 292]: vpm_height : 484,vpm_width : 392
[I][ Init][ 301]: LLM init ok
Type "q" to exit, Ctrl+c to stop current running

prompt >> ๆ่ฟฐ่ฟไธช่ง้ข
image >> video
video/frame_0075.jpg
video/frame_0077.jpg
video/frame_0079.jpg
video/frame_0081.jpg
video/frame_0083.jpg
video/frame_0085.jpg
video/frame_0087.jpg
video/frame_0089.jpg
[I][ Encode][ 416]: image encode time : 1488.392944 ms, size : 991232
[I][ Run][ 638]: ttft: 5487.22 ms
่ง้ขๆพ็คบ็ๆฏไธไธชๅๅธ่ก้็ๅบๆฏใๆถ้ดๆณๆพ็คบไธบ2ๆ26ๆฅ๏ผๅฐ็นๆฏxxxใ่ง้ขไธญ๏ผไธๅ็ฉฟ็ๆทฑ่ฒๅคๅฅๅ็ไป่ฃค็็ทๅญๆญฃๅจๆจ็ไธไธช่กๆ็ฎฑใ
็ช็ถ๏ผไปไผผไน่ขซไปไนไธ่ฅฟ็ปๅ๏ผ้ๅไปๆๅๅจๅฐใ่ๆฏไธญๅฏไปฅ็ๅฐไธไธชๅนฟๅ็๏ผไธ้ขๆไธไธช็ปฟ่ฒ็ๅพๆก๏ผๆ่พนๅ็ไธ่พ็ตๅจ่ฝฆใ่ก้ไธคๆๆๅปบ็ญ็ฉๅๆ ๆจ๏ผๅคฉๆฐ็่ตทๆฅๆไบ้ดๆฒใ
[N][ Run][ 779]: hit eos,avg 5.94 token/s
```

(The prompt "ๆ่ฟฐ่ฟไธช่ง้ข" means "Describe this video". Translation of the Chinese answer: The video shows a city street scene. The timestamp reads February 26, and the location is xxx. In the video, a man wearing a dark jacket and jeans is pushing a luggage cart. Suddenly he seems to be tripped by something, and then he falls forward to the ground. In the background, there is a billboard with a green pattern, and an electric scooter is parked next to it. There are buildings and trees on both sides of the street, and the weather looks somewhat gloomy.)
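The eight input frames above are sampled every other frame, from index 0075 through 0089. That file list can be generated programmatically, for example (a trivial sketch; the actual demo scripts may sample frames differently):

```python
# Sample 8 frames, every 2nd frame from 0075 through 0089,
# matching the file names fed to the video demo above.
frames = [f"video/frame_{i:04d}.jpg" for i in range(75, 90, 2)]
print(len(frames), frames[0], frames[-1])  # 8 video/frame_0075.jpg video/frame_0089.jpg
```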
## Inference with M.2 Accelerator card

What is an M.2 accelerator card? This demo will be shown running on a Raspberry Pi 5.

TODO
