---
license: mit
language:
  - en
  - zh
base_model:
  - Qwen/Qwen2.5-VL-3B-Instruct
pipeline_tag: image-text-to-text
library_name: transformers
tags:
  - Qwen2.5-VL
  - Qwen2.5-VL-3B-Instruct
  - Int8
  - VLM
---

# Qwen2.5-VL-3B-Instruct

This version of Qwen2.5-VL-3B-Instruct has been converted to run on the Axera NPU using w8a16 quantization.
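Here w8a16 means the linear-layer weights are stored as int8 while activations stay in 16-bit float. As a rough illustration of the idea, here is a minimal numpy sketch (illustrative only, not the Pulsar2/NPU kernel):

```python
import numpy as np

# Illustrative w8a16 linear layer: int8 weights with per-output-channel
# scales, 16-bit float activations. Not the actual Pulsar2/NPU kernel.
def quantize_w8(w: np.ndarray):
    # Per-output-channel symmetric int8 quantization of a weight matrix.
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    w_int8 = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
    return w_int8, scale.astype(np.float32)

def linear_w8a16(x_fp16: np.ndarray, w_int8: np.ndarray, scale: np.ndarray):
    # Dequantize weights on the fly, keep activations in 16-bit float.
    w = w_int8.astype(np.float32) * scale              # [out, in]
    return (x_fp16.astype(np.float32) @ w.T).astype(np.float16)

w = np.random.randn(64, 128).astype(np.float32)
x = np.random.randn(1, 128).astype(np.float16)
w_q, s = quantize_w8(w)
err = np.abs(linear_w8a16(x, w_q, s) - (x.astype(np.float32) @ w.T).astype(np.float16))
print("max quantization error:", err.max())
```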

Compatible with Pulsar2 version: 3.4

## Convert tools links

If you are interested in model conversion, you can try exporting the axmodel from the original repo: https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct. A starting-point sketch is shown below.
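As a first step, the original checkpoint can be fetched with huggingface_hub before running Pulsar2 on it (the local path below is arbitrary):

```python
from huggingface_hub import snapshot_download

# Download the original Qwen2.5-VL-3B-Instruct weights; Pulsar2 then
# converts this checkpoint into the axmodel files shipped here.
snapshot_download(
    repo_id="Qwen/Qwen2.5-VL-3B-Instruct",
    local_dir="Qwen2.5-VL-3B-Instruct",  # arbitrary local path
)
```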

- Pulsar2 documentation: How to Convert LLM from Huggingface to axmodel

- AXera NPU HOST LLM Runtime

## Support Platform

### Image Process

| Chips | input size | image num | image encoder | ttft (320 tokens) | w8a16 | DDR | Flash |
|-------|------------|-----------|---------------|-------------------|-------|-----|-------|
| AX650 | 448*448 | 1 | 780 ms | 420 ms | 6.2 tokens/sec | 4.3 GiB | 4.6 GiB |

### Video Process

| Chips | input size | image num | image encoder | ttft (512 tokens) | w8a16 | DDR | Flash |
|-------|------------|-----------|---------------|-------------------|-------|-----|-------|
| AX650 | 308*308 | 8 | 1400 ms | 5400 ms | 6.1 tokens/sec | 4.4 GiB | 4.7 GiB |
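As a rule of thumb, end-to-end latency is roughly the image-encode time plus ttft plus the number of generated tokens divided by the decode rate. A small sketch using the table values (assuming ttft here excludes the image-encode time):

```python
# Rough end-to-end latency estimate from the benchmark tables above.
def e2e_latency_s(encode_ms, ttft_ms, out_tokens, tokens_per_s):
    return (encode_ms + ttft_ms) / 1000.0 + out_tokens / tokens_per_s

# Image pipeline on AX650: 780 ms encode, 420 ms ttft, 6.2 tokens/sec decode.
print(f"image, 100 output tokens: {e2e_latency_s(780, 420, 100, 6.2):.1f} s")
# Video pipeline on AX650: 1400 ms encode, 5400 ms ttft, 6.1 tokens/sec decode.
print(f"video, 100 output tokens: {e2e_latency_s(1400, 5400, 100, 6.1):.1f} s")
```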

## How to use

Download all files from this repository to the device.
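For example, with huggingface_hub (the repo id below is a placeholder for this repository's id on the Hub; `huggingface-cli download` works too):

```python
from huggingface_hub import snapshot_download

# Pull every file in this repository onto the device.
# "<this-repo-id>" is a placeholder: substitute this repository's actual id.
snapshot_download(repo_id="<this-repo-id>", local_dir="qwen2.5-vl-3b")
```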

If you are using an AX650 board:

```
root@ax650:/mnt/qtang/llm-test/qwen2.5-vl-3b# tree -L 2
.
├── image
│   └── ssd_car.jpg
├── main
├── python
│   ├── cv_resize.py
│   ├── infer_image.py
│   ├── infer_text.py
│   ├── infer_video.py
│   ├── preprocess.py
│   └── utils.py
├── qwen2_5-vl-3b-image-ax650
│   ├── Qwen2.5-VL-3B-Instruct_vision_nchw448.axmodel
│   ├── model.embed_tokens.weight.bfloat16.bin
│   ├── qwen2_5_vl_p320_l0_together.axmodel
......
│   ├── qwen2_5_vl_p320_l9_together.axmodel
│   └── qwen2_5_vl_post.axmodel
├── qwen2_5-vl-3b-video-ax650
│   ├── Qwen2.5-VL-3B-Instruct_vision_nhwc.axmodel
│   ├── model.embed_tokens.weight.bfloat16.bin
│   ├── qwen2_5_vl_p512_l0_together.axmodel
......
│   ├── qwen2_5_vl_p512_l9_together.axmodel
│   └── qwen2_5_vl_post.axmodel
├── qwen2_5-vl-tokenizer
│   ├── chat_template.json
│   ├── config.json
│   ├── generation_config.json
│   ├── merges.txt
│   ├── model.safetensors.index.json
│   ├── preprocessor_config.json
│   ├── tokenizer.json
│   ├── tokenizer_config.json
│   └── vocab.json
├── qwen2_tokenizer_image_448.py
├── qwen2_tokenizer_video_308.py
├── run_qwen2_5_vl_image.sh
├── run_qwen2_5_vl_video.sh
└── video
    ├── frame_0075.jpg
......
    └── frame_0089.jpg
```
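Each decoder layer ships as its own `*_together.axmodel` (the `......` lines above elide the middle layers). A quick sketch to verify the layout after download (directory and file names taken from the tree; adjust `root` to where you placed the files):

```python
from pathlib import Path

root = Path("qwen2.5-vl-3b")  # adjust to your download location

# One axmodel per decoder layer, plus embeddings and the post head.
for sub in ("qwen2_5-vl-3b-image-ax650", "qwen2_5-vl-3b-video-ax650"):
    layers = sorted((root / sub).glob("qwen2_5_vl_p*_l*_together.axmodel"))
    print(f"{sub}: {len(layers)} per-layer axmodel files")
    assert (root / sub / "qwen2_5_vl_post.axmodel").exists()
    assert (root / sub / "model.embed_tokens.weight.bfloat16.bin").exists()
```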

### Install transformers

```
pip install transformers==4.41.1
```
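The tokenizer scripts load the files shipped in `qwen2_5-vl-tokenizer` (see the tree above). A quick sanity check with the pinned transformers version (this assumes the chat template is bundled with the tokenizer files):

```python
from transformers import AutoTokenizer

# Load the tokenizer files shipped in qwen2_5-vl-tokenizer (see tree above).
tok = AutoTokenizer.from_pretrained("qwen2_5-vl-tokenizer")

# Render the chat template for a plain-text prompt.
msgs = [{"role": "user", "content": "who are you?"}]
print(tok.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True))
```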

### Start the Tokenizer service

If you are using the image pipeline:

- input text: 描述下图片 (Describe the image)
- input image: image/ssd_car.jpg

```
root@ax650:/mnt/qtang/llm-test/qwen2.5-vl-3b# ./run_qwen2_5_vl_image.sh
[I][                            Init][ 129]: LLM init start
bos_id: -1, eos_id: 151645
  2% | █                                 |   1 /  40 [0.01s<0.24s, 166.67 count/s] tokenizer init ok
[I][                            Init][  26]: LLaMaEmbedSelector use mmap
100% | ████████████████████████████████ |  40 /  40 [38.23s<38.23s, 1.05 count/s] init vpm axmodel ok,remain_cmm(7600 MB)
[I][                            Init][ 277]: max_token_len : 1023
[I][                            Init][ 282]: kv_cache_size : 256, kv_cache_num: 1023
[I][                            Init][ 290]: prefill_token_num : 320
[I][                            Init][ 292]: vpm_height : 1024,vpm_width : 392
[I][                            Init][ 301]: LLM init ok
Type "q" to exit, Ctrl+c to stop current running

prompt >> who are you?
image >>
[I][                             Run][ 638]: ttft: 2854.47 ms
I am a large language model created by Alibaba Cloud. I am called Qwen.

[N][                             Run][ 779]: hit eos,avg 6.05 token/s

prompt >> 描述下图片
image >> image/ssd_car.jpg
[I][                          Encode][ 416]: image encode time : 795.614014 ms, size : 524288
[I][                             Run][ 638]: ttft: 2856.88 ms
这张图片展示了一条繁忙的城市街道。前景中,一名女子站在人行道上,她穿着黑色外套,面带微笑。她旁边是一辆红色的双层巴士,巴士上有一个广告,
上面写着“THINGS GET MORE EXITING WHEN YOU SAY ‘YES’”。巴士的车牌号是“L15”。巴士旁边停着一辆黑色的小型货车。背景中可以看到一些商店和行人,
街道两旁的建筑物是现代的玻璃幕墙建筑。整体氛围显得繁忙而充满活力。

[N][                             Run][ 779]: hit eos,avg 5.96 token/s
```
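The image pipeline runs the vision encoder at 448*448 (see the Image Process table above); `python/cv_resize.py` in the tree suggests inputs are resized on the host. A minimal resize sketch assuming OpenCV (illustrative, not necessarily the repo's exact preprocessing):

```python
import cv2

# Resize an input image to the 448x448 resolution the vision
# encoder expects (see the Image Process table above).
img = cv2.imread("image/ssd_car.jpg")
img = cv2.resize(img, (448, 448), interpolation=cv2.INTER_LINEAR)
cv2.imwrite("image/ssd_car_448.jpg", img)
```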

If you are using the video pipeline:

```
root@ax650:/mnt/qtang/llm-test/qwen2.5-vl-3b# ./run_qwen2_5_vl_video.sh
[I][                            Init][ 129]: LLM init start
bos_id: -1, eos_id: 151645
  2% | █                                 |   1 /  40 [0.00s<0.12s, 333.33 count/s] tokenizer init ok
[I][                            Init][  26]: LLaMaEmbedSelector use mmap
100% | ████████████████████████████████ |  40 /  40 [40.05s<40.05s, 1.00 count/s] init vpm axmodel ok,remain_cmm(7680 MB)
[I][                            Init][ 277]: max_token_len : 1023
[I][                            Init][ 282]: kv_cache_size : 256, kv_cache_num: 1023
[I][                            Init][ 290]: prefill_token_num : 512
[I][                            Init][ 292]: vpm_height : 484,vpm_width : 392
[I][                            Init][ 301]: LLM init ok
Type "q" to exit, Ctrl+c to stop current running

prompt >> 描述这个视频
image >> video
video/frame_0075.jpg
video/frame_0077.jpg
video/frame_0079.jpg
video/frame_0081.jpg
video/frame_0083.jpg
video/frame_0085.jpg
video/frame_0087.jpg
video/frame_0089.jpg
[I][                          Encode][ 416]: image encode time : 1488.392944 ms, size : 991232
[I][                             Run][ 638]: ttft: 5487.22 ms
视频显示的是一个城市街道的场景。时间戳显示为2月26日,地点是xxx。视频中,一名穿着深色外套和牛仔裤的男子正在推着一个行李箱。
突然,他似乎被什么东西绊倒,随后他摔倒在地。背景中可以看到一个广告牌,上面有一个绿色的图案,旁边停着一辆电动车。街道两旁有建筑物和树木,天气看起来有些阴沉。

[N][                             Run][ 779]: hit eos,avg 5.94 token/s
```
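The video pipeline consumes 8 frames at 308*308 (see the Video Process table), and the sample frames above step by two (frame_0075 through frame_0089). A sketch for producing such frames with OpenCV (`input.mp4` and the start index are placeholders):

```python
import cv2

# Sample 8 frames (every 2nd frame, like frame_0075..frame_0089 above)
# and resize them to the 308x308 input the video pipeline expects.
# Assumes input.mp4 exists and the video/ directory has been created.
cap = cv2.VideoCapture("input.mp4")
saved, idx = 0, 0
while saved < 8:
    ok, frame = cap.read()
    if not ok:
        break
    if idx >= 75 and (idx - 75) % 2 == 0:   # start at frame 75, step 2
        frame = cv2.resize(frame, (308, 308))
        cv2.imwrite(f"video/frame_{idx:04d}.jpg", frame)
        saved += 1
    idx += 1
cap.release()
```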

## Inference with M.2 Accelerator card

What is the M.2 Accelerator card? This demo is shown running on a Raspberry Pi 5 fitted with the card.

TODO