qqc1989 committed on
Commit
2322a62
·
verified ·
1 Parent(s): 96bf1c3

Update README.md

  1. README.md +150 -1
README.md CHANGED
@@ -4,6 +4,155 @@ language:
4
  - en
5
  base_model:
6
  - HuggingFaceTB/SmolVLM-256M-Instruct
 
 
7
  tags:
8
  - SmolVLM
9
- ---
4
  - en
5
  base_model:
6
  - HuggingFaceTB/SmolVLM-256M-Instruct
7
+ - HuggingFaceTB/SmolLM2-135M-Instruct
8
+ - google/siglip-base-patch16-512
9
  tags:
10
  - SmolVLM
11
+ - Int8
12
+ - VLM
13
+ ---
14
+
15
+ <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/SmolVLM_256_banner.png" width="800" height="auto" alt="Image description">
16
+
17
+ # SmolVLM-256M-Instruct-Int8
18
+
19
+ This version of SmolVLM-256M-Instruct has been converted to run on the Axera NPU using **w8a16** quantization.
20
+
21
+ This model has been optimized with the following LoRA:
22
+
23
+ Compatible with Pulsar2 version: 3.3
24
+
25
+ ## Conversion tool links
26
+
27
+ If you are interested in model conversion, you can try exporting an axmodel from the original repo:
28
+ https://huggingface.co/HuggingFaceTB/SmolVLM-256M-Instruct
29
+
30
+ [Pulsar2 Link, How to Convert LLM from Huggingface to axmodel](https://pulsar2-docs.readthedocs.io/en/latest/appendix/build_llm.html)
31
+
32
+ [AXera NPU HOST LLM Runtime](https://github.com/techshoww/ax-llm)
33
+
34
+
35
+ ## Supported Platforms
36
+
37
+ - AX650
38
+ - AX650N DEMO Board
39
+ - [M4N-Dock (AXera-Pi Pro)](https://wiki.sipeed.com/hardware/zh/maixIV/m4ndock/m4ndock.html)
40
+ - [M.2 Accelerator card](https://axcl-docs.readthedocs.io/zh-cn/latest/doc_guide_hardware.html)
41
+ - AX630C
42
+ - [AXera-Pi 2](https://axera-pi-2-docs-cn.readthedocs.io/zh-cn/latest/index.html)
43
+ - [Module-LLM](https://docs.m5stack.com/zh_CN/module/Module-LLM)
44
+ - [LLM630 Compute Kit](https://docs.m5stack.com/zh_CN/core/LLM630%20Compute%20Kit)
45
+
46
+ |Chip|Image encoder (512×512)|TTFT|w8a16 decode speed|
47
+ |--|--|--|--|
48
+ |AX650| 105 ms | 57 ms |80 tokens/sec|
49
+ |AX630C| 800 ms | 182 ms |31 tokens/sec|
50
+
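As a rough illustration of how the figures in the table combine, the sketch below estimates the wall-clock time for one image-plus-prompt request. It assumes that total time is approximately image encoding plus TTFT plus the generated token count divided by decode speed; that additive model and the `estimate_latency_s` helper are assumptions for illustration, not something the runtime reports.

```python
# Benchmark figures copied from the table above.
BENCHMARKS = {
    "AX650": {"encode_ms": 105.0, "ttft_ms": 57.0, "tokens_per_s": 80.0},
    "AX630C": {"encode_ms": 800.0, "ttft_ms": 182.0, "tokens_per_s": 31.0},
}


def estimate_latency_s(chip: str, generated_tokens: int) -> float:
    """Estimate seconds for one request (hypothetical additive model)."""
    b = BENCHMARKS[chip]
    # Time before the first token appears: image encode + TTFT.
    startup_s = (b["encode_ms"] + b["ttft_ms"]) / 1000.0
    # Time spent decoding the remaining tokens at the reported speed.
    decode_s = generated_tokens / b["tokens_per_s"]
    return startup_s + decode_s


for chip in BENCHMARKS:
    print(f"{chip}: ~{estimate_latency_s(chip, 200):.2f} s for 200 tokens")
```

Under these assumptions, a 200-token answer is dominated by decode time on both chips, so the decode speed column matters more than the encoder column for long descriptions.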
51
+ ## How to use
52
+
53
+ Download all files from this repository to the device.
54
+
55
+ ```
56
+ root@ax650:/mnt/qtang/llm-test/smolvlm-256m # tree -L 1
57
+ .
58
+ ├── images -> ../../images/
58
+ ├── main
59
+ ├── post_config.json
60
+ ├── run_smolvlm_ax630c.sh
61
+ ├── run_smolvlm_ax650.sh
62
+ ├── smolvlm-256m-ax630c
63
+ ├── smolvlm-256m-ax650
64
+ ├── smolvlm_tokenizer
65
+ ├── smolvlm_tokenizer_512.py
67
+ └── ssd_car.jpg
68
+ ```
69
+
70
+ #### Install transformers
71
+
72
+ ```
73
+ pip install transformers==4.41.1
74
+ ```
75
+
76
+ #### Start the Tokenizer service
77
+
78
+ ```
79
+ root@ax650:/mnt/qtang/llm-test/smolvlm-256m# python smolvlm_tokenizer_512.py --port 12345
80
+ Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
81
+ 1 <|im_start|> 49279 <end_of_utterance>
82
+ [1, 11126, 42, 49189, 49152, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190,
83
+ 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190,
84
+ 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190,
85
+ 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190,
86
+ 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49189, 7306, 346, 5125, 451, 2443, 47, 49279,
87
+ 198, 9519, 9531, 42]
88
+ 81
89
+ [1, 11126, 42, 28120, 905, 49279, 198, 9519, 9531, 42]
90
+ 10
91
+ http://localhost:12345
92
+ ```
93
+
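The startup log above prints two sample token lists with lengths 81 and 10. The sketch below rebuilds both from the logged ids to check those lengths. Treating the repeated id `49190` as an image placeholder token is an assumption inferred from the log, and `build_prompt_ids` is a hypothetical helper, not part of `smolvlm_tokenizer_512.py`.

```python
from typing import List

# Assumption: the id repeated in the log is the per-position image placeholder.
IMAGE_PLACEHOLDER_ID = 49190

# Ids copied from the first logged list, before and after the repeated run.
PREFIX = [1, 11126, 42, 49189, 49152]
SUFFIX = [49189, 7306, 346, 5125, 451, 2443, 47, 49279, 198, 9519, 9531, 42]

# The second logged list (text-only prompt), copied verbatim.
TEXT_ONLY = [1, 11126, 42, 28120, 905, 49279, 198, 9519, 9531, 42]


def build_prompt_ids(num_image_tokens: int) -> List[int]:
    """Rebuild the image prompt with a given number of placeholder ids."""
    return PREFIX + [IMAGE_PLACEHOLDER_ID] * num_image_tokens + SUFFIX


ids = build_prompt_ids(64)
print(len(ids), ids.count(IMAGE_PLACEHOLDER_ID))  # 81 64
print(len(TEXT_ONLY))  # 10
```

With 64 placeholder ids, the rebuilt list matches the logged length of 81, which suggests that one 512×512 image occupies 64 positions in the prompt.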
94
+ #### Inference on an AX650 host, such as the M4N-Dock (AXera-Pi Pro) or AX650N DEMO Board
95
+
96
+ - input text
97
+
98
+ ```
99
+ Describe the picture
100
+ ```
101
+
102
+ - input image
103
+
104
+ ![](./ssd_car.jpg)
105
+
106
+ Open another terminal and run `./run_smolvlm_ax650.sh`
107
+
108
+ ```
109
+ root@ax650:/mnt/qtang/llm-test/smolvlm-256m# ./run_smolvlm_ax650.sh
110
+ [I][ Init][ 106]: LLM init start
111
+ bos_id: 1, eos_id: 49279
112
+ 2% | █ | 1 / 34 [0.00s<0.14s, 250.00 count/s] tokenizer init ok
113
+ [I][ Init][ 26]: LLaMaEmbedSelector use mmap
114
+ 100% | ████████████████████████████████ | 34 / 34 [0.67s<0.67s, 50.90 count/s] init vpm axmodel ok,remain_cmm(11698 MB)B)
115
+ [I][ Init][ 254]: max_token_len : 1023
116
+ [I][ Init][ 259]: kv_cache_size : 192, kv_cache_num: 1023
117
+ [I][ Init][ 267]: prefill_token_num : 128
118
+ [I][ Init][ 269]: vpm_height : 512,vpm_width : 512
119
+ [I][ Init][ 279]: LLM init ok
120
+ Type "q" to exit, Ctrl+c to stop current running
121
+ prompt >> Describe the picture
122
+ image >> ./ssd_car.jpg
123
+ [I][ Encode][ 338]: image encode time : 104.691002 ms, size : 36864
124
+ [I][ Run][ 549]: ttft: 58.01 ms
125
+ The image depicts a double decker bus, which is prominently displayed in the center of the image. The bus is red and has a large, bold sign on its roof that reads
126
+ "Things Get More Exciting When You Say So." The sign is in white text, and the bus is designed to be eye-catching and visually appealing.
127
+
128
+ The bus is parked on a city street, with a few other vehicles visible in the background. The street is lined with buildings, including a few shops and restaurants,
129
+ which are partially visible. The buildings are well-lit, and the street is clean and well-maintained.
130
+
131
+ In the foreground, there is a person standing in front of the bus. The person is wearing a dark jacket and appears to be waiting for the bus. The person is facing the bus,
132
+ and they seem to be waiting for the bus to arrive.
133
+
134
+ The bus is parked on the street, and it is not moving. The bus is not moving, and there are no other vehicles visible in the image. The street is well-maintained,
135
+ and the buildings are well-lit, indicating that it is a sunny day.
136
+
137
+ The image is taken from a slightly elevated perspective, which gives a clear view of the bus and the surrounding area. The lighting in the image is bright,
138
+ and the shadows are well-defined, indicating that the sun is shining brightly.
139
+
140
+ To summarize, the image depicts:
141
+ 1. A double-decker bus with a large sign on its roof that reads "Things Get More Exciting When You Say So."
142
+ 2. The bus is parked on a city street with a few other vehicles visible in the background.
143
+ 3. The bus is not moving, and there are no other vehicles visible in the image.
144
+ 4. The street is well-maintained, and the buildings are well-lit, indicating a sunny day.
145
+
146
+ This description provides a comprehensive overview of the image, allowing a text model to answer any questions related to the image based on the description.
147
+
148
+ [N][ Run][ 688]: hit eos,avg 80.54 token/s
149
+
150
+ prompt >> q
151
+ root@ax650:/mnt/qtang/llm-test/smolvlm-256m#
152
+ ```
153
+
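To compare runs across boards, it can help to pull the timing figures out of the runtime log. The sketch below is one minimal way to do that with regular expressions, using three lines copied from the session above. The `parse_metrics` helper and the pattern names are hypothetical, and the patterns only cover the log format shown here.

```python
import re

# Three lines copied verbatim from the AX650 session log above.
LOG = """\
[I][ Encode][ 338]: image encode time : 104.691002 ms, size : 36864
[I][ Run][ 549]: ttft: 58.01 ms
[N][ Run][ 688]: hit eos,avg 80.54 token/s
"""

# Hypothetical metric names mapped to patterns for the format shown above.
PATTERNS = {
    "encode_ms": r"image encode time\s*:\s*([\d.]+)\s*ms",
    "encode_size": r"image encode time.*?size\s*:\s*(\d+)",
    "ttft_ms": r"ttft:\s*([\d.]+)\s*ms",
    "tokens_per_s": r"avg\s+([\d.]+)\s+token/s",
}


def parse_metrics(log: str) -> dict:
    """Return every metric whose pattern matches somewhere in the log."""
    metrics = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, log)
        if match:
            metrics[name] = float(match.group(1))
    return metrics


metrics = parse_metrics(LOG)
print(metrics)

# Assumption: "size" counts embedding values, and the image occupies the
# 64 placeholder positions seen in the tokenizer log.
print(int(metrics["encode_size"]) // 64)  # 576
```

If those two assumptions hold, each image position carries 576 values, which would be consistent with the hidden size of the SmolLM2-135M backbone. The log itself does not state this.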
154
+ #### Inference with the M.2 Accelerator card
155
+
156
+ [What is the M.2 Accelerator card?](https://axcl-docs.readthedocs.io/zh-cn/latest/doc_guide_hardware.html) This demo is shown on a Raspberry Pi 5.
157
+
158
+ *TODO*