Upload 6 files

Changed files:
- README.md (+285 / -3)
- convert_vision_encoder.py (+73 / -0)
- export_vision_onnx.py (+97 / -0)
- rkllm-convert.py (+1 / -1)
- run_rkllm.py (+10 / -4)
- vision_encoder.rknn (+2 / -2)

README.md (CHANGED)
---
base_model:
- Qwen/Qwen2.5-VL-3B-Instruct
tags:
- rknn
- rkllm
---
# Qwen2.5-VL-3B-Instruct-RKLLM

## (English README see below)

在RK3588上运行强大的Qwen2.5-VL-3B-Instruct-RKLLM视觉大模型!

- 推理速度(RK3588): 视觉编码器 3.4s(三核并行) + LLM 填充 2.3s (320 tokens / 138 tps) + 解码 8.2 tps
- 内存占用(RK3588, 上下文长度1024): 6.1GB

## 使用方法

1. 克隆或者下载此仓库到本地. 模型较大, 请确保有足够的磁盘空间.

2. 开发板的RKNPU2内核驱动版本必须>=0.9.6才能运行这么大的模型.
使用root权限运行以下命令检查驱动版本:
```bash
> cat /sys/kernel/debug/rknpu/version
RKNPU driver: v0.9.8
```
如果版本过低, 请更新驱动. 你可能需要更新内核, 或查找官方文档以获取帮助.

3. 安装依赖

```bash
pip install "numpy<2" opencv-python rknn-toolkit-lite2
```

4. 运行

```bash
python ./run_rkllm.py ./test.jpg ./vision_encoder.rknn ./language_model_w8a8.rkllm 512 1024 3
```

参数说明:
- `512`: max_new_tokens, 最大生成token数.
- `1024`: max_context_len, 最大上下文长度.
- `3`: npu_core_num, 使用的NPU核心数.

如果实测性能不理想, 可以调整CPU调度器让CPU始终运行在最高频率, 并把推理程序绑定到大核(`taskset -c 4-7 python ...`)

test.jpg:


```
Initializing ONNX Runtime for vision encoder...
W rknn-toolkit-lite2 version: 2.3.2
W Query dynamic range failed. Ret code: RKNN_ERR_MODEL_INVALID. (If it is a static shape RKNN model, please ignore the above warning message.)
Vision encoder loaded successfully.
ONNX Input: pixel_values, ONNX Output: vision_features
Initializing RKLLM Runtime...
I rkllm: rkllm-runtime version: 1.2.1, rknpu driver version: 0.9.8, platform: RK3588
I rkllm: loading rkllm model from ./language_model_w8a8.rkllm
I rkllm: rkllm-toolkit version: 1.2.1, max_context_limit: 4096, npu_core_num: 3, target_platform: RK3588, model_dtype: W8A8
I rkllm: Enabled cpus: [4, 5, 6, 7]
I rkllm: Enabled cpus num: 4
I rkllm: Using mrope
RKLLM initialized successfully.
Preprocessing image...
Running vision encoder...
W The input[0] need NHWC data format, but NCHW set, the data format and data buffer will be changed to NHWC.
视觉编码器推理耗时: 3.5427 秒
Image encoded successfully.
I rkllm: reset chat template:
I rkllm: system_prompt: <|im_start|>system\nYou are a helpful assistant.<|im_end|>\n
I rkllm: prompt_prefix: <|im_start|>user\n
I rkllm: prompt_postfix: <|im_end|>\n<|im_start|>assistant\n
W rkllm: Calling rkllm_set_chat_template will disable the internal automatic chat template parsing, including enable_thinking. Make sure your custom prompt is complete and valid.

**********************可输入以下问题对应序号获取回答/或自定义输入********************

[0] Picture 1: <image> What is in the image?
[1] Picture 1: <image> 这张图片中有什么?

*************************************************************************


user: 0
Picture 1: <image> What is in the image?
robot: n_image_tokens: 289
The image shows a cozy bedroom with several notable features:

- A large bed covered with a blue comforter.
- A wooden dresser next to the bed, topped with various items including a mirror and some decorative objects.
- A window allowing natural light into the room, offering a view of greenery outside.
- A bookshelf filled with numerous books on shelves.
- A basket placed near the foot of the bed.
- A lamp on a side table beside the bed.

The overall ambiance is warm and inviting.

I rkllm: --------------------------------------------------------------------------------------
I rkllm: Model init time (ms)  3361.48
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Stage      Total Time (ms)   Tokens   Time per Token (ms)   Tokens per Second
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Prefill    2201.45           321      6.86                  145.81
I rkllm: Generate   12419.47          102      121.76                8.21
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Peak Memory Usage (GB)
I rkllm: 6.19
I rkllm: --------------------------------------------------------------------------------------

user: 1
Picture 1: <image> 这张图片中有什么?
robot: n_image_tokens: 289
这张照片展示了一个卧室的内部。房间有一扇大窗户,可以看到外面的绿色植物。房间里有各种物品:一个蓝色的大床单覆盖在一张床上;一盏��放在梳妆台上;一面镜子挂在墙上;书架上摆满了书籍和一些装饰品;还有一些篮子、花盆和其他小物件散落在周围。

I rkllm: --------------------------------------------------------------------------------------
I rkllm: Stage      Total Time (ms)   Tokens   Time per Token (ms)   Tokens per Second
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Prefill    184.35            13       14.18                 70.52
I rkllm: Generate   8711.49           72       120.99                8.26
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Peak Memory Usage (GB)
I rkllm: 6.19
I rkllm: --------------------------------------------------------------------------------------
```

## 模型转换

#### 准备工作

1. 安装rknn-toolkit2以及rkllm-toolkit:
```bash
pip install -U rknn-toolkit2
```
rkllm-toolkit需要在这里手动下载: https://github.com/airockchip/rknn-llm/tree/main/rkllm-toolkit

2. 下载此仓库到本地, 但不需要下载`.rkllm`和`.rknn`结尾的模型文件.
3. 下载Qwen2.5-VL-3B-Instruct的huggingface模型仓库到本地. ( https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct )

#### 转换LLM

将`rkllm-convert.py`拷贝到Qwen2.5-VL-3B-Instruct的模型文件夹中,执行:
```bash
python rkllm-convert.py
```
默认是w8a8量化的,你可以自行打开脚本修改量化方式等。

#### 转换视觉编码器

1. 导出ONNX

将`export_vision_onnx.py`拷贝到Qwen2.5-VL-3B-Instruct的模型文件夹根目录中,然后**在该根目录**下执行:
```bash
mkdir vision
python ./export_vision_onnx.py . --savepath ./vision/vision_encoder.onnx
```
视觉编码器会导出到`vision/vision_encoder.onnx`. 默认宽高为476,你可以自行通过`--height`和`--width`参数修改。

2. 模型优化 (可选)

从 https://github.com/happyme531/rknn-toolkit2-utils 下载`split_matmul_onnx_profile.py`, 之后运行:
```bash
python ./split_matmul_onnx_profile.py --input vision/vision_encoder.onnx --output vision_encoder_opt.onnx --pattern "/visual/blocks\..*?/mlp/down_proj.*" --factor 5
```
优化后的模型会输出到`vision_encoder_opt.onnx`

3. 转换rknn

```bash
python ./convert_vision_encoder.py ./vision_encoder_opt.onnx
```
(这一步可能需要20分钟以上)
转换后模型会输出到`vision_encoder_opt.rknn`

为了与"使用方法"中的命令保持一致, 你可以将其重命名:
```bash
mv vision_encoder_opt.rknn vision_encoder.rknn
```

## 已知问题

- 由于RKLLM的多模态输入的限制, 在整个对话中只能加载一张图片.
- 没有实现多轮对话.
- RKLLM的w8a8量化貌似存在不小的精度损失.
- 可能由于RKNPU2的访存模式问题,输入尺寸边长不为64的整数倍时模型运行速度会有奇怪的明显提升。

## 参考

- [Qwen/Qwen2.5-VL-3B-Instruct-RKLLM](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct-RKLLM)

---

# English README

Run the powerful Qwen2.5-VL-3B-Instruct vision-language model on the RK3588 with RKLLM!

- **Inference Speed (RK3588)**: Vision Encoder 3.4s (3-core parallel) + LLM Prefill 2.3s (320 tokens / 138 tps) + Decode 8.2 tps
- **Memory Usage (RK3588, context length 1024)**: 6.1GB

## How to Use

1. Clone or download this repository locally. The model is large, so ensure you have enough disk space.

2. The RKNPU2 kernel driver version on your board must be `>=0.9.6` to run such a large model. Run the following command with root privileges to check the driver version:
```bash
> cat /sys/kernel/debug/rknpu/version
RKNPU driver: v0.9.8
```
If the version is too old, please update the driver. You may need to update your kernel or consult the official documentation for help.

3. Install dependencies:
```bash
pip install "numpy<2" opencv-python rknn-toolkit-lite2
```

4. Run the model:
```bash
python ./run_rkllm.py ./test.jpg ./vision_encoder.rknn ./language_model_w8a8.rkllm 512 1024 3
```
**Parameter Descriptions:**
- `512`: `max_new_tokens`, the maximum number of tokens to generate.
- `1024`: `max_context_len`, the maximum context length.
- `3`: `npu_core_num`, the number of NPU cores to use.

If performance is not ideal, configure the CPU frequency governor so the CPU always runs at its highest frequency, and pin the inference process to the big cores (`taskset -c 4-7 python ...`).
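For example, something along these lines usually works (a sketch; sysfs paths and available governors vary by kernel and board image):
```bash
# Keep every CPU cluster at its maximum frequency
for policy in /sys/devices/system/cpu/cpufreq/policy*; do
    echo performance | sudo tee "$policy/scaling_governor"
done

# Pin the inference script to the big cores (cores 4-7 on RK3588)
taskset -c 4-7 python ./run_rkllm.py ./test.jpg ./vision_encoder.rknn ./language_model_w8a8.rkllm 512 1024 3
```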

The example output is shown in the Chinese section above.

## Model Conversion

#### Prerequisites

1. Install rknn-toolkit2 and rkllm-toolkit:
```bash
pip install -U rknn-toolkit2
```
rkllm-toolkit needs to be downloaded manually from here: https://github.com/airockchip/rknn-llm/tree/main/rkllm-toolkit

2. Download this repository locally, but you don't need the model files ending with `.rkllm` and `.rknn`.
3. Download the Qwen2.5-VL-3B-Instruct huggingface model repository locally from: https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct

#### Convert LLM

Copy `rkllm-convert.py` into the Qwen2.5-VL-3B-Instruct model folder and execute:
```bash
python rkllm-convert.py
```
It uses w8a8 quantization by default. You can open the script to modify the quantization method and other settings.

#### Convert Vision Encoder

1. **Export ONNX**

Copy `export_vision_onnx.py` to the root directory of the Qwen2.5-VL-3B-Instruct model folder, then execute the following **in the root directory**:
```bash
mkdir vision
python ./export_vision_onnx.py . --savepath ./vision/vision_encoder.onnx
```
The vision encoder will be exported to `vision/vision_encoder.onnx`. The default height and width are 476, which you can modify using the `--height` and `--width` parameters.
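Note that Qwen2.5-VL's vision tower uses 14-pixel patches merged 2x2, so the export script only accepts heights and widths that are multiples of 28 (476 = 17 x 28) and raises an error otherwise. A hypothetical example with a smaller input (the 448 size and output filename are placeholders, not files from this repo):
```bash
mkdir -p vision
python ./export_vision_onnx.py . --height 448 --width 448 --savepath ./vision/vision_encoder_448.onnx
```
Also remember that `run_rkllm.py` hard-codes `IMAGE_HEIGHT = IMAGE_WIDTH = 476`, so adjust those constants as well if you change the export size.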

2. **Model Optimization (Optional)**

Download `split_matmul_onnx_profile.py` from https://github.com/happyme531/rknn-toolkit2-utils, then run:
```bash
python ./split_matmul_onnx_profile.py --input vision/vision_encoder.onnx --output vision_encoder_opt.onnx --pattern "/visual/blocks\..*?/mlp/down_proj.*" --factor 5
```
The optimized model will be saved as `vision_encoder_opt.onnx`.

3. **Convert to RKNN**

```bash
python ./convert_vision_encoder.py ./vision_encoder_opt.onnx
```
(This step may take over 20 minutes.)

The converted model will be saved as `vision_encoder_opt.rknn`. To match the command in the "How to Use" section, you can rename it:
```bash
mv vision_encoder_opt.rknn vision_encoder.rknn
```

## Known Issues

- Due to limitations in RKLLM's multimodal input, only one image can be loaded per conversation.
- Multi-turn conversation is not implemented.
- The w8a8 quantization in RKLLM seems to cause a non-trivial loss of precision.
- Possibly due to RKNPU2 memory-access patterns, the model oddly runs noticeably faster when the input side lengths are not multiples of 64.

## References

- [Qwen/Qwen2.5-VL-3B-Instruct-RKLLM](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct-RKLLM)
convert_vision_encoder.py (ADDED)
#!/usr/bin/env python
# coding: utf-8

import datetime
import argparse
from rknn.api import RKNN
from sys import exit


parser = argparse.ArgumentParser(description='Convert ONNX to RKNN model.')
parser.add_argument('onnx_model', type=str, help='Path to the input ONNX model file.')
args = parser.parse_args()


ONNX_MODEL = args.onnx_model
RKNN_MODEL = ONNX_MODEL.replace(".onnx", ".rknn")
DATASET = "/home/zt/rk3588-nn/rknn_model_zoo/datasets/COCO/coco_subset_20.txt"
QUANTIZE = False
detailed_performance_log = True

timedate_iso = datetime.datetime.now().isoformat()

rknn = RKNN(verbose=True)
rknn.config(
    # mean_values=[x * 255 for x in [0.485, 0.456, 0.406]],
    # std_values=[x * 255 for x in [0.229, 0.224, 0.225]],
    quantized_dtype="w8a8",
    quantized_algorithm="normal",
    quantized_method="channel",
    quantized_hybrid_level=0,
    target_platform="rk3588",
    quant_img_RGB2BGR=False,
    float_dtype="float16",
    optimization_level=3,
    custom_string=f"converted by: email: [email protected] at {timedate_iso}",
    remove_weight=False,
    compress_weight=False,
    inputs_yuv_fmt=None,
    single_core_mode=False,
    # dynamic_input=[  # use either this or inputs + input_size_list below, not both
    #     [
    #         [1, 3, 240, 320],
    #         # ...
    #     ],
    #     [
    #         [1, 3, 480, 640],
    #         # ...
    #     ],
    #     [
    #         [1, 3, 960, 1280],
    #         # ...
    #     ],
    # ],
    model_pruning=False,
    op_target={'Gather': 'cpu'},
    quantize_weight=False,
    remove_reshape=False,
    sparse_infer=False,
    enable_flash_attention=False,
    # hidden parameters
    # disable_rules=[],
    # sram_prefer=False,
    # nbuf_prefer=False,
    # check_data=[],
)

ret = rknn.load_onnx(model=ONNX_MODEL)
ret = rknn.build(do_quantization=QUANTIZE, dataset=DATASET, rknn_batch_size=None)
ret = rknn.export_rknn(RKNN_MODEL)

# ret = rknn.init_runtime(target='rk3588', core_mask=RKNN.NPU_CORE_0, perf_debug=detailed_performance_log)
# rknn.eval_perf()
# ret = rknn.accuracy_analysis(inputs=['processed_images_rknn.npy'], target='rk3588')
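Once converted, the encoder can be smoke-tested on the board with rknn-toolkit-lite2. A minimal sketch, assuming the default 476x476 export; the `data_format` argument and output layout may differ slightly across rknn-toolkit-lite2 versions, so treat this as illustrative rather than part of the repo:
```python
import numpy as np
from rknnlite.api import RKNNLite

rknn_lite = RKNNLite()
rknn_lite.load_rknn('./vision_encoder.rknn')
# Spread the encoder across all three NPU cores, as in the README benchmark
rknn_lite.init_runtime(core_mask=RKNNLite.NPU_CORE_0_1_2)

# Dummy NCHW float32 input; the runtime converts it to NHWC (hence the warning in the README log)
pixel_values = np.random.rand(1, 3, 476, 476).astype(np.float32)
outputs = rknn_lite.inference(inputs=[pixel_values], data_format='nchw')
print(outputs[0].shape)  # a 476x476 image should yield 289 vision tokens

rknn_lite.release()
```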
export_vision_onnx.py (ADDED)
import argparse
import torch
from transformers import Qwen2_5_VLForConditionalGeneration, AutoTokenizer


def build_patches_and_grid(pixel_values, temporal_patch_size, patch_size, merge_size):
    assert pixel_values.dim() == 4, "pixel_values 必须是 (N, C, H, W)"
    N, C, H, W = pixel_values.shape
    if H % patch_size != 0 or W % patch_size != 0:
        raise ValueError(f"H({H}) 与 W({W}) 必须能被 patch_size({patch_size}) 整除")
    if (H // patch_size) % merge_size != 0 or (W // patch_size) % merge_size != 0:
        raise ValueError(
            f"(H/patch_size, W/patch_size)=({H//patch_size},{W//patch_size}) 必须能被 merge_size({merge_size}) 整除"
        )
    if N == 1:
        pixel_values = pixel_values.repeat(temporal_patch_size, 1, 1, 1)
    elif N % temporal_patch_size != 0:
        repeat_time = temporal_patch_size - (N % temporal_patch_size)
        repeat_image = pixel_values[-1:, ...].repeat(repeat_time, 1, 1, 1)
        pixel_values = torch.cat((pixel_values, repeat_image), dim=0)

    grid_t = pixel_values.shape[0] // temporal_patch_size
    grid_h = H // patch_size
    grid_w = W // patch_size

    patches = pixel_values.reshape(
        grid_t,
        temporal_patch_size,
        C,
        grid_h // merge_size,
        merge_size,
        patch_size,
        grid_w // merge_size,
        merge_size,
        patch_size,
    )
    patches = patches.permute(0, 3, 6, 4, 7, 2, 1, 5, 8)
    flatten_patches = patches.reshape(
        grid_t * grid_h * grid_w, C * temporal_patch_size * patch_size * patch_size
    )
    grid_thw = torch.tensor([[grid_t, grid_h, grid_w]], dtype=torch.int32, device=flatten_patches.device)
    return flatten_patches, grid_thw


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('path', type=str, help='模型路径')
    parser.add_argument('--batch', type=int, default=1, required=False, help='batch size')
    parser.add_argument('--height', type=int, default=476, required=False, help='图像高度')
    parser.add_argument('--width', type=int, default=476, required=False, help='图像宽度')
    parser.add_argument('--savepath', type=str, default='vision_encoder.onnx', required=False, help='保存路径')
    args = parser.parse_args()

    model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
        args.path,
        torch_dtype=torch.float32,
        low_cpu_mem_usage=True,
        trust_remote_code=True,
        attn_implementation="eager",
    ).eval()
    _ = AutoTokenizer.from_pretrained(args.path, trust_remote_code=True, use_fast=False)

    vcfg = model.visual.config
    merge_size = int(vcfg.spatial_merge_size)
    patch_size = int(vcfg.patch_size)
    temporal_patch_size = int(vcfg.temporal_patch_size)

    # Build a dummy input
    N, C, H, W = int(args.batch), 3, int(args.height), int(args.width)
    pixel_values = torch.randn(N, C, H, W, dtype=torch.float32)

    with torch.no_grad():
        fp, gthw = build_patches_and_grid(pixel_values, temporal_patch_size, patch_size, merge_size)
        vision_features = model.visual(fp, gthw)
        print(f"视觉特征维度: {vision_features.shape}")
        print(f"视觉token数量: {vision_features.shape[0]}")

    def top_forward(pixel_values_in):
        fp, gthw = build_patches_and_grid(pixel_values_in, temporal_patch_size, patch_size, merge_size)
        return model.visual(fp, gthw)

    model.forward = top_forward

    torch.onnx.export(
        model,
        (pixel_values,),
        args.savepath,
        opset_version=17,
        input_names=["pixel_values"],
        output_names=["vision_features"],
    )


if __name__ == '__main__':
    main()
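After exporting, it is worth sanity-checking the ONNX graph before spending 20+ minutes on RKNN conversion. A minimal sketch using onnxruntime (assumed to be installed separately; not part of this repo):
```python
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("vision/vision_encoder.onnx", providers=["CPUExecutionProvider"])

# Same fixed shape the export used: (batch, 3, 476, 476), float32
pixel_values = np.random.randn(1, 3, 476, 476).astype(np.float32)
(features,) = sess.run(["vision_features"], {"pixel_values": pixel_values})

# A 476x476 input is a 17x17 grid of merged patches, so expect 289 tokens,
# matching the n_image_tokens reported by run_rkllm.py.
print(features.shape)
```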
rkllm-convert.py (CHANGED)

@@ -17,7 +17,7 @@ if ret != 0:
     exit(ret)

 # Export rkllm model
-ret = llm.export_rkllm("./
+ret = llm.export_rkllm("./language_model_w8a8.rkllm")
 if ret != 0:
     print('Export model failed!')
     exit(ret)
run_rkllm.py (CHANGED)

@@ -20,8 +20,8 @@ from rkllm_binding import (
 )

 # Constants
-IMAGE_HEIGHT =
-IMAGE_WIDTH =
+IMAGE_HEIGHT = 476
+IMAGE_WIDTH = 476

 def expand2square(img, background_color):
     """

@@ -69,14 +69,16 @@
     # The rknn_core_num is not directly used by onnxruntime in the same way,
     # but we keep it for API consistency with the C++ example.
     # ONNX Runtime will manage its own threading and execution providers.
-    parser.add_argument("rknn_core_num", type=int, help="
+    parser.add_argument("rknn_core_num", type=int, help="Sets the number of npu cores used in vision encoder.")

     args = parser.parse_args()

     # --- 1. Initialize Image Encoder (ONNX Runtime) ---
     print("Initializing ONNX Runtime for vision encoder...")
     try:
-
+        sess_options = ort.SessionOptions()
+        sess_options.intra_op_num_threads = args.rknn_core_num
+        ort_session = ort.InferenceSession(args.encoder_model_path, sess_options=sess_options)
     except Exception as e:
         print(f"Failed to load ONNX model: {e}")
         sys.exit(1)

@@ -131,8 +133,12 @@

     # --- 4. Run Image Encoder ---
     print("Running vision encoder...")
+    import time
+    start_time = time.time()
     try:
         img_vec_output = ort_session.run([output_name], {input_name: input_tensor.astype(np.float32)})[0]
+        elapsed_time = time.time() - start_time
+        print(f"视觉编码器推理耗时: {elapsed_time:.4f} 秒")
         # The output from C++ is a flat float array. Let's flatten the ONNX output.
         img_vec = img_vec_output.flatten().astype(np.float32)

vision_encoder.rknn (CHANGED, Git LFS pointer)

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:401402b3cfa6ab292bb7ae51c208f51a14c36cf1a534ab5392b24efc315fb60f
+size 1557737667