happyme531 committed
Commit 4019d6d · verified · 1 Parent(s): aa043cf

Upload 6 files
README.md CHANGED
@@ -1,7 +1,289 @@
  ---
- license: agpl-3.0
  ---
  
- (Placeholder for a document)
  
- NOTE: The vision encoder is currently broken for this model, you can try on it but expect degraded results!
  ---
+ base_model:
+ - Qwen/Qwen2.5-VL-3B-Instruct
+ tags:
+ - rknn
+ - rkllm
  ---
+ # Qwen2.5-VL-3B-Instruct-RKLLM
+
+ Run the powerful Qwen2.5-VL-3B-Instruct vision-language model on the RK3588 with RKLLM!
+
+ - **Inference speed (RK3588)**: vision encoder 3.4 s (3 NPU cores in parallel) + LLM prefill 2.3 s (320 tokens / 138 tps) + decode 8.2 tps
+ - **Memory usage (RK3588, context length 1024)**: 6.1 GB
+
+ ## How to Use
+
+ 1. Clone or download this repository locally. The model files are large, so make sure you have enough disk space.
+
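+ For example, a minimal sketch that fetches everything with git-lfs (the repository URL below is assumed from this model's name; replace it with wherever you are actually downloading from):
+
+ ```bash
+ git lfs install
+ # Assumed repository path -- adjust as needed
+ git clone https://huggingface.co/happyme531/Qwen2.5-VL-3B-Instruct-RKLLM
+ ```
+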
+ 2. The RKNPU2 kernel driver on your board must be version >= 0.9.6 to run a model this large.
+ Run the following command as root to check the driver version:
+ ```bash
+ > cat /sys/kernel/debug/rknpu/version
+ RKNPU driver: v0.9.8
+ ```
+ If the version is too old, update the driver. You may need to update your kernel or consult the official documentation for help.
+
+ 3. Install the dependencies:
+
+ ```bash
+ pip install "numpy<2" opencv-python rknn-toolkit-lite2
+ ```
+
+ 4. Run the demo:
+
+ ```bash
+ python ./run_rkllm.py ./test.jpg ./vision_encoder.rknn ./language_model_w8a8.rkllm 512 1024 3
+ ```
+
+ Parameter descriptions:
+ - `512`: `max_new_tokens`, the maximum number of tokens to generate.
+ - `1024`: `max_context_len`, the maximum context length.
+ - `3`: `npu_core_num`, the number of NPU cores to use.
+
+ If the measured performance is not ideal, switch the CPU governor so the CPUs always run at their highest frequency, and pin the inference process to the big cores (`taskset -c 4-7 python ...`).
+
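+ For example, a minimal sketch assuming the standard cpufreq sysfs layout (policy paths can differ between kernels and boards):
+
+ ```bash
+ # Keep every CPU cluster at its highest frequency (requires root)
+ echo performance | sudo tee /sys/devices/system/cpu/cpufreq/policy*/scaling_governor
+ # Pin the demo to the big cores (cores 4-7 on RK3588)
+ taskset -c 4-7 python ./run_rkllm.py ./test.jpg ./vision_encoder.rknn ./language_model_w8a8.rkllm 512 1024 3
+ ```
+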
+ test.jpg:
+ ![test.jpg](./test.jpg)
+
+ Example output:
+
+ ```
+ Initializing ONNX Runtime for vision encoder...
+ W rknn-toolkit-lite2 version: 2.3.2
+ W Query dynamic range failed. Ret code: RKNN_ERR_MODEL_INVALID. (If it is a static shape RKNN model, please ignore the above warning message.)
+ Vision encoder loaded successfully.
+ ONNX Input: pixel_values, ONNX Output: vision_features
+ Initializing RKLLM Runtime...
+ I rkllm: rkllm-runtime version: 1.2.1, rknpu driver version: 0.9.8, platform: RK3588
+ I rkllm: loading rkllm model from ./language_model_w8a8.rkllm
+ I rkllm: rkllm-toolkit version: 1.2.1, max_context_limit: 4096, npu_core_num: 3, target_platform: RK3588, model_dtype: W8A8
+ I rkllm: Enabled cpus: [4, 5, 6, 7]
+ I rkllm: Enabled cpus num: 4
+ I rkllm: Using mrope
+ RKLLM initialized successfully.
+ Preprocessing image...
+ Running vision encoder...
+ W The input[0] need NHWC data format, but NCHW set, the data format and data buffer will be changed to NHWC.
+ 视觉编码器推理耗时: 3.5427 秒
+ Image encoded successfully.
+ I rkllm: reset chat template:
+ I rkllm: system_prompt: <|im_start|>system\nYou are a helpful assistant.<|im_end|>\n
+ I rkllm: prompt_prefix: <|im_start|>user\n
+ I rkllm: prompt_postfix: <|im_end|>\n<|im_start|>assistant\n
+ W rkllm: Calling rkllm_set_chat_template will disable the internal automatic chat template parsing, including enable_thinking. Make sure your custom prompt is complete and valid.
+
+ **********************可输入以下问题对应序号获取回答/或自定义输入********************
+
+ [0] Picture 1: <image> What is in the image?
+ [1] Picture 1: <image> 这张图片中有什么?
+
+ *************************************************************************
+
+
+ user: 0
+ Picture 1: <image> What is in the image?
+ robot: n_image_tokens: 289
+ The image shows a cozy bedroom with several notable features:
+
+ - A large bed covered with a blue comforter.
+ - A wooden dresser next to the bed, topped with various items including a mirror and some decorative objects.
+ - A window allowing natural light into the room, offering a view of greenery outside.
+ - A bookshelf filled with numerous books on shelves.
+ - A basket placed near the foot of the bed.
+ - A lamp on a side table beside the bed.
+
+ The overall ambiance is warm and inviting.
+
+ I rkllm: --------------------------------------------------------------------------------------
+ I rkllm: Model init time (ms) 3361.48
+ I rkllm: --------------------------------------------------------------------------------------
+ I rkllm: Stage Total Time (ms) Tokens Time per Token (ms) Tokens per Second
+ I rkllm: --------------------------------------------------------------------------------------
+ I rkllm: Prefill 2201.45 321 6.86 145.81
+ I rkllm: Generate 12419.47 102 121.76 8.21
+ I rkllm: --------------------------------------------------------------------------------------
+ I rkllm: Peak Memory Usage (GB)
+ I rkllm: 6.19
+ I rkllm: --------------------------------------------------------------------------------------
+
+ user: 1
+ Picture 1: <image> 这张图片中有什么?
+ robot: n_image_tokens: 289
+ 这张照片展示了一个卧室的内部。房间有一扇大窗户,可以看到外面的绿色植物。房间里有各种物品:一个蓝色的大床单覆盖在一张床上;一盏灯放在梳妆台上;一面镜子挂在墙上;书架上摆满了书籍和一些装饰品;还有一些篮子、花盆和其他小物件散落在周围。
+
+ I rkllm: --------------------------------------------------------------------------------------
+ I rkllm: Stage Total Time (ms) Tokens Time per Token (ms) Tokens per Second
+ I rkllm: --------------------------------------------------------------------------------------
+ I rkllm: Prefill 184.35 13 14.18 70.52
+ I rkllm: Generate 8711.49 72 120.99 8.26
+ I rkllm: --------------------------------------------------------------------------------------
+ I rkllm: Peak Memory Usage (GB)
+ I rkllm: 6.19
+ I rkllm: --------------------------------------------------------------------------------------
+ ```
+
+ ## Model Conversion
+
+ #### Prerequisites
+
+ 1. Install rknn-toolkit2 and rkllm-toolkit:
+ ```bash
+ pip install -U rknn-toolkit2
+ ```
+ rkllm-toolkit has to be downloaded manually from here: https://github.com/airockchip/rknn-llm/tree/main/rkllm-toolkit
+
+ 2. Download this repository locally; the model files ending in `.rkllm` and `.rknn` are not needed (see the sketch after this list).
+ 3. Download the Qwen2.5-VL-3B-Instruct huggingface model repository locally. ( https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct )
+
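+ For step 2, a minimal sketch that clones the repository while skipping the large LFS blobs (same assumed URL as above; `GIT_LFS_SKIP_SMUDGE` leaves pointer files in place instead of downloading the binaries):
+
+ ```bash
+ # Assumed repository path -- adjust as needed
+ GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/happyme531/Qwen2.5-VL-3B-Instruct-RKLLM
+ ```
+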
+ #### Convert the LLM
+
+ Copy `rkllm-convert.py` into the Qwen2.5-VL-3B-Instruct model folder and run:
+ ```bash
+ python rkllm-convert.py
+ ```
+ It uses w8a8 quantization by default; open the script to change the quantization type and other settings.
+
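+ For orientation, the conversion roughly follows the public rkllm-toolkit flow sketched below; the parameter names here are an assumption based on the toolkit's examples, so treat `rkllm-convert.py` itself as the authoritative version:
+
+ ```python
+ # Rough sketch of an rkllm-toolkit conversion (not the exact contents of rkllm-convert.py)
+ from rkllm.api import RKLLM
+
+ llm = RKLLM()
+ ret = llm.load_huggingface(model='.')    # path to the Qwen2.5-VL-3B-Instruct folder
+ assert ret == 0
+ ret = llm.build(do_quantization=True,
+                 quantized_dtype='w8a8',  # change the quantization type here
+                 target_platform='rk3588')
+ assert ret == 0
+ ret = llm.export_rkllm('./language_model_w8a8.rkllm')
+ assert ret == 0
+ ```
+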
+ #### Convert the Vision Encoder
+
+ 1. Export ONNX
+
+ Copy `export_vision_onnx.py` into the root of the Qwen2.5-VL-3B-Instruct model folder, then run the following **from that root directory**:
+ ```bash
+ mkdir vision
+ python ./export_vision_onnx.py . --savepath ./vision/vision_encoder.onnx
+ ```
+ The vision encoder is exported to `vision/vision_encoder.onnx`. The default height and width are 476; you can change them with the `--height` and `--width` parameters.
+
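+ For example, to export at 448x448 instead (an illustrative value; if you change the size, keep `IMAGE_HEIGHT`/`IMAGE_WIDTH` in `run_rkllm.py` in sync):
+
+ ```bash
+ python ./export_vision_onnx.py . --height 448 --width 448 --savepath ./vision/vision_encoder.onnx
+ ```
+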
+ 2. Model optimization (optional)
+
+ Download `split_matmul_onnx_profile.py` from https://github.com/happyme531/rknn-toolkit2-utils, then run:
+ ```bash
+ python ./split_matmul_onnx_profile.py --input vision/vision_encoder.onnx --output vision_encoder_opt.onnx --pattern "/visual/blocks\..*?/mlp/down_proj.*" --factor 5
+ ```
+ The optimized model is written to `vision_encoder_opt.onnx`.
+
+ 3. Convert to RKNN
+
+ ```bash
+ python ./convert_vision_encoder.py ./vision_encoder_opt.onnx
+ ```
+ (This step can take more than 20 minutes.)
+ The converted model is written to `vision_encoder_opt.rknn`.
+
+ To match the command in the "How to Use" section, you can rename it:
+ ```bash
+ mv vision_encoder_opt.rknn vision_encoder.rknn
+ ```
+
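+ Optionally, here is a minimal sketch for a quick on-board sanity check of the converted encoder with rknn-toolkit-lite2; the 476x476 size and the simple /255 normalization are assumptions for illustration only, and `run_rkllm.py` remains the authoritative preprocessing:
+
+ ```python
+ # Smoke test for vision_encoder.rknn on the RK3588 (not the full demo pipeline)
+ import cv2
+ import numpy as np
+ from rknnlite.api import RKNNLite
+
+ rknn = RKNNLite()
+ assert rknn.load_rknn('./vision_encoder.rknn') == 0
+ # Use all three NPU cores, matching npu_core_num = 3 in the demo command
+ assert rknn.init_runtime(core_mask=RKNNLite.NPU_CORE_0_1_2) == 0
+
+ img = cv2.cvtColor(cv2.imread('./test.jpg'), cv2.COLOR_BGR2RGB)
+ img = cv2.resize(img, (476, 476)).astype(np.float32) / 255.0   # placeholder normalization
+ img = np.transpose(img, (2, 0, 1))[None, ...]                   # NCHW; the runtime converts to NHWC itself
+
+ features = rknn.inference(inputs=[img])[0]
+ print('vision_features shape:', features.shape)  # the demo log reports n_image_tokens: 289 for 476x476
+ rknn.release()
+ ```
+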
+ ## Known Issues
+
+ - Due to limitations of RKLLM's multimodal input, only one image can be loaded per conversation.
+ - Multi-turn conversation is not implemented.
+ - RKLLM's w8a8 quantization appears to introduce a noticeable accuracy loss.
+ - Possibly due to RKNPU2 memory-access patterns, the model is, oddly, noticeably faster when the input side lengths are not multiples of 64.
+
+ ## References
+
+ - [Qwen/Qwen2.5-VL-3B-Instruct-RKLLM](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct-RKLLM)
+
convert_vision_encoder.py ADDED
@@ -0,0 +1,73 @@
+ #!/usr/bin/env python
+ # coding: utf-8
+
+ import datetime
+ import argparse
+ from rknn.api import RKNN
+ from sys import exit
+
+
+ parser = argparse.ArgumentParser(description='Convert ONNX to RKNN model.')
+ parser.add_argument('onnx_model', type=str, help='Path to the input ONNX model file.')
+ args = parser.parse_args()
+
+
+ ONNX_MODEL = args.onnx_model
+ RKNN_MODEL = ONNX_MODEL.replace(".onnx", ".rknn")
+ DATASET = "/home/zt/rk3588-nn/rknn_model_zoo/datasets/COCO/coco_subset_20.txt"  # only used when QUANTIZE is True
+ QUANTIZE = False
+ detailed_performance_log = True
+
+ timedate_iso = datetime.datetime.now().isoformat()
+
+ rknn = RKNN(verbose=True)
+ rknn.config(
+     # mean_values=[x * 255 for x in [0.485, 0.456, 0.406]],
+     # std_values=[x * 255 for x in [0.229, 0.224, 0.225]],
+     quantized_dtype="w8a8",
+     quantized_algorithm="normal",
+     quantized_method="channel",
+     quantized_hybrid_level=0,
+     target_platform="rk3588",
+     quant_img_RGB2BGR=False,
+     float_dtype="float16",
+     optimization_level=3,
+     custom_string=f"converted by: email: [email protected] at {timedate_iso}",
+     remove_weight=False,
+     compress_weight=False,
+     inputs_yuv_fmt=None,
+     single_core_mode=False,
+     # dynamic_input=[  # use either this or inputs + input_size_list, not both
+     #     [
+     #         [1, 3, 240, 320],
+     #         # ...
+     #     ],
+     #     [
+     #         [1, 3, 480, 640],
+     #         # ...
+     #     ],
+     #     [
+     #         [1, 3, 960, 1280],
+     #         # ...
+     #     ],
+     # ],
+     model_pruning=False,
+     op_target={'Gather':'cpu'},
+     quantize_weight=False,
+     remove_reshape=False,
+     sparse_infer=False,
+     enable_flash_attention=False,
+     # hidden / undocumented parameters
+     # disable_rules=[],
+     # sram_prefer=False,
+     # nbuf_prefer=False,
+     # check_data=[],
+ )
+
+ ret = rknn.load_onnx(model=ONNX_MODEL)
+ ret = rknn.build(do_quantization=QUANTIZE, dataset=DATASET, rknn_batch_size=None)
+ ret = rknn.export_rknn(RKNN_MODEL)
+
+ # ret = rknn.init_runtime(target='rk3588',core_mask=RKNN.NPU_CORE_0,perf_debug=detailed_performance_log)
+ # rknn.eval_perf()
+ # ret = rknn.accuracy_analysis(inputs=['processed_images_rknn.npy'], target='rk3588')
export_vision_onnx.py ADDED
@@ -0,0 +1,97 @@
+ import argparse
+ import torch
+ from transformers import Qwen2_5_VLForConditionalGeneration, AutoTokenizer
+
+
+ def build_patches_and_grid(pixel_values, temporal_patch_size, patch_size, merge_size):
+     assert pixel_values.dim() == 4, "pixel_values must be (N, C, H, W)"
+     N, C, H, W = pixel_values.shape
+     if H % patch_size != 0 or W % patch_size != 0:
+         raise ValueError(f"H({H}) and W({W}) must be divisible by patch_size({patch_size})")
+     if (H // patch_size) % merge_size != 0 or (W // patch_size) % merge_size != 0:
+         raise ValueError(
+             f"(H/patch_size, W/patch_size)=({H//patch_size},{W//patch_size}) must be divisible by merge_size({merge_size})"
+         )
+     if N == 1:
+         pixel_values = pixel_values.repeat(temporal_patch_size, 1, 1, 1)
+     elif N % temporal_patch_size != 0:
+         repeat_time = temporal_patch_size - (N % temporal_patch_size)
+         repeat_image = pixel_values[-1:, ...].repeat(repeat_time, 1, 1, 1)
+         pixel_values = torch.cat((pixel_values, repeat_image), dim=0)
+
+     grid_t = pixel_values.shape[0] // temporal_patch_size
+     grid_h = H // patch_size
+     grid_w = W // patch_size
+
+     patches = pixel_values.reshape(
+         grid_t,
+         temporal_patch_size,
+         C,
+         grid_h // merge_size,
+         merge_size,
+         patch_size,
+         grid_w // merge_size,
+         merge_size,
+         patch_size,
+     )
+     patches = patches.permute(0, 3, 6, 4, 7, 2, 1, 5, 8)
+     flatten_patches = patches.reshape(
+         grid_t * grid_h * grid_w, C * temporal_patch_size * patch_size * patch_size
+     )
+     grid_thw = torch.tensor([[grid_t, grid_h, grid_w]], dtype=torch.int32, device=flatten_patches.device)
+     return flatten_patches, grid_thw
+
+
+ def main():
+     parser = argparse.ArgumentParser()
+     parser.add_argument('path', type=str, help='model path')
+     parser.add_argument('--batch', type=int, default=1, required=False, help='batch size')
+     parser.add_argument('--height', type=int, default=476, required=False, help='image height')
+     parser.add_argument('--width', type=int, default=476, required=False, help='image width')
+     parser.add_argument('--savepath', type=str, default='vision_encoder.onnx', required=False, help='output path')
+     args = parser.parse_args()
+
+     model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
+         args.path,
+         torch_dtype=torch.float32,
+         low_cpu_mem_usage=True,
+         trust_remote_code=True,
+         attn_implementation="eager",
+     ).eval()
+     _ = AutoTokenizer.from_pretrained(args.path, trust_remote_code=True, use_fast=False)
+
+     vcfg = model.visual.config
+     merge_size = int(vcfg.spatial_merge_size)
+     patch_size = int(vcfg.patch_size)
+     temporal_patch_size = int(vcfg.temporal_patch_size)
+
+     # Build a dummy input
+     N, C, H, W = int(args.batch), 3, int(args.height), int(args.width)
+     pixel_values = torch.randn(N, C, H, W, dtype=torch.float32)
+
+     with torch.no_grad():
+         fp, gthw = build_patches_and_grid(pixel_values, temporal_patch_size, patch_size, merge_size)
+         vision_features = model.visual(fp, gthw)
+         print(f"vision feature shape: {vision_features.shape}")
+         print(f"number of vision tokens: {vision_features.shape[0]}")
+
+     def top_forward(pixel_values_in):
+         fp, gthw = build_patches_and_grid(pixel_values_in, temporal_patch_size, patch_size, merge_size)
+         return model.visual(fp, gthw)
+
+     model.forward = top_forward
+
+     torch.onnx.export(
+         model,
+         (pixel_values,),
+         args.savepath,
+         opset_version=17,
+         input_names=["pixel_values"],
+         output_names=["vision_features"],
+     )
+
+
+ if __name__ == '__main__':
+     main()
+
rkllm-convert.py CHANGED
@@ -17,7 +17,7 @@ if ret != 0:
      exit(ret)
  
  # Export rkllm model
- ret = llm.export_rkllm("./language_model.rkllm")
+ ret = llm.export_rkllm("./language_model_w8a8.rkllm")
  if ret != 0:
      print('Export model failed!')
      exit(ret)
run_rkllm.py CHANGED
@@ -20,8 +20,8 @@ from rkllm_binding import (
  )
  
  # Constants
- IMAGE_HEIGHT = 448
- IMAGE_WIDTH = 448
+ IMAGE_HEIGHT = 476
+ IMAGE_WIDTH = 476
  
  def expand2square(img, background_color):
      """
@@ -69,14 +69,16 @@ def main():
      # The rknn_core_num is not directly used by onnxruntime in the same way,
      # but we keep it for API consistency with the C++ example.
      # ONNX Runtime will manage its own threading and execution providers.
-     parser.add_argument("rknn_core_num", type=int, help="Core number for RKNN (informational for this script).")
+     parser.add_argument("rknn_core_num", type=int, help="Sets the number of npu cores used in vision encoder.")
  
      args = parser.parse_args()
  
      # --- 1. Initialize Image Encoder (ONNX Runtime) ---
      print("Initializing ONNX Runtime for vision encoder...")
      try:
-         ort_session = ort.InferenceSession(args.encoder_model_path)
+         sess_options = ort.SessionOptions()
+         sess_options.intra_op_num_threads = args.rknn_core_num
+         ort_session = ort.InferenceSession(args.encoder_model_path, sess_options=sess_options)
      except Exception as e:
          print(f"Failed to load ONNX model: {e}")
          sys.exit(1)
@@ -131,8 +133,12 @@ def main():
  
      # --- 4. Run Image Encoder ---
      print("Running vision encoder...")
+     import time
+     start_time = time.time()
      try:
          img_vec_output = ort_session.run([output_name], {input_name: input_tensor.astype(np.float32)})[0]
+         elapsed_time = time.time() - start_time
+         print(f"视觉编码器推理耗时: {elapsed_time:.4f} 秒")
          # The output from C++ is a flat float array. Let's flatten the ONNX output.
          img_vec = img_vec_output.flatten().astype(np.float32)
  
vision_encoder.rknn CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:165201488d5abaa6fd8c9d471b6d49ab18c508bf2a4f161a5e7164d18438a23c
- size 1424694394
+ oid sha256:401402b3cfa6ab292bb7ae51c208f51a14c36cf1a534ab5392b24efc315fb60f
+ size 1557737667