FastVLM-0.5B?
I tried using your rkllm-convert.py to convert the 0.5B model, but I get a bunch of unsupported-layer messages like the following:
ERROR: Found unsupported layer: model.vision_tower.vision_tower.model.network.10.1.convffn.conv.bn.running_var!
ERROR: Found unsupported layer: model.vision_tower.vision_tower.model.network.10.1.convffn.conv.bn.num_batches_tracked!
ERROR: Found unsupported layer: model.vision_tower.vision_tower.model.network.10.1.convffn.fc1.weight!
ERROR: Found unsupported layer: model.vision_tower.vision_tower.model.network.10.1.convffn.fc1.bias!
ERROR: Found unsupported layer: model.vision_tower.vision_tower.model.network.10.1.convffn.fc2.weight!
ERROR: Found unsupported layer: model.vision_tower.vision_tower.model.network.10.1.convffn.fc2.bias!
ERROR: Found unsupported layer: model.vision_tower.vision_tower.model.conv_exp.se.reduce.weight!
ERROR: Found unsupported layer: model.vision_tower.vision_tower.model.conv_exp.se.reduce.bias!
ERROR: Found unsupported layer: model.vision_tower.vision_tower.model.conv_exp.se.expand.weight!
ERROR: Found unsupported layer: model.vision_tower.vision_tower.model.conv_exp.se.expand.bias!
ERROR: Found unsupported layer: model.vision_tower.vision_tower.model.conv_exp.reparam_conv.weight!
ERROR: Found unsupported layer: model.vision_tower.vision_tower.model.conv_exp.reparam_conv.bias!
ERROR: Found unsupported layer: model.vision_tower.vision_tower.model.head.proj!
ERROR: Found unsupported layer: model.mm_projector.0.weight!
ERROR: Found unsupported layer: model.mm_projector.0.bias!
ERROR: Found unsupported layer: model.mm_projector.2.weight!
ERROR: Found unsupported layer: model.mm_projector.2.bias!
Running it with your test script then gives this output:
Loading ONNX vision encoder model...
I RKNN: [15:13:02.143] RKNN Runtime Information, librknnrt version: 2.3.2 (429f97ae6b@2025-04-09T09:09:27)
I RKNN: [15:13:02.143] RKNN Driver Information, version: 0.9.7
I RKNN: [15:13:02.147] RKNN Model Information, version: 6, toolkit version: 2.3.2(compiler version: 2.3.2 (e045de294f@2025-04-07T19:48:25)), target: RKNPU v2, target platform: rk3588, framework name: ONNX, framework layout: NCHW, model inference type: static_shape
W RKNN: [15:13:02.623] query RKNN_QUERY_INPUT_DYNAMIC_RANGE error, rknn model is static shape type, please export rknn with dynamic_shapes
W Query dynamic range failed. Ret code: RKNN_ERR_MODEL_INVALID. (If it is a static shape RKNN model, please ignore the above warning message.)
ONNX vision encoder loaded. Input: 'pixel_values', Output: 'last_hidden_state'
Loading ONNX mm_projector model...
I RKNN: [15:13:02.689] RKNN Runtime Information, librknnrt version: 2.3.2 (429f97ae6b@2025-04-09T09:09:27)
I RKNN: [15:13:02.689] RKNN Driver Information, version: 0.9.7
I RKNN: [15:13:02.689] RKNN Model Information, version: 6, toolkit version: 2.3.0(compiler version: 2.3.0 (@2024-11-07T08:11:34)), target: RKNPU v2, target platform: rk3588, framework name: ONNX, framework layout: NCHW, model inference type: static_shape
W RKNN: [15:13:02.705] query RKNN_QUERY_INPUT_DYNAMIC_RANGE error, rknn model is static shape type, please export rknn with dynamic_shapes
W Query dynamic range failed. Ret code: RKNN_ERR_MODEL_INVALID. (If it is a static shape RKNN model, please ignore the above warning message.)
ONNX mm_projector loaded. Input: 'last_hidden_state', Output: 'projected_image_features'
Initializing RKLLM...
Start loading language model (size: 1225.72 MB)
I rkllm: rkllm-runtime version: 1.2.1b1, rknpu driver version: 0.9.7, platform: RK3588
I rkllm: loading rkllm model from ./qwen_f16.rkllm
I rkllm: rkllm-toolkit version: 1.2.1, max_context_limit: 4096, npu_core_num: 3, target_platform: RK3588
I RKNN: [15:13:03.191] RKNN LLM Runtime Information, rknn llm lib version: 2.3.3b0 (c54da5763@2025-04-18T15:37:10)
I RKNN: [15:13:03.191] RKNN Driver Information, version: 0.9.7
I rkllm: Enabled cpus: [0, 1, 2, 3, 4, 5, 6, 7]
I rkllm: Enabled cpus num: 8
I rkllm: system_prompt: <|im_start|>system\nYou are a helpful assistant.<|im_end|>\n
I rkllm: prompt_prefix: <|im_start|>user\n
I rkllm: prompt_postfix: <|im_end|>\n<|im_start|>assistant\n
Language model loaded in 1.87 seconds
Loading and preprocessing image: test.jpg
Target image size from config: 1024x1024
Using image_mean: [0.0, 0.0, 0.0], image_std: [1.0, 1.0, 1.0]
Input image shape for ONNX vision model: (1, 3, 1024, 1024)
W The input[0] need NHWC data format, but NCHW set, the data format and data buffer will be changed to NHWC.
ONNX Vision encoder inference time: 3.50 seconds
Vision encoder output shape: (1, 256, 3072)
ONNX MM projector inference time: 0.03 seconds
Projected image embeddings shape: (1, 256, 1536)
Using prompt:
Describe this image in detail.
Starting RKLLM inference...
Time to first token: 0.75 seconds
ate?Exception ignored on calling ctypes callback function: <function result_callback at 0xffff86b8bc40>
Traceback (most recent call last):
File "/home/lb/projects/FastVLM-1.5B-RKLLM/run_rknn.py", line 71, in result_callback
result = result_ptr.contents # Dereference the pointer
^^^^^^^^^^^^^^^^^^^
ValueError: NULL pointer access
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Stage Total Time (ms) Tokens Time per Token (ms) Tokens per Second
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Init 1375.29 / / /
I rkllm: Prefill 734.17 286 2.57 389.55
I rkllm: Generate 116.11 2 58.05 17.23
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Memory Usage (GB)
I rkllm: 2.14
I rkllm: --------------------------------------------------------------------------------------
RKLLM instance destroyed at script end.
Script finished.
This is my first time trying to convert a model to rkllm, so I'm sure I'm missing something.
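As a side note, the crash at the end looks like the ctypes callback dereferencing a NULL pointer: rkllm appears to invoke the callback one last time with a NULL result pointer when generation finishes, and `result_ptr.contents` then raises `ValueError: NULL pointer access`. A minimal guard could look like this (a sketch only; the struct and callback signature here are placeholders for whatever run_rknn.py actually defines):

```python
import ctypes

class RKLLMResultStub(ctypes.Structure):
    # Placeholder stand-in for the real RKLLMResult structure
    _fields_ = [("text", ctypes.c_char_p)]

def result_callback(result_ptr, userdata, state):
    # A NULL pointer is falsy in ctypes; treat it as the finish
    # notification instead of trying to dereference it.
    if not result_ptr:
        return None
    result = result_ptr.contents  # safe to dereference now
    return result.text            # handle the token text as before
```

With this guard the final NULL invocation returns quietly instead of raising inside the ctypes callback.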
The unsupported-layer messages are expected, since those layers all belong to the vision part of the model. You will probably also need to export and convert the vision encoder and multimodal projector from the 0.5B version for the model to work correctly.