InternVL3_5-2B-RKLLM

(English README see below)

在RK3588上运行强大的InternVL3.5-2B视觉大模型!

  • 推理速度(RK3588): 视觉编码器 2.1s(三核并行) + LLM 填充 1s (265 tokens / 261 tps) + 解码 12.1 tps
  • 内存占用(RK3588, 上下文长度1024): 3.9GB

使用方法

  1. 克隆或者下载此仓库到本地. 模型较大, 请确保有足够的磁盘空间.

  2. 开发板的RKNPU2内核驱动版本必须>=0.9.6才能运行这么大的模型. 使用root权限运行以下命令检查驱动版本:

    > cat /sys/kernel/debug/rknpu/version 
    RKNPU driver: v0.9.8
    

    如果版本过低, 请更新驱动. 你可能需要更新内核, 或查找官方文档以获取帮助.

  3. 安装依赖

    pip install "numpy<2" opencv-python rknn-toolkit-lite2

  4. 运行

    python ./run_rkllm.py ./test.jpg ./vision_encoder.rknn ./language_model_w8a8.rkllm 512 1024 3

参数说明:

  • 512: max_new_tokens, 最大生成token数.
  • 1024: max_context_len, 最大上下文长度.
  • 3: npu_core_num, 使用的NPU核心数.

如果实测性能不理想, 可以调整CPU调度器让CPU始终运行在最高频率, 并把推理程序绑定到大核(taskset -c 4-7 python ...)

test.jpg:

Initializing ONNX Runtime for vision encoder...
I rknn-toolkit2 version: 2.3.2
I target set by user is: rk3588
Vision encoder loaded successfully.
ONNX Input: pixel_values, ONNX Output: projected_features
Initializing RKLLM Runtime...
I rkllm: rkllm-runtime version: 1.2.2, rknpu driver version: 0.9.8, platform: RK3588
I rkllm: loading rkllm model from ./language_model_w8a8.rkllm
I rkllm: rkllm-toolkit version: 1.2.2, max_context_limit: 4096, npu_core_num: 3, target_platform: RK3588, model_dtype: W8A8
I rkllm: Enabled cpus: [4, 5, 6, 7]
I rkllm: Enabled cpus num: 4
RKLLM initialized successfully.
Preprocessing image...
Running vision encoder...
视觉编码器推理耗时: 2.0876 秒
Image encoded successfully.

**********************可输入以下问题对应序号获取回答/或自定义输入********************

[0] <image>What is in the image?
[1] <image>这张图片中有什么?

*************************************************************************


user: 0
<image>What is in the image?
robot: n_image_tokens:  256


This image depicts a cozy bedroom with a large window, several pieces of furniture, and various decorative items. The room has a vintage feel due to the wallpaper pattern and the wooden furniture.

The bed occupies the left side of the image, covered with a blue comforter or quilt. Next to the bed is a dresser with a round mirror above it. On top of the dresser are several small objects, including what appears to be a water bottle and some decorative items like plants.

In front of the window on the right side of the image, there is a chair with a checkered cushion. Behind this chair, there is a bookshelf filled with books and various other items, such as baskets and possibly some knick-knacks. The bookshelf has multiple levels, each holding an assortment of books and decorative objects.

The window allows natural light to enter the room, illuminating the space and highlighting the greenery outside. There are also potted plants placed around the room, adding a touch of nature and freshness to the interior decor.

Overall, this bedroom exudes a sense of comfort and personal style, with elements that suggest it is used regularly by someone who values both aesthetics and functionality in their living space.


I rkllm: --------------------------------------------------------------------------------------
I rkllm:  Model init time (ms)  4314.30                                                                    
I rkllm: --------------------------------------------------------------------------------------
I rkllm:  Stage         Total Time (ms)  Tokens    Time per Token (ms)      Tokens per Second      
I rkllm: --------------------------------------------------------------------------------------
I rkllm:  Prefill       1013.32          265       3.82                     261.52                 
I rkllm:  Generate      20155.65         244       82.61                    12.11                  
I rkllm: --------------------------------------------------------------------------------------
I rkllm:  Peak Memory Usage (GB)
I rkllm:  3.45        
I rkllm: --------------------------------------------------------------------------------------

user: 1
<image>这张图片中有什么?
robot: n_image_tokens:  256


这是一间温馨的卧室,房间内有一扇大窗户、几件家具和各种装饰物品。房间因壁纸图案和木质家具而显得复古。

床位于图像左侧,覆盖着蓝色被套或毯子。床旁边是一个带有圆形镜子的抽屉柜。在抽屉柜上摆放着一些小物件,包括水瓶和一些装饰品,如植物。

窗户右侧前方有一把带格子坐垫的椅子。椅子后面是一排书架,上面摆满了书籍和其他物品,如篮子和可能的一些小饰品。书架有多层,每层都放着各种书籍和装饰物。

窗外可以看到绿树,自然光透过窗户照进房间,照亮了空间,并突出了外面的绿色植物。房间里还摆放了一些盆栽植物,为室内增添了自然的气息和清新感。

总体而言,这间卧室给人一种舒适和个性的感觉,表明它经常被居住者使用,居住者重视生活空间中的美学和功能性。

I rkllm: --------------------------------------------------------------------------------------
I rkllm:  Stage         Total Time (ms)  Tokens    Time per Token (ms)      Tokens per Second      
I rkllm: --------------------------------------------------------------------------------------
I rkllm:  Prefill       1287.65          264       4.88                     205.03                 
I rkllm:  Generate      19852.10         204       97.31                    10.28                  
I rkllm: --------------------------------------------------------------------------------------
I rkllm:  Peak Memory Usage (GB)
I rkllm:  3.45        
I rkllm: --------------------------------------------------------------------------------------

user: ^C
Exiting...
Releasing resources...
RKLLM instance destroyed.

模型转换

准备工作

  1. 安装rknn-toolkit2以及rkllm-toolkit:
pip install -U rknn-toolkit2 

rkllm-toolkit需要在这里手动下载: https://github.com/airockchip/rknn-llm/tree/main/rkllm-toolkit

  2. 下载此仓库到本地, 但不需要下载.rkllm和.rknn结尾的模型文件.
  3. 下载InternVL3.5-2B的huggingface模型仓库到本地. ( https://huggingface.co/OpenGVLab/InternVL3_5-2B-HF )

转换LLM

rkllm-convert.py拷贝到InternVL3_5-2B-HF的模型文件夹中,执行:

python rkllm-convert.py

默认是w8a8量化的,你可以自行打开脚本修改量化方式等。

转换视觉编码器

  1. 导出ONNX

export_vision_onnx.py拷贝到InternVL3_5-2B-HF的模型文件夹根目录中,然后在该根目录下执行:

python ./export_vision_onnx.py 

视觉编码器会导出到vision_encoder.onnx.

  2. 转换rknn

    python ./convert_vision_encoder.py

已知问题

  • 由于RKLLM的多模态输入的限制, 在整个对话中只能加载一张图片.
  • 没有实现多轮对话.
  • RKLLM的w8a8量化貌似存在不小的精度损失.
  • 没有实现原模型中的高清图像分块输入与视频输入功能. 原因是我懒得做了,以后可以考虑加上.


English README

Run the powerful InternVL3.5-2B large vision model on RK3588!

  • Inference Speed (RK3588): Vision Encoder 2.1s (3-core parallel) + LLM Prefill 1s (265 tokens / 261 tps) + Decode 12.1 tps
  • Memory Usage (RK3588, context length 1024): 3.9GB

How to Use

  1. Clone or download this repository locally. The model is large, so ensure you have enough disk space.

  2. The RKNPU2 kernel driver version on your development board must be >=0.9.6 to run this model. Run the following command with root privileges to check the driver version:

    > cat /sys/kernel/debug/rknpu/version 
    RKNPU driver: v0.9.8
    

    If the version is too low, please update the driver. You may need to update the kernel or refer to the official documentation for help.

  3. Install dependencies:

    pip install "numpy<2" opencv-python rknn-toolkit-lite2

  4. Run:

    python ./run_rkllm.py ./test.jpg ./vision_encoder.rknn ./language_model_w8a8.rkllm 512 1024 3

Parameter description:

  • 512: max_new_tokens, the maximum number of tokens to generate.
  • 1024: max_context_len, the maximum context length.
  • 3: npu_core_num, the number of NPU cores to use.
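
The driver check from step 2 and the positional arguments above can be scripted before launching. The following is a minimal sketch, not part of this repository: the helper names `parse_driver_version`, `driver_ok` and `build_command` are hypothetical, and the only facts taken from this README are the minimum driver version (0.9.6), the format of the version line, and the argument order of run_rkllm.py.

```python
import re
import sys

MIN_DRIVER = (0, 9, 6)  # minimum RKNPU2 driver version stated in step 2


def parse_driver_version(text):
    """Extract (major, minor, patch) from a line like 'RKNPU driver: v0.9.8'."""
    match = re.search(r"v(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def driver_ok(text, minimum=MIN_DRIVER):
    """Return True if the reported driver version is at least `minimum`."""
    return parse_driver_version(text) >= minimum


def build_command(image, encoder, llm, max_new_tokens=512,
                  max_context_len=1024, npu_core_num=3):
    """Assemble the run_rkllm.py argv in the positional order shown above."""
    return [sys.executable, "./run_rkllm.py", image, encoder, llm,
            str(max_new_tokens), str(max_context_len), str(npu_core_num)]


def main():
    # Reading this debugfs file usually requires root, as noted in step 2.
    with open("/sys/kernel/debug/rknpu/version") as f:
        reported = f.read()
    if not driver_ok(reported):
        sys.exit(f"RKNPU driver too old: {reported.strip()}")
    print(" ".join(build_command("./test.jpg", "./vision_encoder.rknn",
                                 "./language_model_w8a8.rkllm")))
```

Calling `main()` on the board prints the same command line as step 4 once the driver check passes. Comparing version tuples rather than strings avoids ordering "0.9.10" below "0.9.6".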

If performance falls short of the numbers above, set the CPU frequency governor so the CPU stays at its highest frequency, and bind the inference program to the big cores (taskset -c 4-7 python ...).
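
As an alternative to `taskset`, the pinning can be done from inside a Python process, and the current governors can be inspected without changing them. This is a sketch under assumptions: it relies on the RK3588 layout in which CPUs 4-7 are the big cores (consistent with the `Enabled cpus: [4, 5, 6, 7]` line in the log below) and on the standard Linux cpufreq sysfs path. The function names are hypothetical and are not part of run_rkllm.py.

```python
import glob
import os

BIG_CORES = {4, 5, 6, 7}  # same cores as `taskset -c 4-7`


def pin_to_cores(wanted=BIG_CORES):
    """Restrict this process to the `wanted` cores that are actually available.

    Returns the set of cores the process is allowed to run on afterwards.
    """
    available = os.sched_getaffinity(0)
    target = set(wanted) & available
    if not target:
        # None of the requested cores exist here: leave the affinity unchanged.
        return available
    os.sched_setaffinity(0, target)
    return os.sched_getaffinity(0)


def read_governors(
        pattern="/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor"):
    """Map each 'cpuN' to its current cpufreq governor, where the file exists."""
    governors = {}
    for path in sorted(glob.glob(pattern)):
        cpu = path.split("/")[5]  # path component such as 'cpu4'
        with open(path) as f:
            governors[cpu] = f.read().strip()
    return governors
```

Calling `pin_to_cores()` before the runtimes are initialised keeps threads created later on the same cores. `read_governors()` only reports the current setting; switching to the `performance` governor requires root and is left to the system's own tooling.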

Example with test.jpg:

Initializing ONNX Runtime for vision encoder...
I rknn-toolkit2 version: 2.3.2
I target set by user is: rk3588
Vision encoder loaded successfully.
ONNX Input: pixel_values, ONNX Output: projected_features
Initializing RKLLM Runtime...
I rkllm: rkllm-runtime version: 1.2.2, rknpu driver version: 0.9.8, platform: RK3588
I rkllm: loading rkllm model from ./language_model_w8a8.rkllm
I rkllm: rkllm-toolkit version: 1.2.2, max_context_limit: 4096, npu_core_num: 3, target_platform: RK3588, model_dtype: W8A8
I rkllm: Enabled cpus: [4, 5, 6, 7]
I rkllm: Enabled cpus num: 4
RKLLM initialized successfully.
Preprocessing image...
Running vision encoder...
视觉编码器推理耗时: 2.0876 秒
Image encoded successfully.

**********************可输入以下问题对应序号获取回答/或自定义输入********************

[0] <image>What is in the image?
[1] <image>这张图片中有什么?

*************************************************************************


user: 0
<image>What is in the image?
robot: n_image_tokens:  256


This image depicts a cozy bedroom with a large window, several pieces of furniture, and various decorative items. The room has a vintage feel due to the wallpaper pattern and the wooden furniture.

The bed occupies the left side of the image, covered with a blue comforter or quilt. Next to the bed is a dresser with a round mirror above it. On top of the dresser are several small objects, including what appears to be a water bottle and some decorative items like plants.

In front of the window on the right side of the image, there is a chair with a checkered cushion. Behind this chair, there is a bookshelf filled with books and various other items, such as baskets and possibly some knick-knacks. The bookshelf has multiple levels, each holding an assortment of books and decorative objects.

The window allows natural light to enter the room, illuminating the space and highlighting the greenery outside. There are also potted plants placed around the room, adding a touch of nature and freshness to the interior decor.

Overall, this bedroom exudes a sense of comfort and personal style, with elements that suggest it is used regularly by someone who values both aesthetics and functionality in their living space.


I rkllm: --------------------------------------------------------------------------------------
I rkllm:  Model init time (ms)  4314.30                                                                    
I rkllm: --------------------------------------------------------------------------------------
I rkllm:  Stage         Total Time (ms)  Tokens    Time per Token (ms)      Tokens per Second      
I rkllm: --------------------------------------------------------------------------------------
I rkllm:  Prefill       1013.32          265       3.82                     261.52                 
I rkllm:  Generate      20155.65         244       82.61                    12.11                  
I rkllm: --------------------------------------------------------------------------------------
I rkllm:  Peak Memory Usage (GB)
I rkllm:  3.45        
I rkllm: --------------------------------------------------------------------------------------

user: ^C
Exiting...
Releasing resources...
RKLLM instance destroyed.
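
The throughput figures quoted at the top of this README follow directly from the table in the log: tokens per second is the token count divided by the stage time in seconds. A small sketch of that arithmetic follows; the function names are hypothetical, and the row format is assumed to match the log lines shown above.

```python
def parse_stage_line(line):
    """Parse one 'I rkllm:  <Stage> <total ms> <tokens> ...' row of the table."""
    fields = line.split()
    # fields: ['I', 'rkllm:', stage, total_ms, tokens, ms_per_token, tps]
    return fields[2], float(fields[3]), int(fields[4])


def tokens_per_second(tokens, total_ms):
    """Throughput for a stage that processed `tokens` in `total_ms` milliseconds."""
    return tokens / (total_ms / 1000.0)


prefill = parse_stage_line(
    "I rkllm:  Prefill       1013.32          265       3.82                     261.52")
generate = parse_stage_line(
    "I rkllm:  Generate      20155.65         244       82.61                    12.11")

prefill_tps = tokens_per_second(prefill[2], prefill[1])
decode_tps = tokens_per_second(generate[2], generate[1])

# End-to-end time for this request: vision encoder + prefill + generation.
vision_s = 2.0876
total_s = vision_s + (prefill[1] + generate[1]) / 1000.0

print(f"prefill {prefill_tps:.2f} tps, decode {decode_tps:.2f} tps, total {total_s:.2f} s")
```

For this run the whole request takes roughly 23 seconds, almost all of it spent generating the 244 output tokens.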

Model Conversion

Prerequisites

  1. Install rknn-toolkit2 and rkllm-toolkit:
    pip install -U rknn-toolkit2

    rkllm-toolkit has to be downloaded manually from: https://github.com/airockchip/rknn-llm/tree/main/rkllm-toolkit

  2. Download this repository, but skip the .rkllm and .rknn model files; they are not needed for conversion.
  3. Download the InternVL3.5-2B Hugging Face model repository. ( https://huggingface.co/OpenGVLab/InternVL3_5-2B-HF )
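
One way to fetch both repositories while skipping the prebuilt models is `snapshot_download` from the `huggingface_hub` package. This is a sketch under assumptions: the package is not listed among the dependencies in this README, the local directory names are arbitrary, and the repository ID `happyme531/InternVL3_5-2B-RKLLM` is assumed from the page this README is published on.

```python
from fnmatch import fnmatch

# Prebuilt models that are not needed when converting from scratch.
SKIP_PATTERNS = ["*.rkllm", "*.rknn"]


def is_needed(filename, skip=SKIP_PATTERNS):
    """Return True if `filename` would still be downloaded under the skip patterns."""
    return not any(fnmatch(filename, pattern) for pattern in skip)


def download_sources(scripts_dir="./InternVL3_5-2B-RKLLM",
                     model_dir="./InternVL3_5-2B-HF"):
    """Download the conversion scripts and the original Hugging Face model."""
    # Imported lazily so is_needed() works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    scripts = snapshot_download(
        repo_id="happyme531/InternVL3_5-2B-RKLLM",  # assumed repository ID
        local_dir=scripts_dir,
        ignore_patterns=SKIP_PATTERNS,
    )
    model = snapshot_download(
        repo_id="OpenGVLab/InternVL3_5-2B-HF",
        local_dir=model_dir,
    )
    return scripts, model
```

`is_needed` only mirrors the pattern matching locally, so the filter can be checked before starting a large download.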

Convert LLM

Copy rkllm-convert.py to the InternVL3_5-2B-HF model folder and run:

python rkllm-convert.py

The script quantizes to w8a8 by default. Edit it to change the quantization type or other settings.

Convert Vision Encoder

  1. Export ONNX

Copy export_vision_onnx.py to the root directory of the InternVL3_5-2B-HF model folder, and then execute it in that root directory:

python ./export_vision_onnx.py 

The vision encoder will be exported to vision_encoder.onnx.

  2. Convert to RKNN

    python ./convert_vision_encoder.py
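
For orientation, an ONNX-to-RKNN conversion with rknn-toolkit2 typically follows the config, load, build, export sequence sketched below. This is not the content of convert_vision_encoder.py, which may use different settings. In particular, building without quantization is an assumption, and the `rknn_factory` parameter is a hypothetical hook added so the flow can be dry-run on a machine without the toolkit installed.

```python
def convert_onnx_to_rknn(onnx_path="vision_encoder.onnx",
                         rknn_path="vision_encoder.rknn",
                         target="rk3588", rknn_factory=None):
    """Convert an ONNX model to RKNN and return the output path."""
    if rknn_factory is None:
        from rknn.api import RKNN  # provided by rknn-toolkit2
        rknn_factory = RKNN

    rknn = rknn_factory()
    try:
        rknn.config(target_platform=target)
        steps = [
            ("load_onnx", lambda: rknn.load_onnx(model=onnx_path)),
            # Assumption: no quantization for the vision encoder.
            ("build", lambda: rknn.build(do_quantization=False)),
            ("export_rknn", lambda: rknn.export_rknn(rknn_path)),
        ]
        for name, step in steps:
            # The toolkit calls return 0 on success.
            if step() != 0:
                raise RuntimeError(f"{name} failed")
    finally:
        # Always free the toolkit's resources, even if a step failed.
        rknn.release()
    return rknn_path
```

With rknn-toolkit2 installed, `convert_onnx_to_rknn()` run in the directory containing vision_encoder.onnx writes vision_encoder.rknn next to it.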

Known Issues

  • Due to limitations in RKLLM's multimodal input, only one image can be loaded throughout the conversation.
  • Multi-turn conversation is not implemented.
  • RKLLM's w8a8 quantization appears to cause a noticeable loss of accuracy.
  • The high-resolution image tiling and video input features of the original model are not implemented. I have not gotten around to them; they may be added later.
