File size: 495 Bytes
4613050
 
 
 
 
 
 
 
 
 
 
 
 
 
3f150d4
4613050
3f150d4
4613050
3f150d4
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
---
license: apache-2.0
datasets:
- HuanjinYao/Mulberry-SFT
base_model:
- Qwen/Qwen2-VL-7B-Instruct
pipeline_tag: image-text-to-text
library_name: transformers
---
# R1-VL-7B

<!-- Provide a quick summary of what the model is/does. -->
R1-VL-7B is a reasoning model trained with step-wise group relative policy optimization (StepGRPO).

### Paper: https://arxiv.org/pdf/2503.12937

### Github: https://github.com/jingyi0000/R1-VL

### Base model: https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct