---
license: apache-2.0
datasets:
- HuanjinYao/Mulberry-SFT
base_model:
- Qwen/Qwen2-VL-7B-Instruct
pipeline_tag: image-text-to-text
library_name: transformers
---
# R1-VL-7B
R1-VL-7B is a vision-language reasoning model built on Qwen2-VL-7B-Instruct and trained with Step-wise Group Relative Policy Optimization (StepGRPO).
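
To make the "group relative" part of the name concrete, here is a minimal sketch of how a group-relative advantage can be computed from step-wise rewards. It is an illustration only, not the authors' training code: the function names, the summing of step rewards into one path reward, and the `eps` constant are all assumptions made for this example. See the paper for the actual reward design.

```python
from statistics import mean, pstdev


def path_reward(step_rewards):
    """Collapse per-step rewards into one scalar per reasoning path.

    Assumption for this sketch: the step rewards are simply summed.
    """
    return sum(step_rewards)


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of sampled reasoning paths.

    Each path's advantage is its reward minus the group mean, divided by
    the group standard deviation (eps avoids division by zero).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of three sampled paths for the same question,
# each with its own list of step-wise rewards.
group = [[0.0, 0.5, 0.5], [0.5, 0.5, 1.0], [1.0, 1.0, 1.0]]
advantages = group_relative_advantages([path_reward(p) for p in group])
```

Paths that score above the group average get a positive advantage and those below get a negative one, so the policy update needs no separate value model.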
### Paper: https://arxiv.org/pdf/2503.12937
### GitHub: https://github.com/jingyi0000/R1-VL
### Base model: https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct