jingyiZ00
/

R1-VL-7B

Image-Text-to-Text

text-generation-inference

Model card Files Files and versions

R1-VL-7B

R1-VL-7B is a reasoning model trained with step-wise group relative policy optimization (StepGRPO).

Paper: https://arxiv.org/pdf/2503.12937

Github: https://github.com/jingyi0000/R1-VL

Base model: https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct

Downloads last month: 33

Safetensors

Model size

8.29B params

Tensor type

BF16

·

Model tree for jingyiZ00/R1-VL-7B

Base model

Qwen/Qwen2-VL-7B

Finetuned

Qwen/Qwen2-VL-7B-Instruct

Finetuned

(420)

this model

Dataset used to train jingyiZ00/R1-VL-7B

Collection including jingyiZ00/R1-VL-7B

R1-VL

4 items • Updated Apr 16