jingyiZ00/R1-VL-2B

Tags: Image-Text-to-Text, text-generation-inference


R1-VL-2B

R1-VL-2B is a multimodal (image-text) reasoning model trained with Step-wise Group Relative Policy Optimization (StepGRPO).
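StepGRPO builds on group relative policy optimization (GRPO). For orientation, the sketch below shows only the standard GRPO advantage step: several responses are sampled for the same prompt, and each response's scalar reward is normalized by the mean and standard deviation of its group. It assumes one scalar reward per response and does not reproduce the step-wise reward design that distinguishes StepGRPO; see the paper for that. The function name and the example rewards are hypothetical.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Standard GRPO-style advantages for one group of sampled responses.

    `rewards` holds one scalar reward per response sampled for the same prompt.
    Each reward is centered on the group mean and scaled by the group standard
    deviation, so no learned value function (critic) is needed as a baseline.
    """
    mean = statistics.fmean(rewards)
    # Population std; some implementations use the sample std instead (assumption).
    std = statistics.pstdev(rewards)
    # eps avoids division by zero when every response gets the same reward.
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical rewards for 4 responses to one image-question prompt.
    print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))
```

Responses scoring above their group's average receive positive advantages and are reinforced, those below are pushed down; if every response in a group gets the same reward, all advantages are zero and that group contributes no advantage-weighted signal.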

Paper: https://arxiv.org/pdf/2503.12937

GitHub: https://github.com/jingyi0000/R1-VL

Base model: https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct


Weights format: Safetensors

Model size: 2.21B params

Tensor type: F32
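As a rough sizing check based on the figures above: F32 stores each parameter in 4 bytes, so 2.21B parameters need about 8.84 GB for the weights alone. The sketch below counts weights only (it ignores activations, the KV cache and framework overhead), uses the rounded 2.21B figure, and its BF16 line assumes the weights are cast to half precision at load time, which this card does not state. The helper name is hypothetical.

```python
# Rough estimate of the memory needed just to hold the model weights.
# Assumptions: parameter count 2.21e9 as listed (rounded); excludes activations,
# KV cache, optimizer state and framework overhead.
BYTES_PER_PARAM = {"F32": 4, "BF16": 2, "F16": 2}


def weight_bytes(num_params: float, dtype: str) -> float:
    """Return the bytes needed to store `num_params` parameters of `dtype`."""
    return num_params * BYTES_PER_PARAM[dtype]


if __name__ == "__main__":
    params = 2.21e9
    for dtype in ("F32", "BF16"):
        b = weight_bytes(params, dtype)
        # Report both decimal gigabytes and binary gibibytes.
        print(f"{dtype}: {b / 1e9:.2f} GB ({b / 2**30:.2f} GiB)")
```

Halving the bytes per parameter halves the weight footprint, which is why half-precision loading is a common choice for inference when the checkpoint is stored in F32.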

Model tree for jingyiZ00/R1-VL-2B

Qwen/Qwen2-VL-2B (base model) → Qwen/Qwen2-VL-2B-Instruct (finetuned) → this model (finetuned)

Dataset used to train jingyiZ00/R1-VL-2B

Collection including jingyiZ00/R1-VL-2B: R1-VL (4 items, updated Apr 16)