Update README.md
README.md
CHANGED
@@ -12,8 +12,8 @@ library_name: transformers
 <!-- Provide a quick summary of what the model is/does. -->
 R1-VL-7B is a reasoning model trained with step-wise group relative policy optimization (StepGRPO).
 
-
+### Paper: https://arxiv.org/pdf/2503.12937
 
-
+### Github: https://github.com/jingyi0000/R1-VL
 
-
+### Base model: https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct