zoeyuchao and nielsr (HF Staff) committed
Commit fd79a44 · verified · 1 parent: cececd3

Improve model card: Update pipeline tag, add library name, and link paper (#1)


- Improve model card: Update pipeline tag, add library name, and link paper (92f921c845822e134ed5a982517ca2124ae3607c)


Co-authored-by: Niels Rogge <[email protected]>
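The substantive change is to the card's YAML front matter: `pipeline_tag` moves from `reinforcement-learning` to `text-generation` and `library_name: transformers` is added, which is what lets the Hub surface the right widget and code snippet for the model. A minimal stdlib-only sketch of reading such flat keys out of a card's front matter (the tiny parser and the abbreviated sample text are illustrative, not the Hub's actual implementation):

```python
import re

# Abbreviated sample model card text (for illustration only).
card = """---
license: mit
pipeline_tag: text-generation
library_name: transformers
---
# RLinf-math-7B
"""

# Pull the YAML block between the opening and closing `---` fences,
# then read the simple top-level `key: value` pairs from it.
block = re.match(r"---\n(.*?)\n---", card, re.DOTALL).group(1)
front_matter = dict(
    line.split(": ", 1)
    for line in block.splitlines()
    if ": " in line and not line.startswith(("-", " "))
)

print(front_matter["pipeline_tag"])   # text-generation
print(front_matter["library_name"])   # transformers
```

A real card should be parsed with a full YAML library; this sketch only handles the flat scalar keys shown above.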

Files changed (1): README.md (+30 −22)
README.md CHANGED
@@ -1,47 +1,50 @@
 ---
-license: mit
-tags:
-- RLinf
+base_model:
+- deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
 language:
 - en
+license: mit
 metrics:
 - accuracy
-base_model:
-- deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
-pipeline_tag: reinforcement-learning
+pipeline_tag: text-generation
+library_name: transformers
+tags:
+- RLinf
+- reinforcement-learning
 model-index:
 - name: RLinf-math-7B
   results:
   - task:
-      type: math # Required. Example: automatic-speech-recognition
+      type: math
     dataset:
-      type: aime_2024 # Required. Example: common_voice. Use dataset id from https://hf.co/datasets
-      name: AIME24 # Required. A pretty name for the dataset. Example: Common Voice (French)
+      name: AIME24
+      type: aime_2024
     metrics:
-    - type: accuracy # Required. Example: wer. Use metric id from https://hf.co/metrics
-      value: 68.328125 # Required. Example: 20.90
+    - type: accuracy
+      value: 68.328125
   - task:
-      type: math # Required. Example: automatic-speech-recognition
+      type: math
     dataset:
-      type: aime_2025 # Required. Example: common_voice. Use dataset id from https://hf.co/datasets
-      name: AIME25 # Required. A pretty name for the dataset. Example: Common Voice (French)
+      name: AIME25
+      type: aime_2025
     metrics:
-    - type: accuracy # Required. Example: wer. Use metric id from https://hf.co/metrics
-      value: 52.19375 # Required. Example: 20.90
+    - type: accuracy
+      value: 52.19375
   - task:
-      type: stem # Required. Example: automatic-speech-recognition
+      type: stem
     dataset:
-      type: gpqa_diamond # Required. Example: common_voice. Use dataset id from https://hf.co/datasets
-      name: GPQA-diamond # Required. A pretty name for the dataset. Example: Common Voice (French)
+      name: GPQA-diamond
+      type: gpqa_diamond
     metrics:
-    - type: accuracy # Required. Example: wer. Use metric id from https://hf.co/metrics
-      value: 48.178124999999994 # Required. Example: 20.90
+    - type: accuracy
+      value: 48.178124999999994
 ---
 
 <div align="center">
 <img src="logo.svg" alt="RLinf-logo" width="500"/>
 </div>
 
+The model was presented in the paper [RLinf-VLA: A Unified and Efficient Framework for VLA+RL Training](https://huggingface.co/papers/2510.06710).
 
 <div align="center">
 <!-- <a href="TODO"><img src="https://img.shields.io/badge/arXiv-Paper-red?logo=arxiv"></a> -->
@@ -96,10 +99,15 @@ We trained and evaluated two models using RLinf:
 | Model | AIME 24 | AIME 25 | GPQA-diamond | Average |
 | ---------------------------------------- | --------- | --------- | ------------ | --------- |
 | [DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) | 54.90 | 40.20 | 45.48 | 46.86 |
+
 | [AReaL-boba-RL-7B](https://huggingface.co/inclusionAI/AReaL-boba-RL-7B) | 61.66 | 49.38 | 46.93 | 52.66 |
+
 | [Skywork-OR1-7B](https://huggingface.co/Skywork/Skywork-OR1-7B) | 66.87 | 52.49 | 44.43 | 54.60 |
+
 | [Polaris-7B-Preview](https://huggingface.co/POLARIS-Project/Polaris-7B-Preview) | **68.55** | 51.24 | 43.88 | 54.56 |
+
 | [AceMath-RL-Nemotron-7B](https://huggingface.co/nvidia/AceMath-RL-Nemotron-7B) | 67.30 | **55.00** | 45.57 | 55.96 |
+
 | [RLinf-math-7B](https://huggingface.co/RLinf/RLinf-math-7B) | 68.33 | 52.19 | **48.18** | **56.23** |
 
 
@@ -128,4 +136,4 @@ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 ```
 
 ## License
-This code repository and the model weights are licensed under the MIT License.
+This code repository and the model weights are licensed under the MIT License.
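The unrounded `value` entries in the new front matter are exactly the scores behind the README's results table: rounding them to two decimals reproduces the RLinf-math-7B table row and the reported average. A quick check, using only the numbers appearing in the diff:

```python
# Unrounded scores copied from the model-index front matter.
scores = {
    "AIME24": 68.328125,
    "AIME25": 52.19375,
    "GPQA-diamond": 48.178124999999994,
}

# The README table reports two-decimal values and their mean.
table_row = {name: round(value, 2) for name, value in scores.items()}
average = round(sum(scores.values()) / len(scores), 2)

print(table_row)  # {'AIME24': 68.33, 'AIME25': 52.19, 'GPQA-diamond': 48.18}
print(average)    # 56.23
```

This confirms the table's 68.33 / 52.19 / 48.18 row and the bolded 56.23 average are consistent with the metadata.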