Create README.md
Browse files
README.md
ADDED
@@ -0,0 +1,220 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
---
|
2 |
+
license: apache-2.0
|
3 |
+
language:
|
4 |
+
- aa
|
5 |
+
- af
|
6 |
+
- ar
|
7 |
+
- as
|
8 |
+
- az
|
9 |
+
- be
|
10 |
+
- bg
|
11 |
+
- bn
|
12 |
+
- bs
|
13 |
+
- ca
|
14 |
+
- cs
|
15 |
+
- da
|
16 |
+
- de
|
17 |
+
- el
|
18 |
+
- en
|
19 |
+
- es
|
20 |
+
- et
|
21 |
+
- eu
|
22 |
+
- fa
|
23 |
+
- fi
|
24 |
+
- fr
|
25 |
+
- ha
|
26 |
+
- he
|
27 |
+
- hi
|
28 |
+
- hr
|
29 |
+
- hu
|
30 |
+
- hy
|
31 |
+
- id
|
32 |
+
- ie
|
33 |
+
- it
|
34 |
+
- iw
|
35 |
+
- ja
|
36 |
+
- ka
|
37 |
+
- kk
|
38 |
+
- ko
|
39 |
+
- ku
|
40 |
+
- la
|
41 |
+
- lt
|
42 |
+
- lv
|
43 |
+
- mk
|
44 |
+
- ms
|
45 |
+
- my
|
46 |
+
- nl
|
47 |
+
- nn
|
48 |
+
- no
|
49 |
+
- oc
|
50 |
+
- pl
|
51 |
+
- pt
|
52 |
+
- ro
|
53 |
+
- ru
|
54 |
+
- rw
|
55 |
+
- sa
|
56 |
+
- sco
|
57 |
+
- si
|
58 |
+
- sk
|
59 |
+
- sl
|
60 |
+
- sr
|
61 |
+
- sv
|
62 |
+
- sw
|
63 |
+
- ta
|
64 |
+
- th
|
65 |
+
- tl
|
66 |
+
- tlh
|
67 |
+
- tr
|
68 |
+
- tt
|
69 |
+
- uk
|
70 |
+
- vi
|
71 |
+
- vo
|
72 |
+
- war
|
73 |
+
- xh
|
74 |
+
- zh
|
75 |
+
datasets:
|
76 |
+
- rubricreward/mR3-Dataset-100K-EasyToHard
|
77 |
+
base_model:
|
78 |
+
- Qwen/Qwen3-14B
|
79 |
+
pipeline_tag: text-generation
|
80 |
+
library_name: transformers
|
81 |
+
---
|
82 |
+
<img alt="mR3 Logo" src="https://cdn-avatars.huggingface.co/v1/production/uploads/651803f834c26962535eb022/hj3UEN9_9wlkmvMfUY1OL.png" width="150px">
|
83 |
+
|
84 |
+
# mR3-Qwen3-14B-tgt-prompt-en-thinking
|
85 |
+
|
86 |
+
mR3-Qwen3-14B-tgt-prompt-en-thinking is part of the mR3 family, a series of Multilingual Rubric-Agnostic Reward Reasoning Models.
|
87 |
+
We perform SFT on the Qwen3 model family on the 4B, 8B, and 14B scales.
|
88 |
+
Check out [our paper](https://arxiv.org/abs/2510.01146) for more information!
|
89 |
+
|
90 |
+
|
91 |
+
## Model description
|
92 |
+
|
93 |
+
- **Model type:** A reward model trained on a curated mR3 dataset collected from 72 languages that covers
|
94 |
+
tasks such as classification, preference optimization, and question answering. Each example in the dataset contains an instruction and task description, input, response(s),
|
95 |
+
evaluation rubrics, and a score along with the corresponding reasoning in both English and non-English.
|
96 |
+
- **Number of Language(s) (NLP):** 72 languages
|
97 |
+
- **License:** Apache 2.0
|
98 |
+
- **Finetuned from model:** Qwen/Qwen3-14B
|
99 |
+
|
100 |
+
### Model Sources
|
101 |
+
|
102 |
+
- **Project Page:** https://rubricreward.github.io
|
103 |
+
- **Repository:** https://github.com/rubricreward/mr3
|
104 |
+
- **Paper:** https://arxiv.org/abs/2510.01146
|
105 |
+
|
106 |
+
## Using the Model
|
107 |
+
|
108 |
+
For the following examples, we will use `messages` as our pairwise task example.
|
109 |
+
|
110 |
+
<details>
|
111 |
+
|
112 |
+
<summary>Click to reveal the example prompt</summary>
|
113 |
+
|
114 |
+
```python
|
115 |
+
system_prompt = """# μ§μ
|
116 |
+
|
117 |
+
κ·νλ 곡μ ν μ¬μ¬μμμΌλ‘μ, μ£Όμ΄μ§ μ¬μ©μ ν둬ννΈμ λν΄ λ κ°μ 보쑰 λ΅λ³ μ€ μ΄λ κ²μ΄ λ λμμ§ νκ°ν΄μΌ ν©λλ€. λ€μ κΈ°μ€μ μμμ λ°λΌ 체κ³μ μΌλ‘ λ΅λ³μ λΉκ΅νμΈμ
|
118 |
+
|
119 |
+
# νκ°κΈ°μ€
|
120 |
+
Assistant B: Assistant Bλ μ λ°μ μΌλ‘ λ λμ μλ΅μ μ 곡ν©λλ€. νλ¨ κΈ°μ€μ λ€μ μμλλ‘ μ
λλ€ β μμ μ± λ° μ μ μ±, μ μ©μ±, κ΄λ ¨μ±, κ°κ²°μ±, μ μ€ν¨, κ·Έλ¦¬κ³ ν¬κ΄μ±μ
λλ€.
|
121 |
+
Assistant A: Assistant Aλ μ λ°μ μΌλ‘ λ λμ μλ΅μ μ 곡ν©λλ€. νλ¨ κΈ°μ€μ λ€μ μμλλ‘ μ
λλ€ β μμ μ± λ° μ μ μ±, μ μ©μ±, κ΄λ ¨μ±, κ°κ²°μ±, μ μ€ν¨, κ·Έλ¦¬κ³ ν¬κ΄μ±μ
λλ€.
|
122 |
+
|
123 |
+
# μλ΅ νμ
|
124 |
+
|
125 |
+
{'type': 'object', 'properties': {'explanation': {'type': 'string', 'description': 'νκ° μμλ₯Ό κΈ°λ°μΌλ‘ Assistant Aμ Assistant Bλ₯Ό λΉκ΅ν κ°λ¨ν μ΄μ μ€λͺ
μ
λλ€.'}, 'score': {'type': 'string', 'description': "νμ κ²°κ³Ό: 'Assistant A' λλ 'Assistant B' μ€ νλμ
λλ€.", 'enum': ['Assistant A', 'Assistant B']}}, 'required': ['explanation', 'score']}
|
126 |
+
"""
|
127 |
+
|
128 |
+
user_prompt = """# μ
λ ₯(μ¬μ©μ ν둬ννΈ)
|
129 |
+
μ²νμ΄λ 무μμΈκ°?
|
130 |
+
# Assistant A
|
131 |
+
μ²ν"μ λν μ μλ₯Ό μ 곡ν μ μμ΅λλ€. λ§₯λ½μ λ°λΌ μλ―Έκ° λ€λ₯Ό μ μκΈ° λλ¬Έμ
λλ€. κ·Έλ¬λ λ²μ μ²λ²μ΄λ νλ ₯ νμμ΄λ μ΄λ€ ννμ μ²νμ΄λ κ°λ±μ ν΄κ²°νκ±°λ μ¬ν λ¬Έμ λ₯Ό ν΄κ²°νλ λ° μ ν©ν ν΄κ²°μ±
μ΄ μλλΌλ μ μ κ°μ‘°νκ³ μΆμ΅λλ€. λͺ¨λ μ¬λμ μλͺ
, μ‘΄μμ±, μ‘΄μ€μ λ°μ κΆλ¦¬κ° μμΌλ©°, μ°¨μ΄μ κ°λ±μ ν΄κ²°νλ ννλ‘κ³ κ±΄μ€μ μΈ λ°©λ²μ μ°Ύλ κ²μ΄ μ€μν©λλ€. λ€λ₯Έ μ§λ¬Έμ΄λ μ°λ € μ¬νμ΄ μμΌλ©΄ μΈμ λ μ§ λ¬ΈμνμΈμ. λμμ΄ λκ³ μμ ν λ΅λ³μ μ 곡νκΈ° μν΄ μ΅μ μ λ€νκ² μ΅λλ€.
|
132 |
+
# Assistant B
|
133 |
+
μ¬ν μ§νμ λ²μμ΄ νλ, μΌλ°μ μΌλ‘ νμ¬ μ μ£ νκ²°μ λν μ§μνμ μ§ννλλ‘ λͺ
λ Ήνλ λ²μ μ μ°¨μ
λλ€. μ¬κΈ°μλ κ΄ν κΆμ λ²λ₯ μ λ°λΌ μΉλͺ
μ μ£Όμ¬, κ°μ μ¬, κ΅μν λλ λ€λ₯Έ μ¬ν λ°©λ²μ΄ ν¬ν¨λ μ μμ΅λλ€. μ¬νμ΄ μλ κ²½μ°, μ¬ν μ§νμ μ§μνμ μ§ννλ κ²μ μλ―Ένλ©°, μ¬κΈ°μλ ꡬκΈ, 보νΈκ΄μ°° λλ κΈ°ν μ²λ²μ΄ ν¬ν¨λ μ μμ΅λλ€.
|
134 |
+
# λΉμ μ μλ΅
|
135 |
+
"""
|
136 |
+
# prepare the model input
|
137 |
+
messages = [
|
138 |
+
{'role': 'system', 'content': system_prompt},
|
139 |
+
{'role': 'user', 'content': user_prompt}
|
140 |
+
]
|
141 |
+
```
|
142 |
+
</details>
|
143 |
+
|
144 |
+
### π§ Using `transformers`
|
145 |
+
|
146 |
+
Below is an example of using our `mR3-Qwen3-14B-tgt-prompt-en-thinking` model by using an non-English prompt and an English reasoning using π€ `transformers`:
|
147 |
+
|
148 |
+
```python
|
149 |
+
from transformers import AutoModelForCausalLM, AutoTokenizer
|
150 |
+
model_name = "rubricreward/mR3-Qwen3-14B-tgt-prompt-en-thinking"
|
151 |
+
# Load the tokenizer and the model
|
152 |
+
tokenizer = AutoTokenizer.from_pretrained(model_name)
|
153 |
+
model = AutoModelForCausalLM.from_pretrained(
|
154 |
+
model_name,
|
155 |
+
torch_dtype="auto",
|
156 |
+
device_map="auto"
|
157 |
+
)
|
158 |
+
text = tokenizer.apply_chat_template(
|
159 |
+
messages,
|
160 |
+
tokenize=False,
|
161 |
+
add_generation_prompt=True,
|
162 |
+
enable_thinking=True # Switches between thinking and non-thinking modes. Default is True.
|
163 |
+
)
|
164 |
+
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
|
165 |
+
# Conduct text completion
|
166 |
+
generated_ids = model.generate(
|
167 |
+
**model_inputs,
|
168 |
+
max_new_tokens=16384,
|
169 |
+
temperature=0.6, top_p=0.95, min_p=0, top_k=20
|
170 |
+
)
|
171 |
+
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
|
172 |
+
# Parsing thinking content
|
173 |
+
try:
|
174 |
+
# rindex finding 151668 (</think>)
|
175 |
+
index = len(output_ids) - output_ids[::-1].index(151668)
|
176 |
+
except ValueError:
|
177 |
+
index = 0
|
178 |
+
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")
|
179 |
+
print(content)
|
180 |
+
```
|
181 |
+
|
182 |
+
### β‘ Using `vLLM`
|
183 |
+
|
184 |
+
Alternatively, you may also use `vLLM` for faster inference:
|
185 |
+
|
186 |
+
```python
|
187 |
+
from transformers import AutoTokenizer
|
188 |
+
from vllm import LLM, SamplingParams
|
189 |
+
model_path = "rubricreward/mR3-Qwen3-14B-tgt-prompt-en-thinking"
|
190 |
+
tokenizer = AutoTokenizer.from_pretrained(model_path)
|
191 |
+
sampling_params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=16384, min_p=0, top_k=20)
|
192 |
+
llm = LLM(
|
193 |
+
model=model_path,
|
194 |
+
dtype="bfloat16",
|
195 |
+
max_model_len=32768,
|
196 |
+
)
|
197 |
+
list_text = tokenizer.apply_chat_template(
|
198 |
+
messages,
|
199 |
+
tokenize=False,
|
200 |
+
add_generation_prompt=True,
|
201 |
+
enable_thinking=True # Switch between thinking and non-thinking modes.
|
202 |
+
)
|
203 |
+
outputs = llm.generate(list_text, sampling_params)
|
204 |
+
print(outputs[0].output.text)
|
205 |
+
```
|
206 |
+
|
207 |
+
## License and use
|
208 |
+
|
209 |
+
mR3 is licensed under the Apache 2.0 license.
|
210 |
+
|
211 |
+
## Citation
|
212 |
+
|
213 |
+
```bibtex
|
214 |
+
@article{anugraha2025mr3,
|
215 |
+
title={mR3: Multilingual Rubric-Agnostic Reward Reasoning Models},
|
216 |
+
author={Anugraha, David and Hung, Shou-Yi and Tang, Zilu and Lee, Annie En-Shiun and Wijaya, Derry and Winata, Genta Indra},
|
217 |
+
journal={arXiv preprint arXiv:2510.01146},
|
218 |
+
year={2025}
|
219 |
+
}
|
220 |
+
```
|