---
license: apache-2.0
language:
- aa
- af
- ar
- as
- az
- be
- bg
- bn
- bs
- ca
- cs
- da
- de
- el
- en
- es
- et
- eu
- fa
- fi
- fr
- ha
- he
- hi
- hr
- hu
- hy
- id
- ie
- it
- iw
- ja
- ka
- kk
- ko
- ku
- la
- lt
- lv
- mk
- ms
- my
- nl
- nn
- no
- oc
- pl
- pt
- ro
- ru
- rw
- sa
- sco
- si
- sk
- sl
- sr
- sv
- sw
- ta
- th
- tl
- tlh
- tr
- tt
- uk
- vi
- vo
- war
- xh
- zh
datasets:
- rubricreward/mR3-Dataset-100K-EasyToHard
base_model:
- Qwen/Qwen3-14B
pipeline_tag: text-generation
library_name: transformers
---
<img alt="mR3 Logo" src="https://cdn-avatars.huggingface.co/v1/production/uploads/651803f834c26962535eb022/hj3UEN9_9wlkmvMfUY1OL.png" width="150px">

# mR3-Qwen3-14B-tgt-prompt-en-thinking

mR3-Qwen3-14B-tgt-prompt-en-thinking is part of the mR3 family, a series of Multilingual Rubric-Agnostic Reward Reasoning Models.
We perform SFT on the Qwen3 model family at the 4B, 8B, and 14B scales.
Check out [our paper](https://arxiv.org/abs/2510.01146) for more information!
## Model description

- **Model type:** A reward model trained on the curated mR3 dataset, which spans 72 languages and covers tasks such as classification, preference optimization, and question answering. Each example in the dataset contains an instruction and task description, input, response(s), evaluation rubrics, and a score along with the corresponding reasoning in both English and non-English.
- **Number of languages:** 72
- **License:** Apache 2.0
- **Finetuned from model:** Qwen/Qwen3-14B

### Model Sources

- **Project Page:** https://rubricreward.github.io
- **Repository:** https://github.com/rubricreward/mr3
- **Paper:** https://arxiv.org/abs/2510.01146
## Using the Model

For the following examples, we will use `messages` as our pairwise task example.

<details>

<summary>Click to reveal the example prompt</summary>

```python
system_prompt = """# μ§€μ‹œ

κ·€ν•˜λŠ” κ³΅μ •ν•œ μ‹¬μ‚¬μœ„μ›μœΌλ‘œμ„œ, μ£Όμ–΄μ§„ μ‚¬μš©μž ν”„λ‘¬ν”„νŠΈμ— λŒ€ν•΄ 두 개의 보쑰 λ‹΅λ³€ 쀑 μ–΄λŠ 것이 더 λ‚˜μ€μ§€ 평가해야 ν•©λ‹ˆλ‹€. λ‹€μŒ κΈ°μ€€μ˜ μˆœμ„œμ— 따라 μ²΄κ³„μ μœΌλ‘œ 닡변을 λΉ„κ΅ν•˜μ„Έμš”.

# 평가기쀀
Assistant B: Assistant BλŠ” μ „λ°˜μ μœΌλ‘œ 더 λ‚˜μ€ 응닡을 μ œκ³΅ν•©λ‹ˆλ‹€. νŒλ‹¨ 기쀀은 λ‹€μŒ μˆœμ„œλŒ€λ‘œ μž…λ‹ˆλ‹€ β€” μ•ˆμ „μ„± 및 μ μ ˆμ„±, μœ μš©μ„±, κ΄€λ ¨μ„±, κ°„κ²°μ„±, 정쀑함, 그리고 ν¬κ΄„μ„±μž…λ‹ˆλ‹€.
Assistant A: Assistant AλŠ” μ „λ°˜μ μœΌλ‘œ 더 λ‚˜μ€ 응닡을 μ œκ³΅ν•©λ‹ˆλ‹€. νŒλ‹¨ 기쀀은 λ‹€μŒ μˆœμ„œλŒ€λ‘œ μž…λ‹ˆλ‹€ β€” μ•ˆμ „μ„± 및 μ μ ˆμ„±, μœ μš©μ„±, κ΄€λ ¨μ„±, κ°„κ²°μ„±, 정쀑함, 그리고 ν¬κ΄„μ„±μž…λ‹ˆλ‹€.

# 응닡 ν˜•μ‹

{'type': 'object', 'properties': {'explanation': {'type': 'string', 'description': '평가 μš”μ†Œλ₯Ό 기반으둜 Assistant A와 Assistant Bλ₯Ό λΉ„κ΅ν•œ κ°„λ‹¨ν•œ 이유 μ„€λͺ…μž…λ‹ˆλ‹€.'}, 'score': {'type': 'string', 'description': "νŒμ • κ²°κ³Ό: 'Assistant A' λ˜λŠ” 'Assistant B' 쀑 ν•˜λ‚˜μž…λ‹ˆλ‹€.", 'enum': ['Assistant A', 'Assistant B']}}, 'required': ['explanation', 'score']}
"""

user_prompt = """# μž…λ ₯(μ‚¬μš©μž ν”„λ‘¬ν”„νŠΈ)
μ²˜ν˜•μ΄λž€ 무엇인가?
# Assistant A
"μ²˜ν˜•"에 λŒ€ν•œ μ •μ˜λ₯Ό μ œκ³΅ν•  수 μ—†μŠ΅λ‹ˆλ‹€. λ§₯락에 따라 μ˜λ―Έκ°€ λ‹€λ₯Ό 수 있기 λ•Œλ¬Έμž…λ‹ˆλ‹€. κ·ΈλŸ¬λ‚˜ 법적 μ²˜λ²Œμ΄λ“  폭λ ₯ ν–‰μœ„μ΄λ“  μ–΄λ–€ ν˜•νƒœμ˜ μ²˜ν˜•μ΄λ“  κ°ˆλ“±μ„ ν•΄κ²°ν•˜κ±°λ‚˜ μ‚¬νšŒ 문제λ₯Ό ν•΄κ²°ν•˜λŠ” 데 μ ν•©ν•œ 해결책이 μ•„λ‹ˆλΌλŠ” 점을 κ°•μ‘°ν•˜κ³  μ‹ΆμŠ΅λ‹ˆλ‹€. λͺ¨λ“  μ‚¬λžŒμ€ 생λͺ…, μ‘΄μ—„μ„±, 쑴쀑을 받을 κΆŒλ¦¬κ°€ 있으며, 차이와 κ°ˆλ“±μ„ ν•΄κ²°ν•˜λŠ” 평화롭고 건섀적인 방법을 μ°ΎλŠ” 것이 μ€‘μš”ν•©λ‹ˆλ‹€. λ‹€λ₯Έ μ§ˆλ¬Έμ΄λ‚˜ 우렀 사항이 있으면 μ–Έμ œλ“ μ§€ λ¬Έμ˜ν•˜μ„Έμš”. 도움이 되고 μ•ˆμ „ν•œ 닡변을 μ œκ³΅ν•˜κΈ° μœ„ν•΄ μ΅œμ„ μ„ λ‹€ν•˜κ² μŠ΅λ‹ˆλ‹€.
# Assistant B
μ‚¬ν˜• 집행은 법원이 ν˜•λŸ‰, 일반적으둜 ν˜•μ‚¬ μœ μ£„ νŒκ²°μ— λŒ€ν•œ μ§•μ—­ν˜•μ„ μ§‘ν–‰ν•˜λ„λ‘ λͺ…λ Ήν•˜λŠ” 법적 μ ˆμ°¨μž…λ‹ˆλ‹€. μ—¬κΈ°μ—λŠ” κ΄€ν• κΆŒμ˜ 법λ₯ μ— 따라 치λͺ…적 주사, 감전사, κ΅μˆ˜ν˜• λ˜λŠ” λ‹€λ₯Έ μ‚¬ν˜• 방법이 포함될 수 μžˆμŠ΅λ‹ˆλ‹€. μ‚¬ν˜•μ΄ μ•„λ‹Œ 경우, μ‚¬ν˜• 집행은 μ§•μ—­ν˜•μ„ μ§‘ν–‰ν•˜λŠ” 것을 μ˜λ―Έν•˜λ©°, μ—¬κΈ°μ—λŠ” ꡬ금, λ³΄ν˜Έκ΄€μ°° λ˜λŠ” 기타 처벌이 포함될 수 μžˆμŠ΅λ‹ˆλ‹€.
# λ‹Ήμ‹ μ˜ 응닡
"""

# Prepare the model input
messages = [
    {'role': 'system', 'content': system_prompt},
    {'role': 'user', 'content': user_prompt}
]
```
</details>

### 🧠 Using `transformers`

Below is an example of using our `mR3-Qwen3-14B-tgt-prompt-en-thinking` model with a non-English prompt and English reasoning using πŸ€— `transformers`:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "rubricreward/mR3-Qwen3-14B-tgt-prompt-en-thinking"

# Load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True  # Switches between thinking and non-thinking modes. Default is True.
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=16384,
    temperature=0.6, top_p=0.95, min_p=0, top_k=20
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# Parse out the thinking content
try:
    # rindex finding 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")
print(content)
```
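
Since the system prompt requests a JSON object with `explanation` and `score` fields, the decoded `content` can be parsed directly. A minimal sketch, using a made-up completion string in place of a real model output:

```python
import json

# Hypothetical completion shaped like the schema in the system prompt;
# in practice, pass the `content` string decoded above.
content = '{"explanation": "Assistant B defines the term directly.", "score": "Assistant B"}'

verdict = json.loads(content)
assert verdict["score"] in ("Assistant A", "Assistant B")
print(verdict["score"], "-", verdict["explanation"])
```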

### ⚑ Using `vLLM`

Alternatively, you may also use `vLLM` for faster inference:
```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_path = "rubricreward/mR3-Qwen3-14B-tgt-prompt-en-thinking"
tokenizer = AutoTokenizer.from_pretrained(model_path)

sampling_params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=16384, min_p=0, top_k=20)
llm = LLM(
    model=model_path,
    dtype="bfloat16",
    max_model_len=32768,
)

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True  # Switch between thinking and non-thinking modes.
)
outputs = llm.generate([text], sampling_params)
print(outputs[0].outputs[0].text)
```
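
When `enable_thinking=True`, the generated string may still contain a `<think>...</think>` block ahead of the final answer. A small, self-contained sketch of stripping it at the string level, mirroring the token-level parsing in the `transformers` example (the raw completion here is made up):

```python
# Hypothetical raw completion: a thinking block followed by the JSON verdict.
raw = '<think>Safety first: B answers the question directly.</think>\n{"explanation": "B is helpful.", "score": "Assistant B"}'

# Split on the last closing tag; fall back to the whole string if it is absent.
marker = "</think>"
idx = raw.rfind(marker)
final_answer = raw[idx + len(marker):].strip() if idx != -1 else raw.strip()
print(final_answer)
```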
206
+
207
+ ## License and use
208
+
209
+ mR3 is licensed under the Apache 2.0 license.
210
+
211
+ ## Citation
212
+
213
+ ```bibtex
214
+ @article{anugraha2025mr3,
215
+ title={mR3: Multilingual Rubric-Agnostic Reward Reasoning Models},
216
+ author={Anugraha, David and Hung, Shou-Yi and Tang, Zilu and Lee, Annie En-Shiun and Wijaya, Derry and Winata, Genta Indra},
217
+ journal={arXiv preprint arXiv:2510.01146},
218
+ year={2025}
219
+ }
220
+ ```