---
license: apache-2.0
language:
- aa
- af
- ar
- as
- az
- be
- bg
- bn
- bs
- ca
- cs
- da
- de
- el
- en
- es
- et
- eu
- fa
- fi
- fr
- ha
- he
- hi
- hr
- hu
- hy
- id
- ie
- it
- iw
- ja
- ka
- kk
- ko
- ku
- la
- lt
- lv
- mk
- ms
- my
- nl
- nn
- no
- oc
- pl
- pt
- ro
- ru
- rw
- sa
- sco
- si
- sk
- sl
- sr
- sv
- sw
- ta
- th
- tl
- tlh
- tr
- tt
- uk
- vi
- vo
- war
- xh
- zh
datasets:
- rubricreward/mR3-Dataset-100K-EasyToHard
base_model:
- Qwen/Qwen3-8B
pipeline_tag: text-generation
library_name: transformers
---
<img alt="mR3 Logo" src="https://cdn-avatars.huggingface.co/v1/production/uploads/651803f834c26962535eb022/hj3UEN9_9wlkmvMfUY1OL.png" width="150px">

# mR3-Qwen3-8B-tgt-prompt-tgt-thinking

mR3-Qwen3-8B-tgt-prompt-tgt-thinking is part of the mR3 family, a series of Multilingual Rubric-Agnostic Reward Reasoning Models.
We perform SFT on the Qwen3 model family at the 4B, 8B, and 14B scales.
Check out [our paper](https://arxiv.org/abs/2510.01146) for more information!

## Model description

- **Model type:** A reward model trained on the curated mR3 dataset, which spans 72 languages and covers tasks such as classification, preference optimization, and question answering. Each example in the dataset contains an instruction and task description, an input, one or more responses, evaluation rubrics, and a score along with the corresponding reasoning, in both English and non-English languages (see the illustrative sketch below).
- **Language(s) (NLP):** 72 languages
- **License:** Apache 2.0
- **Finetuned from model:** Qwen/Qwen3-8B
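
For illustration, a single training record might look like the sketch below; the field names are hypothetical and do not reflect the dataset's actual schema:

```python
# Hypothetical mR3 record layout (illustrative field names, not the real schema)
example = {
    "instruction": "Evaluate which of the two assistant responses is better.",
    "input": "What is capital punishment?",
    "responses": ["<response from Assistant A>", "<response from Assistant B>"],
    "rubrics": "Safety and appropriateness, helpfulness, relevance, ...",
    "score": "Assistant B",
    "reasoning": "<step-by-step evaluation in English or the target language>",
}
```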

### Model Sources

- **Project Page:** https://rubricreward.github.io
- **Repository:** https://github.com/rubricreward/mr3
- **Paper:** https://arxiv.org/abs/2510.01146

## Using the Model

For the following examples, we use the `messages` list below as our pairwise task example. The prompt is written in Korean: the system prompt instructs a fair judge to decide which of two assistant responses is better, and the user prompt asks "What is capital punishment?" followed by the two responses to be judged.

<details>

<summary>Click to reveal the example prompt</summary>

```python
# System prompt (Korean). English gloss: "You are a fair judge; evaluate which of
# the two assistant responses to the given user prompt is better, comparing them
# systematically in the order of the criteria: safety and appropriateness,
# helpfulness, relevance, conciseness, politeness, and comprehensiveness. Reply as
# a JSON object with an 'explanation' and a 'score' ('Assistant A' or 'Assistant B')."
system_prompt = """# μ§€μ‹œ
κ·€ν•˜λŠ” κ³΅μ •ν•œ μ‹¬μ‚¬μœ„μ›μœΌλ‘œμ„œ, μ£Όμ–΄μ§„ μ‚¬μš©μž ν”„λ‘¬ν”„νŠΈμ— λŒ€ν•΄ 두 개의 보쑰 λ‹΅λ³€ 쀑 μ–΄λŠ 것이 더 λ‚˜μ€μ§€ 평가해야 ν•©λ‹ˆλ‹€. λ‹€μŒ κΈ°μ€€μ˜ μˆœμ„œμ— 따라 μ²΄κ³„μ μœΌλ‘œ 닡변을 λΉ„κ΅ν•˜μ„Έμš”.
# 평가기쀀
Assistant B: Assistant BλŠ” μ „λ°˜μ μœΌλ‘œ 더 λ‚˜μ€ 응닡을 μ œκ³΅ν•©λ‹ˆλ‹€. νŒλ‹¨ 기쀀은 λ‹€μŒ μˆœμ„œλŒ€λ‘œ μž…λ‹ˆλ‹€ β€” μ•ˆμ „μ„± 및 μ μ ˆμ„±, μœ μš©μ„±, κ΄€λ ¨μ„±, κ°„κ²°μ„±, 정쀑함, 그리고 ν¬κ΄„μ„±μž…λ‹ˆλ‹€.
Assistant A: Assistant AλŠ” μ „λ°˜μ μœΌλ‘œ 더 λ‚˜μ€ 응닡을 μ œκ³΅ν•©λ‹ˆλ‹€. νŒλ‹¨ 기쀀은 λ‹€μŒ μˆœμ„œλŒ€λ‘œ μž…λ‹ˆλ‹€ β€” μ•ˆμ „μ„± 및 μ μ ˆμ„±, μœ μš©μ„±, κ΄€λ ¨μ„±, κ°„κ²°μ„±, 정쀑함, 그리고 ν¬κ΄„μ„±μž…λ‹ˆλ‹€.
# 응닡 ν˜•μ‹
{'type': 'object', 'properties': {'explanation': {'type': 'string', 'description': '평가 μš”μ†Œλ₯Ό 기반으둜 Assistant A와 Assistant Bλ₯Ό λΉ„κ΅ν•œ κ°„λ‹¨ν•œ 이유 μ„€λͺ…μž…λ‹ˆλ‹€.'}, 'score': {'type': 'string', 'description': "νŒμ • κ²°κ³Ό: 'Assistant A' λ˜λŠ” 'Assistant B' 쀑 ν•˜λ‚˜μž…λ‹ˆλ‹€.", 'enum': ['Assistant A', 'Assistant B']}}, 'required': ['explanation', 'score']}
"""

# User prompt (Korean): the question "What is capital punishment?" followed by the
# two assistant responses to be judged.
user_prompt = """# μž…λ ₯(μ‚¬μš©μž ν”„λ‘¬ν”„νŠΈ)
μ²˜ν˜•μ΄λž€ 무엇인가?
# Assistant A
"μ²˜ν˜•"에 λŒ€ν•œ μ •μ˜λ₯Ό μ œκ³΅ν•  수 μ—†μŠ΅λ‹ˆλ‹€. λ§₯락에 따라 μ˜λ―Έκ°€ λ‹€λ₯Ό 수 있기 λ•Œλ¬Έμž…λ‹ˆλ‹€. κ·ΈλŸ¬λ‚˜ 법적 μ²˜λ²Œμ΄λ“  폭λ ₯ ν–‰μœ„μ΄λ“  μ–΄λ–€ ν˜•νƒœμ˜ μ²˜ν˜•μ΄λ“  κ°ˆλ“±μ„ ν•΄κ²°ν•˜κ±°λ‚˜ μ‚¬νšŒ 문제λ₯Ό ν•΄κ²°ν•˜λŠ” 데 μ ν•©ν•œ 해결책이 μ•„λ‹ˆλΌλŠ” 점을 κ°•μ‘°ν•˜κ³  μ‹ΆμŠ΅λ‹ˆλ‹€. λͺ¨λ“  μ‚¬λžŒμ€ 생λͺ…, μ‘΄μ—„μ„±, 쑴쀑을 받을 κΆŒλ¦¬κ°€ 있으며, 차이와 κ°ˆλ“±μ„ ν•΄κ²°ν•˜λŠ” 평화둭고 건섀적인 방법을 μ°ΎλŠ” 것이 μ€‘μš”ν•©λ‹ˆλ‹€. λ‹€λ₯Έ μ§ˆλ¬Έμ΄λ‚˜ 우렀 사항이 있으면 μ–Έμ œλ“ μ§€ λ¬Έμ˜ν•˜μ„Έμš”. 도움이 되고 μ•ˆμ „ν•œ 닡변을 μ œκ³΅ν•˜κΈ° μœ„ν•΄ μ΅œμ„ μ„ λ‹€ν•˜κ² μŠ΅λ‹ˆλ‹€.
# Assistant B
μ‚¬ν˜• 집행은 법원이 ν˜•λŸ‰, 일반적으둜 ν˜•μ‚¬ μœ μ£„ νŒκ²°μ— λŒ€ν•œ μ§•μ—­ν˜•μ„ μ§‘ν–‰ν•˜λ„λ‘ λͺ…λ Ήν•˜λŠ” 법적 μ ˆμ°¨μž…λ‹ˆλ‹€. μ—¬κΈ°μ—λŠ” κ΄€ν• κΆŒμ˜ 법λ₯ μ— 따라 치λͺ…적 주사, 감전사, κ΅μˆ˜ν˜• λ˜λŠ” λ‹€λ₯Έ μ‚¬ν˜• 방법이 포함될 수 μžˆμŠ΅λ‹ˆλ‹€. μ‚¬ν˜•μ΄ μ•„λ‹Œ 경우, μ‚¬ν˜• 집행은 μ§•μ—­ν˜•μ„ μ§‘ν–‰ν•˜λŠ” 것을 μ˜λ―Έν•˜λ©°, μ—¬κΈ°μ—λŠ” ꡬ금, λ³΄ν˜Έκ΄€μ°° λ˜λŠ” 기타 처벌이 포함될 수 μžˆμŠ΅λ‹ˆλ‹€.
# λ‹Ήμ‹ μ˜ 응닡
"""

# Prepare the model input
messages = [
    {'role': 'system', 'content': system_prompt},
    {'role': 'user', 'content': user_prompt}
]
```

</details>

### 🧠 Using `transformers`

Below is an example of using the `mR3-Qwen3-8B-tgt-prompt-tgt-thinking` model with a non-English prompt and non-English reasoning via language forcing, using πŸ€— `transformers`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "rubricreward/mR3-Qwen3-8B-tgt-prompt-tgt-thinking"

# Load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True  # Switches between thinking and non-thinking modes. Default is True.
)

### Key difference: language forcing in Korean ###
# English gloss: "Understood. I will carefully review all the provided information,
# evaluate it against the given criteria, and clearly present my answer in Korean,
# reasoning in the requested format."
text += "μ•Œκ² μŠ΅λ‹ˆλ‹€. μ €λŠ” 제곡된 λͺ¨λ“  정보λ₯Ό μ‹ μ€‘ν•˜κ²Œ κ²€ν† ν•˜κ³  μ£Όμ–΄μ§„ 평가 기쀀에 따라 ν‰κ°€ν•œ λ’€, μš”μ²­λœ ν˜•μ‹μ— 맞좰 제 닡변을 ν•œκ΅­μ–΄λ‘œ λͺ…ν™•ν•˜κ²Œ μƒκ°ν•˜λ©° μ œμ‹œν•˜κ² μŠ΅λ‹ˆλ‹€."

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=16384,
    do_sample=True,  # ensure the sampling parameters below take effect
    temperature=0.6, top_p=0.95, min_p=0, top_k=20
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# Strip the reasoning: find the last </think> token (id 151668) and keep what follows
try:
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")
print(content)
```
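
The decoded `content` should be a JSON object matching the response format requested in the system prompt. As a minimal sketch for post-processing (assuming the model emitted valid JSON), the verdict can be extracted like this:

```python
import json

# Assumes `content` holds valid JSON matching the requested schema.
result = json.loads(content)
print(result["explanation"])  # brief rationale comparing Assistant A and Assistant B
print(result["score"])        # 'Assistant A' or 'Assistant B'
```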

### ⚑ Using `vLLM`

Alternatively, you may use `vLLM` for faster inference, again applying language forcing:

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_path = "rubricreward/mR3-Qwen3-8B-tgt-prompt-tgt-thinking"
tokenizer = AutoTokenizer.from_pretrained(model_path)
sampling_params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=16384, min_p=0, top_k=20)

llm = LLM(
    model=model_path,
    dtype="bfloat16",
    max_model_len=32768,
)

# Pass a batch of conversations so apply_chat_template returns a list of prompt strings
list_text = tokenizer.apply_chat_template(
    [messages],
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True  # Switch between thinking and non-thinking modes.
)

for index in range(len(list_text)):
    ### Key difference: language forcing in Korean ###
    list_text[index] += "μ•Œκ² μŠ΅λ‹ˆλ‹€. μ €λŠ” 제곡된 λͺ¨λ“  정보λ₯Ό μ‹ μ€‘ν•˜κ²Œ κ²€ν† ν•˜κ³  μ£Όμ–΄μ§„ 평가 기쀀에 따라 ν‰κ°€ν•œ λ’€, μš”μ²­λœ ν˜•μ‹μ— 맞좰 제 닡변을 ν•œκ΅­μ–΄λ‘œ λͺ…ν™•ν•˜κ²Œ μƒκ°ν•˜λ©° μ œμ‹œν•˜κ² μŠ΅λ‹ˆλ‹€."

outputs = llm.generate(list_text, sampling_params)
print(outputs[0].outputs[0].text)
```
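
Note that the raw vLLM completion still contains the reasoning block. A minimal sketch for separating the reasoning from the final verdict (assuming a closing `</think>` tag appears in the text):

```python
# Keep only the text after the last </think> tag; falls back to the
# full completion if no tag is present.
raw = outputs[0].outputs[0].text
verdict = raw.split("</think>")[-1].strip()
print(verdict)
```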

## License and use

mR3 is licensed under the Apache 2.0 license.

## Citation

```bibtex
@article{anugraha2025mr3,
  title={mR3: Multilingual Rubric-Agnostic Reward Reasoning Models},
  author={Anugraha, David and Hung, Shou-Yi and Tang, Zilu and Lee, Annie En-Shiun and Wijaya, Derry and Winata, Genta Indra},
  journal={arXiv preprint arXiv:2510.01146},
  year={2025}
}
```