nvidia/Llama-3_3-Nemotron-Super-49B-GenRM-Multilingual

@@ -2050,7 +2050,7 @@
     }
   },
   "bos_token": "<|begin_of_text|>",
-  "chat_template": "{{- bos_token }}{%- if messages[0]['role'] == 'system' %}{%- set system_message = messages[0]['content']|trim %}{%- set messages = messages[1:] %}{%- else %}{%- set system_message = \"detailed thinking on\" %}{%- endif %}{{- \"<|start_header_id|>system<|end_header_id|>\\n\\n\" }}{{- system_message }}{{- \"<|eot_id|>\" }}{%- for message in messages %}{%- if message['role'] == 'assistant' and '</think>' in message['content'] %}{%- set content = message['content'].split('</think>')[-1].lstrip() %}{%- else %}{%- set content = message['content'] %}{%- endif %}{{- '<|start_header_id|>' + message['role'] + '<|end_header_id|>\\n\\n' + content | trim + '<|eot_id|>' }}{%- endfor %}{%- if add_generation_prompt %}{{- '<|start_header_id|>assistant<|end_header_id|>\\n\\n' }}{%- endif %}",
   "clean_up_tokenization_spaces": true,
   "eos_token": "<|eot_id|>",
   "extra_special_tokens": {},

     }
   },
   "bos_token": "<|begin_of_text|>",
+  "chat_template": "{{- bos_token }}\n{%- set system_message = 'detailed thinking on' %}\n{{- '<|start_header_id|>system<|end_header_id|>\n\n' }}\n{{- system_message }}\n{{- '<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n' }}\nYou are a skilled little expert at scoring responses. You should evaluate given responses based on the given judging criteria.\nGiven the context of the conversation (the last turn is the User’s query) and one or two responses from the Assistant, you need to refer to the [Helpfulness Scoring Guidelines] to score each individual response.\nIf there are two responses, you need to also give a ranking score based on the [Ranking Scoring Guidelines].\nBefore scoring, please analyze step by step. Your scoring needs to be as strict as possible.\n\n[Helpfulness Scoring Guidelines]\n\nWhen evaluating Helpfulness, consider the following factors:\n\n- Correctness/Completeness: Is the response accurate and complete?\n- Coherence/Clarity: Is the response clear, coherent, and easy to understand?\n- Instruction following: Does the response follow the instructions and fulfill the user's request?\n- Relevance: Is the response relevant to the user's query/input?\n- Level of Detail and Creativity: Does the response provide enough detail without being too verbose? 
Does it show creativity but not hallucinations?\n\n**Score 5: Extremely Helpful**\n\n- The response is extremely helpful and completely aligned with the spirit of what the prompt was asking for.\n- It accurately acts on the user's request, without unnecessary information.\n- If a user request is not possible/in line with desired model behavior, a helpful response provides useful context and rationale.\n\n**Score 4: Mostly Helpful**\n\n- The response is mostly helpful and mainly aligned with what the user was looking for.\n- There is still some room for improvement, but the response is generally useful.\n\n**Score 3: Partially Helpful**\n\n- The response is partially helpful but misses the overall goal of the user's query/input in some way.\n- The response did not fully satisfy what the user was looking for.\n\n**Score 2: Borderline Unhelpful**\n\n- The response is borderline unhelpful and mostly does not capture what the user was looking for.\n- However, it is still usable and helpful in a small way.\n\n**Score 1: Not Helpful**\n\n- The response is not useful or helpful at all.\n- The response completely missed the essence of what the user wanted.\n\n[Ranking Scoring Guidelines]\n\nRanking score is used to rank the two responses based on their helpfulness. 
Even if you give the same individual helpfulness score for both responses, you need to differentiate them strictly.\nThe ranking score is a number between 1 and 6, where:\n1 = Response 1 is much better than Response 2\n2 = Response 1 is better than Response 2\n3 = Response 1 is slightly better than Response 2\n4 = Response 2 is slightly better than Response 1\n5 = Response 2 is better than Response 1\n6 = Response 2 is much better than Response 1\n\n{{- '\n\n#### Conversation Context ####\n' }}\n\n{%- for message in messages %}\n    {%- if message['role'] == 'user' %}\n        {{- 'User: ' + message['content']|trim + '\n' }}\n    {%- elif message['role'] == 'assistant' %}\n        {%- if '</think>' in message['content'] %}\n            {%- set content = message['content'].split('</think>')[-1].lstrip() %}\n        {%- else %}\n            {%- set content = message['content'] %}\n        {%- endif %}\n        {{- 'Assistant: ' + content|trim + '\n'}}\n        {%- endif %}\n{%- endfor %}\n\n{{- '\n#### Responses to be Scored ####\n' }}\n{%- for message in messages %}\n    {%- if message['role'] == 'response_1' %}\n        {{- '[The Begin of Response 1]\n' + message['content']|trim + '\n[The End of Response 1]\n' }}\n    {%- elif message['role'] == 'response_2' %}\n        {{- '[The Begin of Response 2]\n' + message['content']|trim + '\n[The End of Response 2]\n' }}\n        {%- endif %}\n{%- endfor %}\n#### Output Format Requirements ####\nFirst give your analysis on each responses in the format of:\n[The Begin of Analysis on Response i]\nAnalysis on the i-th response\n[The End of Analysis on Response i]\n\nThen give the scores of each response in order, separate by comma in the boxed, adhering this format:\n[The Begin of Individual Scores]\n\\boxed{x, y} if there exists 2 responses\n[The End of Individual Scores]\n\nIf there are two responses, give the relative ranking score in the format of:\n[The Begin of Ranking Score]\n\\boxed{z} \n[The End of Ranking 
Score]\nYou don't need to give a ranking score if only one response is provided.<|eot_id|>\n{%- if add_generation_prompt %}\n    {{- '<|start_header_id|>assistant<|end_header_id|>\n\n' }}\n{%- endif %}",
   "clean_up_tokenization_spaces": true,
   "eos_token": "<|eot_id|>",
   "extra_special_tokens": {},
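
The new template reads four roles from `messages`: `user` and `assistant` turns form the conversation context, while `response_1` and `response_2` carry the candidates to be scored. The following is a minimal Python sketch of those two loops, written to mirror the Jinja logic in the added line rather than to replace it. The function names (`strip_reasoning`, `render_context`, `render_responses`) are hypothetical and do not come from the repository; in practice the template would presumably be applied through the tokenizer itself.

```python
def strip_reasoning(text: str) -> str:
    """Keep only the text after the last '</think>', as the template does."""
    if "</think>" in text:
        return text.split("</think>")[-1].lstrip()
    return text


def render_context(messages: list[dict]) -> str:
    """Mirror the first loop: only 'user' and 'assistant' turns are emitted."""
    parts = []
    for message in messages:
        if message["role"] == "user":
            parts.append("User: " + message["content"].strip() + "\n")
        elif message["role"] == "assistant":
            content = strip_reasoning(message["content"])
            parts.append("Assistant: " + content.strip() + "\n")
    return "".join(parts)


def render_responses(messages: list[dict]) -> str:
    """Mirror the second loop: only 'response_1' and 'response_2' are emitted."""
    parts = []
    for message in messages:
        if message["role"] in ("response_1", "response_2"):
            index = message["role"][-1]  # "1" or "2"
            parts.append(
                f"[The Begin of Response {index}]\n"
                + message["content"].strip()
                + f"\n[The End of Response {index}]\n"
            )
    return "".join(parts)


# Illustrative input only; the message contents are made up for this sketch.
messages = [
    {"role": "user", "content": "What is 1+1?"},
    {"role": "response_1", "content": "1+1=2"},
    {"role": "response_2", "content": "1+1=3"},
]

print(render_context(messages))
print(render_responses(messages))
```

Note that a system message, if supplied, is ignored by both loops: the new template hard-codes `detailed thinking on` as the system prompt, whereas the removed template accepted a caller-provided one.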