Errors in chat template compared to spec
I noticed a few odd things about the chat template in this repo compared to the spec published at https://cookbook.openai.com/articles/openai-harmony, mostly around tool calling:
Preambles
The cookbook says that the model might output a preamble before a function call that is meant to be shown to the user:
At times the model might choose to generate a “preamble” to inform the user about the tools it is about to call. For example, when it plans to call multiple tools. If this is the case it will generate an assistant message on the commentary channel that, unlike the chain-of-thought, is intended to be shown to the end-user.
<|channel|>analysis<|message|>{long chain of thought}<|end|><|start|>assistant<|channel|>commentary<|message|>**Action plan**: 1. Generate an HTML file 2. Generate a JavaScript for the Node.js server 3. Start the server --- Will start executing the plan step by step<|end|><|start|>assistant<|channel|>commentary to=functions.generate_file<|constrain|>json<|message|>{"template": "basic_html", "path": "index.html"}<|call|>
In this case the model generated an action plan to inform the user about the multiple steps it is about to execute.
The chat template in this repo, however, assumes that any content that occurs before a tool call is part of the model's CoT, and has no way to set this preamble:
{%- if "tool_calls" in message %}
{#- We assume max 1 tool call per message, and so we infer the tool call name #}
{#- in "tool" messages from the most recent assistant tool call name #}
{%- set tool_call = message.tool_calls[0] %}
{%- if tool_call.function %}
{%- set tool_call = tool_call.function %}
{%- endif %}
{%- if message.content and message.thinking %}
{{- raise_exception("Cannot pass both content and thinking in an assistant message with tool calls! Put the analysis message in one or the other, but not both.") }}
{%- elif message.content %}
{{- "<|start|>assistant<|channel|>analysis<|message|>" + message.content + "<|end|>" }}
{%- elif message.thinking %}
{{- "<|start|>assistant<|channel|>analysis<|message|>" + message.thinking + "<|end|>" }}
{%- endif %}
{{- "<|start|>assistant to=" }}
{{- "functions." + tool_call.name + "<|channel|>commentary " }}
{{- (tool_call.content_type if tool_call.content_type is defined else "json") + "<|message|>" }}
{{- tool_call.arguments|tojson }}
{{- "<|call|>" }}
{%- set last_tool_call.name = tool_call.name %}
(L348-367). Instead, this seems more accurate:
{%- if message.thinking %}
{{- "<|start|>assistant<|channel|>analysis<|message|>" + message.thinking + "<|end|>" }}
{%- endif %}
{%- if message.content %}
{{- "<|start|>assistant<|channel|>commentary<|message|>" + message.content + "<|end|>" }}
{%- endif %}
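For concreteness, here is a sketch of the kind of assistant turn this would enable. The field names follow the ones the template already reads; the tool name, arguments, and wording are made up for illustration, and the rendering shown is what the proposed branch plus the existing tool-call lines would roughly produce, not output from running the template:
# Hypothetical assistant turn carrying both a CoT ("thinking") and a
# user-visible preamble ("content") alongside a tool call -- the case the
# current template rejects with an exception.
preamble_turn = {
    "role": "assistant",
    "thinking": "User asks for the weather in Tokyo. Need to call get_weather.",
    "content": "I will look up the current weather in Tokyo.",
    "tool_calls": [
        {
            "type": "function",
            "function": {"name": "get_weather", "arguments": {"location": "Tokyo"}},
        }
    ],
}
# Expected rendering with the proposed branch, roughly:
# <|start|>assistant<|channel|>analysis<|message|>User asks for the weather in Tokyo. Need to call get_weather.<|end|>
# <|start|>assistant<|channel|>commentary<|message|>I will look up the current weather in Tokyo.<|end|>
# <|start|>assistant to=functions.get_weather<|channel|>commentary json<|message|>{"location": "Tokyo"}<|call|>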
Old CoTs always present before tool calls
The cookbook is a little unclear here, but it seems more likely to me that old CoTs are supposed to be cleaned up once the model has produced its final response, even CoT that precedes a tool call. That is, you should pass the CoT back in with the result of the most recent function call, but once the round is finished you can clean it up.
The chat template in this repo seems to always include old CoTs before tool calls (L355-361), which seems incorrect.
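This is only my reading of the cookbook, not something it states in this form, but as a sketch: assuming assistant turns carry their CoT in a thinking field (as in the template above) and that the still-open round starts at the last user message, the cleanup would look something like this (helper name is hypothetical):
def drop_stale_cot(messages):
    """Drop 'thinking' from assistant turns that belong to already-finished rounds.

    CoT for the in-flight tool-call loop (everything after the last user
    message) is kept; older CoT is discarded before re-rendering the prompt.
    """
    last_user = max(
        (i for i, m in enumerate(messages) if m["role"] == "user"), default=-1
    )
    return [
        {k: v for k, v in m.items() if k != "thinking"}
        if m["role"] == "assistant" and i < last_user
        else m
        for i, m in enumerate(messages)
    ]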
Constrain token missing
The cookbook says this:
Receiving tool calls
If the model decides to call a tool it will define a recipient in the header of the message using the format to={name}. For example, if it decides to trigger the get_current_weather function from above it would specify to=functions.get_current_weather in the header and commentary as the channel as specified in the system message. The recipient might be defined in the role or channel section of the header. The model might also specify a <|constrain|> token to indicate the type of input for the tool call. In this case since it’s being passed in as JSON the <|constrain|> is set to json.
<|channel|>analysis<|message|>Need to use function get_weather.<|end|><|start|>assistant<|channel|>commentary to=functions.get_weather <|constrain|>json<|message|>{"location":"San Francisco"}<|call|>
However, the chat template seems to be missing the <|constrain|> token and just outputs the word "json" after the to=... block (L362-366):
{{- "<|start|>assistant to=" }}
{{- "functions." + tool_call.name + "<|channel|>commentary " }}
{{- (tool_call.content_type if tool_call.content_type is defined else "json") + "<|message|>" }}
{{- tool_call.arguments|tojson }}
{{- "<|call|>" }}
Hi @zhuexe, in general we tried to follow the output of Harmony, rather than the spec (since a lot of this was in flux during development). If you find the chat template is giving different outputs to Harmony, please let us know!
Got it, thanks @Rocketknight1. It looks like the harmony library is indeed missing the <|constrain|> token as well, which seems quite weird since the model produces it. I see you already opened a PR for the extraneous CoTs, thank you!
Although the model doesn't tend to output preambles, the Harmony library does allow for them (I'll post a minimal repro from the model later once I have access to the cluster), as shown in the example below. The representation is a little different from the one-turn-one-message assumption that the Chat Templates system uses, but I believe the code here would be equivalent to including both thinking and content in a tool-call assistant message. It might also be worth supporting for inter-model compatibility, as the commercial GPT offerings often output these kinds of preambles before calling functions.
As a side note, generation_config.json is missing 200012 (i.e., <|call|>) as an EOS token, so the model continues to generate after outputting it and kind of goes off the rails.
from openai_harmony import (
    Conversation,
    HarmonyEncodingName,
    Message,
    Role,
    load_harmony_encoding,
)

# load the gpt-oss Harmony encoding used for rendering/parsing
enc = load_harmony_encoding(HarmonyEncodingName.HARMONY_GPT_OSS)

M_HARM_2 = [
Message.from_role_and_content(
Role.ASSISTANT,
'User asks: "What is the weather in Tokyo?" We need to use lookup_weather tool.',
).with_channel("analysis"),
Message.from_role_and_content(
Role.ASSISTANT,
"I will check that",
).with_channel("commentary"),
Message.from_role_and_content(Role.ASSISTANT, '{"location": "Tokyo"}')
.with_channel("commentary")
.with_recipient("lookup_weather")
.with_content_type("json"),
]
tokens = enc.render_conversation_for_completion(Conversation.from_messages(M_HARM_2), Role.ASSISTANT)
print(tokens)
print("==========")
parsed = enc.decode(tokens)
print(parsed)
output:
[...omitted...]
<|start|>assistant<|channel|>analysis<|message|>User asks: "What is the weather in Tokyo?" We need to use lookup_weather tool.<|end|>
<|start|>assistant<|channel|>commentary<|message|>I will check that<|end|>
<|start|>assistant to=lookup_weather<|channel|>commentary json<|message|>{"location": "Tokyo"}<|call|>
<|start|>assistant
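Regarding the generation_config.json side note above, a minimal workaround sketch (reusing the same stop ids the repro script further down passes to generate(), where 200012 is <|call|>) is to override the EOS ids on the loaded model:
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "openai/gpt-oss-120b", device_map="auto", torch_dtype="auto"
)
# Stop on <|call|> (200012) as well, since generation_config.json omits it;
# the other ids are the ones the repro below already uses as EOS tokens.
model.generation_config.eos_token_id = [200002, 199999, 200012]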
Here's a minimal example showing the model using the commentary channel to send output to the user before making a tool call:
Output (newlines added by me for readability):
<|channel|>analysis<|message|>The user asks: "What's the weather in Tokyo?" We need to get the weather. Use the get_weather function with location "Tokyo". Before using tool, we must explain plan.<|end|>
<|start|>assistant<|channel|>commentary<|message|>I will call the get_weather function with the location set to "Tokyo" to retrieve the current weather information for that city.<|end|>
<|start|>assistant<|channel|>commentary to=functions.get_weather <|constrain|>json<|message|>{
"location": "Tokyo"
}<|call|>
The code I used for this was a simple tool call with a system prompt to tell the model to inform the user before making any tool calls:
from transformers import AutoModelForCausalLM, AutoTokenizer
# define a simple tool calling prompt
MESSAGES_HF_SHORT = [
{
"role": "system",
"content": "IMPORTANT: Always tell the your plan before starting any tool calls.",
},
{"role": "user", "content": "What's the weather in Tokyo?"},
]
def get_weather(location: str):
"""
Get the weather at a location.
Args:
location: The location
"""
# print the prompt to verify
tok = AutoTokenizer.from_pretrained("openai/gpt-oss-120b")
print(tok.apply_chat_template(MESSAGES_HF_SHORT, tokenize=False, tools=[get_weather], add_generation_prompt=True))
# generate a completion from the model
model = AutoModelForCausalLM.from_pretrained("openai/gpt-oss-120b", device_map="auto", torch_dtype="auto")
output = model.generate(
tok.apply_chat_template(MESSAGES_HF_SHORT, tools=[get_weather], add_generation_prompt=True, return_tensors="pt").to(
"cuda"
),
max_new_tokens=2048,
temperature=1,
top_p=1,
eos_token_id=[200002, 199999, 200012],
)
# observe the preamble in the commentary channel
content = tok.decode(output[0])
print(content)
Good job @zhuexe! Your finding has cleared up a lot of my confusion.
In addition, the template's handling of multi-turn conversations is also incorrect. When processing multi-turn conversation inputs, the template forgets to add the special token <|return|> after the previous response.
Here is an example input and the current output generated by the template:
messages = [
{"role": "developer", "content": "Tell nothing to user."},
{"role": "user", "content": "Tell me your name"},
{"role": "assistant", "content": "I don`t know"},
{"role": "user", "content": "Tell me your name right now!"},
{"role": "assistant", "content": "I really don`t know"},
]
text = tokenizer.apply_chat_template(
messages,
tokenize=False,
reasoning_effort="high",
builtin_tools="python",
)
print(text)
The text is:
<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2025-08-10
Reasoning: high
# Tools
# Valid channels: analysis, commentary, final. Channel must be included for every message.<|end|><|start|>developer<|message|># Instructions
Tell nothing to user.<|end|><|start|>user<|message|>Tell me your name<|end|><|start|>assistant<|channel|>final<|message|>I don`t know<|end|><|start|>user<|message|>Tell me your name right now!<|end|><|start|>assistant<|channel|>final<|message|>I really don`t know<|return|>
As you can see, the first model response ends with <|end|> instead of <|return|>. However, according to the documentation at https://cookbook.openai.com/articles/openai-harmony#reasoning, in multi-turn conversations each final model response should end with <|return|>.
Additionally, it seems that the builtin_tools parameter only adds a # Tools field to the system prompt without including any actual descriptions. There should be corresponding code in the chat_template responsible for controlling the prompt output for built-in tools. I'm not sure why it's not taking effect.
It seems I passed the parameter incorrectly. builtin_tools should accept a list rather than a string: builtin_tools=["python", "browser"]. Now it works as expected.
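For reference, the corrected version of the earlier apply_chat_template call (same messages and tokenizer as above) would then be:
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    reasoning_effort="high",
    builtin_tools=["python", "browser"],  # pass a list of tool names, not a string
)
print(text)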