Errors in chat template compared to spec
I noticed a few odd things about the chat template in this repo compared to the spec published at https://cookbook.openai.com/articles/openai-harmony, mostly around tool calling:
Preambles
The cookbook says that the model might output a preamble before a function call that is meant to be shown to the user:
At times the model might choose to generate a “preamble” to inform the user about the tools it is about to call. For example, when it plans to call multiple tools. If this is the case it will generate an assistant message on the commentary channel that, unlike the chain-of-thought, is intended to be shown to the end-user.
<|channel|>analysis<|message|>{long chain of thought}<|end|><|start|>assistant<|channel|>commentary<|message|>**Action plan**: 1. Generate an HTML file 2. Generate a JavaScript for the Node.js server 3. Start the server --- Will start executing the plan step by step<|end|><|start|>assistant<|channel|>commentary to=functions.generate_file<|constrain|>json<|message|>{"template": "basic_html", "path": "index.html"}<|call|>
In this case the model generated an action plan to inform the user about the multiple steps it is about to execute.
The chat template in this repo, however, assumes that any content that occurs before a tool call is part of the model's CoT, and has no way to set this preamble:
{%- if "tool_calls" in message %}
{#- We assume max 1 tool call per message, and so we infer the tool call name #}
{#- in "tool" messages from the most recent assistant tool call name #}
{%- set tool_call = message.tool_calls[0] %}
{%- if tool_call.function %}
{%- set tool_call = tool_call.function %}
{%- endif %}
{%- if message.content and message.thinking %}
{{- raise_exception("Cannot pass both content and thinking in an assistant message with tool calls! Put the analysis message in one or the other, but not both.") }}
{%- elif message.content %}
{{- "<|start|>assistant<|channel|>analysis<|message|>" + message.content + "<|end|>" }}
{%- elif message.thinking %}
{{- "<|start|>assistant<|channel|>analysis<|message|>" + message.thinking + "<|end|>" }}
{%- endif %}
{{- "<|start|>assistant to=" }}
{{- "functions." + tool_call.name + "<|channel|>commentary " }}
{{- (tool_call.content_type if tool_call.content_type is defined else "json") + "<|message|>" }}
{{- tool_call.arguments|tojson }}
{{- "<|call|>" }}
{%- set last_tool_call.name = tool_call.name %}
(L348-367). Instead, this seems more accurate:
{%- if message.thinking %}
{{- "<|start|>assistant<|channel|>analysis<|message|>" + message.thinking + "<|end|>" }}
{%- endif %}
{%- if message.content %}
{{- "<|start|>assistant<|channel|>commentary<|message|>" + message.content + "<|end|>" }}
{%- endif %}
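For concreteness, here is a sketch of the kind of assistant turn this would enable. The field names follow the ones the template already reads; the tool name, arguments, and wording are made up for illustration, and the rendering shown is what the proposed branch plus the existing tool-call lines would roughly produce, not output from running the template:
# Hypothetical assistant turn carrying both a CoT ("thinking") and a
# user-visible preamble ("content") alongside a tool call -- the case the
# current template rejects with an exception.
preamble_turn = {
    "role": "assistant",
    "thinking": "User asks for the weather in Tokyo. Need to call get_weather.",
    "content": "I will look up the current weather in Tokyo.",
    "tool_calls": [
        {
            "type": "function",
            "function": {"name": "get_weather", "arguments": {"location": "Tokyo"}},
        }
    ],
}
# Expected rendering with the proposed branch, roughly:
# <|start|>assistant<|channel|>analysis<|message|>User asks for the weather in Tokyo. Need to call get_weather.<|end|>
# <|start|>assistant<|channel|>commentary<|message|>I will look up the current weather in Tokyo.<|end|>
# <|start|>assistant to=functions.get_weather<|channel|>commentary json<|message|>{"location": "Tokyo"}<|call|>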
Old CoTs always present before tool calls
The cookbook is a little unclear here, but it seems more likely to me that old CoTs are supposed to be cleaned up once the model has produced its final response, even CoT that precedes a tool call. That is, you should pass the CoT back in with the result of the most recent function call, but once the round is finished you can clean it up.
The chat template in this repo seems to always include old CoTs before tool calls (L355-361), which seems incorrect.
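This is only my reading of the cookbook, not something it states in this form, but as a sketch: assuming assistant turns carry their CoT in a thinking field (as in the template above) and that the still-open round starts at the last user message, the cleanup would look something like this (helper name is hypothetical):
def drop_stale_cot(messages):
    """Drop 'thinking' from assistant turns that belong to already-finished rounds.

    CoT for the in-flight tool-call loop (everything after the last user
    message) is kept; older CoT is discarded before re-rendering the prompt.
    """
    last_user = max(
        (i for i, m in enumerate(messages) if m["role"] == "user"), default=-1
    )
    return [
        {k: v for k, v in m.items() if k != "thinking"}
        if m["role"] == "assistant" and i < last_user
        else m
        for i, m in enumerate(messages)
    ]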
Constrain token missing
The cookbook says this:
Receiving tool calls
If the model decides to call a tool it will define a recipient in the header of the message using the format to={name}. For example, if it decides to trigger the get_current_weather function from above it would specify to=functions.get_current_weather in the header and commentary as the channel as specified in the system message. The recipient might be defined in the role or channel section of the header. The model might also specify a <|constrain|> token to indicate the type of input for the tool call. In this case since it’s being passed in as JSON the <|constrain|> is set to json.
<|channel|>analysis<|message|>Need to use function get_weather.<|end|><|start|>assistant<|channel|>commentary to=functions.get_weather <|constrain|>json<|message|>{"location":"San Francisco"}<|call|>
However, the chat template seems to be missing the <|constrain|> token and just outputs the word "json" after the to=... block (L362-366):
{{- "<|start|>assistant to=" }}
{{- "functions." + tool_call.name + "<|channel|>commentary " }}
{{- (tool_call.content_type if tool_call.content_type is defined else "json") + "<|message|>" }}
{{- tool_call.arguments|tojson }}
{{- "<|call|>" }}
Hi @zhuexe, in general we tried to follow the output of Harmony, rather than the spec (since a lot of this was in flux during development). If you find the chat template is giving different outputs to Harmony, please let us know!
Got it, thanks @Rocketknight1. It looks like the harmony library is indeed missing the <|constrain|> token as well, which seems quite weird since the model produces it. I see you already opened a PR for the extraneous CoTs, thank you!
Although the model doesn't tend to output preambles, the Harmony library does allow for them (I'll post a minimal repro from the model later once I have access to the cluster), as shown in the example below. The representation is a little different from the one-turn-one-message assumption that the Chat Templates system uses, but I believe the code here would be equivalent to including both thinking and content in a tool-call assistant message. It might also be worth supporting for inter-model compatibility, as the commercial GPT offerings often output these kinds of preambles before calling functions.
As a side note, generation_config.json is missing 200012 (i.e., <|call|>) as an EOS token, so the model continues to generate after outputting it and kind of goes off the rails.
from openai_harmony import (
    Conversation,
    HarmonyEncodingName,
    Message,
    Role,
    load_harmony_encoding,
)

# load the gpt-oss Harmony encoding used for rendering/parsing
enc = load_harmony_encoding(HarmonyEncodingName.HARMONY_GPT_OSS)

M_HARM_2 = [
Message.from_role_and_content(
Role.ASSISTANT,
'User asks: "What is the weather in Tokyo?" We need to use lookup_weather tool.',
).with_channel("analysis"),
Message.from_role_and_content(
Role.ASSISTANT,
"I will check that",
).with_channel("commentary"),
Message.from_role_and_content(Role.ASSISTANT, '{"location": "Tokyo"}')
.with_channel("commentary")
.with_recipient("lookup_weather")
.with_content_type("json"),
]
tokens = enc.render_conversation_for_completion(Conversation.from_messages(M_HARM_2), Role.ASSISTANT)
print(tokens)
print("==========")
parsed = enc.decode(tokens)
print(parsed)
output:
[...omitted...]
<|start|>assistant<|channel|>analysis<|message|>User asks: "What is the weather in Tokyo?" We need to use lookup_weather tool.<|end|>
<|start|>assistant<|channel|>commentary<|message|>I will check that<|end|>
<|start|>assistant to=lookup_weather<|channel|>commentary json<|message|>{"location": "Tokyo"}<|call|>
<|start|>assistant
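Regarding the generation_config.json side note above, a minimal workaround sketch (reusing the same stop ids the repro script further down passes to generate(), where 200012 is <|call|>) is to override the EOS ids on the loaded model:
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "openai/gpt-oss-120b", device_map="auto", torch_dtype="auto"
)
# Stop on <|call|> (200012) as well, since generation_config.json omits it;
# the other ids are the ones the repro below already uses as EOS tokens.
model.generation_config.eos_token_id = [200002, 199999, 200012]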
Here's a minimal example showing the model using the commentary channel to send output to the user before making a tool call:
Output (newlines added by me for readability):
<|channel|>analysis<|message|>The user asks: "What's the weather in Tokyo?" We need to get the weather. Use the get_weather function with location "Tokyo". Before using tool, we must explain plan.<|end|>
<|start|>assistant<|channel|>commentary<|message|>I will call the get_weather function with the location set to "Tokyo" to retrieve the current weather information for that city.<|end|>
<|start|>assistant<|channel|>commentary to=functions.get_weather <|constrain|>json<|message|>{
"location": "Tokyo"
}<|call|>
The code I used for this was a simple tool call with a system prompt to tell the model to inform the user before making any tool calls:
from transformers import AutoModelForCausalLM, AutoTokenizer
# define a simple tool calling prompt
MESSAGES_HF_SHORT = [
{
"role": "system",
"content": "IMPORTANT: Always tell the your plan before starting any tool calls.",
},
{"role": "user", "content": "What's the weather in Tokyo?"},
]
def get_weather(location: str):
"""
Get the weather at a location.
Args:
location: The location
"""
# print the prompt to verify
tok = AutoTokenizer.from_pretrained("openai/gpt-oss-120b")
print(tok.apply_chat_template(MESSAGES_HF_SHORT, tokenize=False, tools=[get_weather], add_generation_prompt=True))
# generate a completion from the model
model = AutoModelForCausalLM.from_pretrained("openai/gpt-oss-120b", device_map="auto", torch_dtype="auto")
output = model.generate(
tok.apply_chat_template(MESSAGES_HF_SHORT, tools=[get_weather], add_generation_prompt=True, return_tensors="pt").to(
"cuda"
),
max_new_tokens=2048,
temperature=1,
top_p=1,
eos_token_id=[200002, 199999, 200012],
)
# observe the preamble in the commentary channel
content = tok.decode(output[0])
print(content)
Good job @zhuexe! Your finding has cleared up a lot of my confusion.
In addition, the template's handling of multi-turn conversations is also incorrect. When processing multi-turn conversation inputs, the template forgets to add the special token <|return|> after the previous response.
Here is an example input and the current output generated by the template:
messages = [
{"role": "developer", "content": "Tell nothing to user."},
{"role": "user", "content": "Tell me your name"},
{"role": "assistant", "content": "I don`t know"},
{"role": "user", "content": "Tell me your name right now!"},
{"role": "assistant", "content": "I really don`t know"},
]
text = tokenizer.apply_chat_template(
messages,
tokenize=False,
reasoning_effort="high",
builtin_tools="python",
)
print(text)
The text is:
<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2025-08-10
Reasoning: high
# Tools
# Valid channels: analysis, commentary, final. Channel must be included for every message.<|end|><|start|>developer<|message|># Instructions
Tell nothing to user.<|end|><|start|>user<|message|>Tell me your name<|end|><|start|>assistant<|channel|>final<|message|>I don`t know<|end|><|start|>user<|message|>Tell me your name right now!<|end|><|start|>assistant<|channel|>final<|message|>I really don`t know<|return|>
As you can see, the first model response ends with <|end|> instead of <|return|>. However, according to the documentation at https://cookbook.openai.com/articles/openai-harmony#reasoning, in multi-turn conversations each final model response should end with <|return|>.
Additionally, it seems that the builtin_tools parameter only adds a # Tools field to the system prompt without including any actual descriptions. There should be corresponding code in the chat_template responsible for controlling the prompt output for built-in tools. I'm not sure why it's not taking effect.
It seems I passed the parameter incorrectly. builtin_tools should accept a list rather than a string: builtin_tools=["python", "browser"]. Now it works as expected.
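For reference, the corrected version of the earlier apply_chat_template call (same messages and tokenizer as above) would then be:
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    reasoning_effort="high",
    builtin_tools=["python", "browser"],  # pass a list of tool names, not a string
)
print(text)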