missing opening <think>

#4
by getfit - opened

Anyone else experience this?

In README:

Ensure the model starts with "<think> \n" to prevent generating empty thinking content, which can degrade output quality. If you use apply_chat_template and set add_generation_prompt=True, this is already automatically implemented, but it may cause the response to lack the <think> tag at the beginning. This is normal behavior.
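As a minimal sketch (not from the thread, assuming the stock Qwen/QwQ-32B tokenizer), you can check that the rendered prompt already ends with the opening tag, which is why the model's reply then omits it:

```python
# Sketch: confirm the chat template injects '<think>\n' at the end of the prompt.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/QwQ-32B")
messages = [{"role": "user", "content": "How many r's are in the word strawberry?"}]

prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,  # appends '<|im_start|>assistant\n<think>\n'
)
print(prompt.endswith("<think>\n"))  # True with the stock template
```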

Thank you, missed it. Too excited to try it :)

getfit changed discussion status to closed

How can you enable that with something like SGLang or vLLM?

getfit changed discussion status to open

Even if it says so in the readme, lacking the opening tag breaks lots of integrations like OpenWebUI, Continue.dev, etc. This is not functional as it is :(

It is a behavior you fix at inference. Once you set the correct prompt template, the generation will always output the starting tag as intended, whether the client is OWUI, Continue, Aider etc.

If you use a quant with Ollama, maybe just try the R1 template, as it was a similar case. EDIT: it wasn't; their template didn't add it.

You just need the template to add the <think> tag, as in the given prompt template: https://huggingface.co/Qwen/QwQ-32B/blob/main/tokenizer_config.json. You'll see the tag at the very end: {{- '<|im_start|>assistant\n<think>\n' }}
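Unescaped from the JSON, the tail of that chat template looks roughly like this (reconstructed from the snippet above; the add_generation_prompt guard is the same switch the README refers to):

```jinja
{%- if add_generation_prompt %}
    {{- '<|im_start|>assistant\n<think>\n' }}
{%- endif %}
```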

I am using SGLang to serve this with tensor parallel. How can I apply the template?

I guess translating a template isn't very hard for LLMs ;) so if you have a template example, just show it the source and an example of your target format, and chances are good it will do it correctly ;)

With Ollama, using the suggested template (it does not add the tag as suggested in the README), I personally get the <think> at the beginning when trying short prompts like "How to cook a fried egg to perfection?", 15/15 trials. I guess the unwanted behavior might occur on longer input context?

Ok guys, I think you should forget everything I said as I'm getting confused with the ollama template...
When using their suggested template from https://ollama.com/library/qwq/blobs/41190096a061:

{{- if and (ne .Role "assistant") $last }}<|im_start|>assistant
{{ end }}
{{- end }}

The opening tag is output as intended.

But when I try to force it as per the original prompt from tokenizer_config.json:

{{- if and (ne .Role "assistant") $last }}<|im_start|>assistant
<think>
{{ end }}
{{- end }}

It isn't! I don't get what's going on.

So if anybody understands the behavior, please share!

And sorry for misleading everyone.

I got it working properly with SGLang. I built the latest build from the git source and added --reasoning-parser deepseek-r1, and now in OpenWebUI and elsewhere I get the thinking enclosed.
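For reference, a sketch of that kind of launch command (only --reasoning-parser deepseek-r1 is taken from the post; the model path and tensor-parallel size are placeholders):

```bash
# Sketch: serve QwQ-32B with SGLang, tensor parallel over 2 GPUs,
# splitting the <think> block out via the DeepSeek-R1-style reasoning parser.
python -m sglang.launch_server \
  --model-path Qwen/QwQ-32B \
  --tp 2 \
  --reasoning-parser deepseek-r1
```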

Updated tokenizer_config, but model replies are still missing the <think>\n start tag.

This also works for vLLM. Thank you.
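A hedged sketch of the vLLM equivalent (flag spelling varies by vLLM version, so double-check against your install; some releases also need --enable-reasoning, and the parser name uses an underscore there):

```bash
# Sketch: vLLM OpenAI-compatible server with reasoning parsing enabled.
vllm serve Qwen/QwQ-32B \
  --tensor-parallel-size 2 \
  --enable-reasoning \
  --reasoning-parser deepseek_r1
```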

Removing the last part "\n<think>" in '<|im_start|>assistant\n<think>\n' in the "chat_template" can solve this problem.
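In other words, the suggested edit to the end of the chat_template is (a sketch of the change, not an official patch):

```jinja
{{- '<|im_start|>assistant\n<think>\n' }}  {# before: prompt already contains <think> #}
{{- '<|im_start|>assistant\n' }}           {# after: the model now generates <think> itself #}
```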

After adding these flags, model accuracy decreased. Without them, the model answers "how many r's in the word strawberrry?" correctly, but after adding them it cannot: it gives 3 R's, while the correct answer is 4 since there is an extra R. I am using the FP16 model.

Did you set the repeat penalty above 1? It seems to decrease quality by messing with the sampling, as per Unsloth's findings.
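If it helps, here is a sketch of sampling settings that leave the repeat penalty off, using vLLM's offline API (the temperature/top_p values are only illustrative; the point is repetition_penalty=1.0, i.e. disabled):

```python
# Sketch: generation settings with the repetition penalty disabled (1.0).
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/QwQ-32B", tensor_parallel_size=2)
params = SamplingParams(
    temperature=0.6,         # illustrative value
    top_p=0.95,              # illustrative value
    repetition_penalty=1.0,  # 1.0 = no repeat penalty; >1 reportedly hurts quality
    max_tokens=4096,
)
# In practice you would render the chat template first; raw prompt kept short here.
outputs = llm.generate(["How many r's are in the word strawberrry?"], params)
print(outputs[0].outputs[0].text)
```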

Thank you very much, it works well now.
