missing opening <think>

#4
by getfit - opened

Anyone else experience this?

In README:

Ensure the model starts with "<think> \n" to prevent generating empty thinking content, which can degrade output quality. If you use apply_chat_template and set add_generation_prompt=True, this is already automatically implemented, but it may cause the response to lack the <think> tag at the beginning. This is normal behavior.
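As a minimal sketch (not from the thread, assuming the stock Qwen/QwQ-32B tokenizer), you can check that the rendered prompt already ends with the opening tag, which is why the model's reply then omits it:

```python
# Sketch: confirm the chat template injects '<think>\n' at the end of the prompt.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/QwQ-32B")
messages = [{"role": "user", "content": "How many r's are in the word strawberry?"}]

prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,  # appends '<|im_start|>assistant\n<think>\n'
)
print(prompt.endswith("<think>\n"))  # True with the stock template
```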

Thank you, missed it. Too excited to try it :)

getfit changed discussion status to closed

How can you enable that with something like SGLang or vLLM?

getfit changed discussion status to open

Even if it says so in the readme, lacking the opening tag breaks lots of integrations like OpenWebUI, Continue.dev, etc. This is not functional as it is :(

It is a behavior you fix at inference. Once you set the correct prompt template, the generation will always output the starting tag as intended, whether the client is OWUI, Continue, Aider etc.

If you use a quant with Ollama, maybe just try the R1 template, as it was a similar case. EDIT: it wasn't; their template didn't add it.

You just need the template to add the <think> tag, as in the given prompt template: https://huggingface.co/Qwen/QwQ-32B/blob/main/tokenizer_config.json. You'll see the tag at the very end: {{- '<|im_start|>assistant\n<think>\n' }}
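Unescaped from the JSON, the tail of that chat template looks roughly like this (reconstructed from the snippet above; the add_generation_prompt guard is the same switch the README refers to):

```jinja
{%- if add_generation_prompt %}
    {{- '<|im_start|>assistant\n<think>\n' }}
{%- endif %}
```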

I am using SGLang to serve this with tensor parallel. How can I apply the template?

I guess translating a template isn't very hard for LLMs ;) so if you have a template example, just show it the source and an example of your target format, and chances are good it will do it correctly ;)

With Ollama, using the suggested template (it does not add the tag as suggested in the README), I personally get the <think> at the beginning when trying short prompts like "How to cook a fried egg to perfection?", 15/15 trials. I guess the unwanted behavior might occur on longer input context?

Ok guys, I think you should forget everything I said as I'm getting confused with the ollama template...
When using their suggested template from https://ollama.com/library/qwq/blobs/41190096a061:

{{- if and (ne .Role "assistant") $last }}<|im_start|>assistant
{{ end }}
{{- end }}

The opening tag is output as intended.

But when I try to force it as per the original prompt from tokenizer_config.json:

{{- if and (ne .Role "assistant") $last }}<|im_start|>assistant
<think>
{{ end }}
{{- end }}

It isn't! I don't get what's going on.

So if anybody understands the behavior, please share!

And sorry for misleading everyone.

I got it working properly with SGLang. I built the latest build from the git source and added --reasoning-parser deepseek-r1, and now in OpenWebUI and elsewhere I get the thinking enclosed.
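For reference, a sketch of that kind of launch command (only --reasoning-parser deepseek-r1 is taken from the post; the model path and tensor-parallel size are placeholders):

```bash
# Sketch: serve QwQ-32B with SGLang, tensor parallel over 2 GPUs,
# splitting the <think> block out via the DeepSeek-R1-style reasoning parser.
python -m sglang.launch_server \
  --model-path Qwen/QwQ-32B \
  --tp 2 \
  --reasoning-parser deepseek-r1
```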

Updated tokenizer_config, but model replies are still missing the <think>\n start tag.

This also works for vLLM. Thank you.
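A hedged sketch of the vLLM equivalent (flag spelling varies by vLLM version, so double-check against your install; some releases also need --enable-reasoning, and the parser name uses an underscore there):

```bash
# Sketch: vLLM OpenAI-compatible server with reasoning parsing enabled.
vllm serve Qwen/QwQ-32B \
  --tensor-parallel-size 2 \
  --enable-reasoning \
  --reasoning-parser deepseek_r1
```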

Removing the last part "\n<think>" in '<|im_start|>assistant\n<think>\n' in the "chat_template" can solve this problem.
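In other words, the suggested edit to the end of the chat_template is (a sketch of the change, not an official patch):

```jinja
{{- '<|im_start|>assistant\n<think>\n' }}  {# before: prompt already contains <think> #}
{{- '<|im_start|>assistant\n' }}           {# after: the model now generates <think> itself #}
```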

After adding these flags, model accuracy decreased. Without them, the model answers "how many r's in the word strawberrry?" correctly, but after adding them it cannot: it gives 3 R's, while the correct answer is 4 since there is an extra R. I am using the FP16 model.

Did you set the repeat penalty above 1? It seems to decrease quality by messing with the sampling, as per Unsloth's findings.
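If it helps, here is a sketch of sampling settings that leave the repeat penalty off, using vLLM's offline API (the temperature/top_p values are only illustrative; the point is repetition_penalty=1.0, i.e. disabled):

```python
# Sketch: generation settings with the repetition penalty disabled (1.0).
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/QwQ-32B", tensor_parallel_size=2)
params = SamplingParams(
    temperature=0.6,         # illustrative value
    top_p=0.95,              # illustrative value
    repetition_penalty=1.0,  # 1.0 = no repeat penalty; >1 reportedly hurts quality
    max_tokens=4096,
)
# In practice you would render the chat template first; raw prompt kept short here.
outputs = llm.generate(["How many r's are in the word strawberrry?"], params)
print(outputs[0].outputs[0].text)
```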

Thank you very much, it works well now.
