When will you fix the model replies missing</think>\n start tags

#19
by xldistance - opened

open-webui can't collapse the thought process, and it's too tiring to stare at the whole reasoning trace

xldistance changed discussion title from "When will you fix the model replies missing <think>\n start tags" to "When will you fix the model replies missing </think>\n start tags"

I think the team actually wants us to manually add a "<think>\n" after the whole prompt. Not sure how to implement that in open-webui.
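Something along these lines is presumably what's meant, hitting a raw OpenAI-compatible /v1/completions endpoint instead of the chat one (the URL, model name and ChatML formatting below are assumptions, not taken from this thread):

```python
# Hypothetical sketch of "add <think>\n after the whole prompt" against a raw
# OpenAI-compatible completions endpoint; URL, model name and the ChatML
# formatting here are assumptions.
import requests

prompt = (
    "<|im_start|>user\nHow many r's are in strawberry?<|im_end|>\n"
    "<|im_start|>assistant\n<think>\n"  # manually appended start tag
)
resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={"model": "Qwen/QwQ-32B", "prompt": prompt, "max_tokens": 2048},
)
# The model's output then starts mid-reasoning, so prepend the tag when displaying.
print("<think>\n" + resp.json()["choices"][0]["text"])
```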

Just don't use open-webui, or wait for an update.

If you deploy using vLLM, adding the flags --enable-reasoning --reasoning-parser deepseek_r1 to the "vllm serve" command fixes the issue.

You can just remove the <think> tag in the chat template inside tokenizer_config.json, so the model will generate <think> at the beginning of the output and open-webui will parse the response correctly.
The drawback is that the model may skip the reasoning section for some questions, but I feel the overall experience is fine.
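For anyone who wants to try that, a rough sketch of the edit (the file path is an assumption, and the exact template text may differ between checkpoints, so inspect it before saving):

```python
# Rough sketch: drop the hard-coded "<think>\n" that the chat template appends
# to the assistant turn, so the model has to generate the tag itself.
import json

path = "QwQ-32B/tokenizer_config.json"  # assumed local path
with open(path) as f:
    cfg = json.load(f)

cfg["chat_template"] = cfg["chat_template"].replace("<think>\n", "")

with open(path, "w") as f:
    json.dump(cfg, f, ensure_ascii=False, indent=2)
```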

> If you deploy using vLLM, adding the flags --enable-reasoning --reasoning-parser deepseek_r1 to the "vllm serve" command fixes the issue.

This solved the issue, thanks :)

Think tags come up just fine with llama.cpp. Example usage: `.\llama-cli --model QwQ-32B-Q8_0.gguf --temp 0.0 --color --threads 36 --ctx-size 128000`

ollama too, no config needed, no need to modify any prompt template; it works out of the box. Works perfectly coupled to open-webui, Continue, etc.

> If you deploy using vLLM, adding the flags --enable-reasoning --reasoning-parser deepseek_r1 to the "vllm serve" command fixes the issue.

Doesn't work for qwq-32b-awq model. I am using vllm 0.7.2-post1

I used the filter function in open-webui, and it adds the tag to the response when the qwq32 model is checked

> If you deploy using vLLM, adding the flags --enable-reasoning --reasoning-parser deepseek_r1 to the "vllm serve" command fixes the issue.

> Doesn't work for qwq-32b-awq model. I am using vllm 0.7.2-post1

It works for me on 0.7.3:

`vllm serve <model_id> --dtype half --quantization awq --enable-reasoning --reasoning-parser deepseek_r1 --max-model-len 32768` (adjust `--max-model-len` based on your available memory)

> If you deploy using vLLM, adding the flags --enable-reasoning --reasoning-parser deepseek_r1 to the "vllm serve" command fixes the issue.

After adding these flags, model accuracy decreased. Without them, the model answers "how many r's in the word strawberrry?" correctly, but with the flags it can't: it says 3 R's, while the correct answer is 4 because of the extra R. I am using the FP16 model.

we need some other solution.

Relevant for llama.cpp users: https://unsloth.ai/blog/qwq-32b

> I used the filter function in open-webui, and it adds the tag to the response when the qwq32 model is checked

Can you give me the code to modify open-webui?
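The exact filter used above isn't posted here, but a minimal sketch of what such an open-webui Filter function could look like, assuming the standard Filter plugin interface with an outlet hook (names and checks are illustrative):

```python
# Rough sketch, not the exact filter from this thread: an open-webui Filter
# function that prepends the missing <think> start tag to QwQ replies.
class Filter:
    def __init__(self):
        pass

    def outlet(self, body: dict, __user__: dict | None = None) -> dict:
        # Assumed body shape: {"messages": [{"role": ..., "content": ...}, ...]}
        for message in body.get("messages", []):
            if message.get("role") != "assistant":
                continue
            content = message.get("content", "")
            # QwQ emits the closing tag but not the opening one.
            if "</think>" in content and "<think>" not in content:
                message["content"] = "<think>\n" + content
        return body
```

The outlet hook runs on the finished response, so prepending the tag there should be enough for open-webui to recognize and collapse the reasoning block.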
