When will you fix the model replies missing</think>\n start tags

#19
by xldistance - opened

open-webui can't collapse the thought process, and it's too tiring to stare at the whole reasoning trace

xldistance changed discussion title from "When will you fix the model replies missing <think>\n start tags" to "When will you fix the model replies missing </think>\n start tags"

I think the team actually wants us to manually add a "<think>\n" after the whole prompt. Not sure how to implement that in open-webui.
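Something along these lines is presumably what's meant, hitting a raw OpenAI-compatible /v1/completions endpoint instead of the chat one (the URL, model name and ChatML formatting below are assumptions, not taken from this thread):

```python
# Hypothetical sketch of "add <think>\n after the whole prompt" against a raw
# OpenAI-compatible completions endpoint; URL, model name and the ChatML
# formatting here are assumptions.
import requests

prompt = (
    "<|im_start|>user\nHow many r's are in strawberry?<|im_end|>\n"
    "<|im_start|>assistant\n<think>\n"  # manually appended start tag
)
resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={"model": "Qwen/QwQ-32B", "prompt": prompt, "max_tokens": 2048},
)
# The model's output then starts mid-reasoning, so prepend the tag when displaying.
print("<think>\n" + resp.json()["choices"][0]["text"])
```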

Just don't use open-webui, or wait for an update.

If you deploy using vLLM, adding the flags --enable-reasoning --reasoning-parser deepseek_r1 to the "vllm serve" command fixes the issue.

You can just remove the <think> tag in the chat template inside tokenizer_config.json, so the model will generate <think> at the beginning of the output and open-webui will parse the response correctly.
The drawback is that the model may skip the reasoning section for some questions, but I feel the overall experience is fine.
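For anyone who wants to try that, a rough sketch of the edit (the file path is an assumption, and the exact template text may differ between checkpoints, so inspect it before saving):

```python
# Rough sketch: drop the hard-coded "<think>\n" that the chat template appends
# to the assistant turn, so the model has to generate the tag itself.
import json

path = "QwQ-32B/tokenizer_config.json"  # assumed local path
with open(path) as f:
    cfg = json.load(f)

cfg["chat_template"] = cfg["chat_template"].replace("<think>\n", "")

with open(path, "w") as f:
    json.dump(cfg, f, ensure_ascii=False, indent=2)
```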

> If you deploy using vLLM, adding the flags --enable-reasoning --reasoning-parser deepseek_r1 to the "vllm serve" command fixes the issue.

This solved the issue, thanks :)

Think tags come up just fine with llama.cpp. Example usage: `.\llama-cli --model QwQ-32B-Q8_0.gguf --temp 0.0 --color --threads 36 --ctx-size 128000`

ollama too, no config needed, no need to modify any prompt template; it works out of the box. Works perfectly coupled to open-webui, Continue, etc.

> If you deploy using vLLM, adding the flags --enable-reasoning --reasoning-parser deepseek_r1 to the "vllm serve" command fixes the issue.

Doesn't work for qwq-32b-awq model. I am using vllm 0.7.2-post1

I used the filter function in open-webui, and it adds the tag to the response when the qwq32 model is checked

> If you deploy using vLLM, adding the flags --enable-reasoning --reasoning-parser deepseek_r1 to the "vllm serve" command fixes the issue.

> Doesn't work for qwq-32b-awq model. I am using vllm 0.7.2-post1

It works for me on 0.7.3:

`vllm serve <model_id> --dtype half --quantization awq --enable-reasoning --reasoning-parser deepseek_r1 --max-model-len 32768` (adjust `--max-model-len` based on your available memory)

> If you deploy using vLLM, adding the flags --enable-reasoning --reasoning-parser deepseek_r1 to the "vllm serve" command fixes the issue.

After adding these flags, model accuracy decreased. Without them, the model answers "how many r's in the word strawberrry?" correctly, but with the flags it can't: it says 3 R's, while the correct answer is 4 because of the extra R. I am using the FP16 model.

we need some other solution.

Relevant for llama.cpp users: https://unsloth.ai/blog/qwq-32b

> I used the filter function in open-webui, and it adds the tag to the response when the qwq32 model is checked

Can you give me the code to modify open-webui?
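The exact filter used above isn't posted here, but a minimal sketch of what such an open-webui Filter function could look like, assuming the standard Filter plugin interface with an outlet hook (names and checks are illustrative):

```python
# Rough sketch, not the exact filter from this thread: an open-webui Filter
# function that prepends the missing <think> start tag to QwQ replies.
class Filter:
    def __init__(self):
        pass

    def outlet(self, body: dict, __user__: dict | None = None) -> dict:
        # Assumed body shape: {"messages": [{"role": ..., "content": ...}, ...]}
        for message in body.get("messages", []):
            if message.get("role") != "assistant":
                continue
            content = message.get("content", "")
            # QwQ emits the closing tag but not the opening one.
            if "</think>" in content and "<think>" not in content:
                message["content"] = "<think>\n" + content
        return body
```

The outlet hook runs on the finished response, so prepending the tag there should be enough for open-webui to recognize and collapse the reasoning block.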
