Any way to disable reasoning?

#1
by aoleg - opened

Basically, it works - thank you for this model! But do you know of a way to disable reasoning? It does not seem to accept common tokens like /nothing or /no_think, and I have no idea how to access the template via llama.cpp.

I guess you can set the thinking budget to 0, but you will still get the budget reflection.
Some Jinja template modifications are probably needed so the budget reflection is only used when the budget is > 0; a rough sketch of that follows the example below.

Example response with budget 0:

The current thinking budget is 0, so I will directly start answering the question.</seed:cot_budget_reflect>
</seed:think>Hello! How can I help you today?
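Something along these lines in the template should do it (just a rough sketch; I haven't checked the exact variable and block names in the official chat_template.jinja, so treat this as an illustration rather than a drop-in patch):

{%- if thinking_budget is defined and thinking_budget > 0 -%}
    {#- keep the original budget-reflect instructions from the template inside this block -#}
{%- endif -%}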

I think it's impossible to disable, since its primary use is reasoning.

Yeah, setting the budget to 0 is a nice trick.

So how do you do it, exactly? Do you edit the jinja template, or just prompt?

Got it working in llama-cli by using this format:

llama-cli -m Seed_OSS_36B_Instruct_Q4_K_M.gguf --ctx-size 32768 --n-gpu-layers 99 --temp 1.1 --top-p 0.95 --no-mmap --flash-attn --cache-type-k f16 --cache-type-v f16 --jinja --chat-template-file chat_template.jinja

Downloaded chat_template.jinja from the original model and changed one line:

{%- set thinking_budget = 0 -%}

Works fine in the CLI; for some reason it doesn't work with llama-server, I don't know why. But at least it's something. Also, the model seems to ignore its own rules and outputs the think_end_token without the think_begin_token, so I guess this is one of those cases where a model ships with a borked template. Hopefully unsloth or another team can fix it.

Actually, there is an even simpler way. The jinja template contains the following system message for thinking_budget == 0:

You are an intelligent assistant that can answer questions in one step without the need for reasoning and thinking, that is, your thinking budget is 0. Next, please skip the thinking process and directly start answering the user's questions.

So just adding that to the system prompt disables thinking. Maybe there is an even shorter version, I'll experiment with that.
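With llama-server, the easiest way to try this is to send that text as the system message through the OpenAI-compatible endpoint (sketch below, assuming the server is listening on the default port 8080):

curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
  "messages": [
    {"role": "system", "content": "You are an intelligent assistant that can answer questions in one step without the need for reasoning and thinking, that is, your thinking budget is 0. Next, please skip the thinking process and directly start answering the user'\''s questions."},
    {"role": "user", "content": "Hello"}
  ]
}'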

You can add --chat-template-kwargs '{"thinking_budget": 0}'

Also, the Jinja template is not baked into the model config, so convert-hf-to-gguf.py doesn't retrieve it. You can bake it in after conversion or just use --chat-template-file with the downloaded official template.
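For example, roughly like this (a sketch reusing the settings from the llama-cli command above; the kwargs only take effect with --jinja templating, as far as I can tell):

llama-server -m Seed_OSS_36B_Instruct_Q4_K_M.gguf --ctx-size 32768 --n-gpu-layers 99 --temp 1.1 --top-p 0.95 --flash-attn --jinja --chat-template-file chat_template.jinja --chat-template-kwargs '{"thinking_budget": 0}'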

The chat template is built into the gguf; it works just fine without an external template file. --chat-template-kwargs '{"thinking_budget": 0}' is a good solution, but after looking at the chat template, it seems the only thing it does is set that system prompt I mentioned.

And the final update: koboldcpp just pushed an update, including both thinking and non-thinking chat templates: https://imgur.com/a/LTaWq7t
Both templates work great.

aoleg changed discussion status to closed
