Yup, LM Studio won't work without changing the model setting.

#1
by Philpw99 - opened

The built-in Jinja template doesn't work in LM Studio. You need to click the cog icon for the model,
open the "Prompt" tab, then in the "Prompt Template" section, on the right side under the recycle-bin icon,
choose "ChatML" as the template. Then it will work.
This model really takes a long time to think and respond. On my 3090 Ti, the Q4_K quant generates about 3.43 tokens per second.
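For reference, ChatML wraps each turn in `<|im_start|>role ... <|im_end|>` markers, which is what Qwen-based models expect. A minimal Python sketch of how such a prompt is assembled (the helper is illustrative, not LM Studio's actual code):

```python
# Sketch of ChatML prompt formatting (illustrative helper, not
# LM Studio's internals). Each turn is wrapped in
# <|im_start|>role ... <|im_end|> markers.

def format_chatml(messages):
    """Render a list of {role, content} dicts as a ChatML prompt string."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>")
    # Leave the assistant turn open so the model continues from here.
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = format_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
])
print(prompt)
```

If the selected template doesn't produce markers like these, the model tends to ramble or stop incorrectly, which is why switching the template setting fixes it.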

Owner

Update to build 03.12.1 in LM Studio; it was fixed over the weekend.

RE: Tokens/Thinking.

You can use lower quants - even q2k - and reasoning/thinking will still function.
That being said, there is a leap in quality from q2k to q3ks.

RE: Thinking.
In your prompt, add a little more detail/direction, and/or narrow the scope of the problem the "reasoning" is trying to solve.
This will cut down thinking and focus the model better.
You can also direct how you want it to think, how much thinking to do, etc.
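As an illustration of that advice, a prompt can state the scope and a thinking budget explicitly. A minimal sketch (the helper and its wording are mine, just one way to phrase such directives):

```python
# Sketch: wrap a user question with explicit scope and a cap on
# reasoning steps, per the advice above. Wording is illustrative,
# not a required format.

def directed_prompt(question, scope, max_steps):
    """Build a prompt that narrows scope and limits thinking."""
    return (
        f"Scope: answer only about {scope}.\n"
        f"Think step by step, but use at most {max_steps} reasoning steps "
        f"before giving the final answer.\n\n"
        f"Question: {question}"
    )

prompt = directed_prompt(
    question="Why is my loop slow?",
    scope="Python list operations",
    max_steps=5,
)
print(prompt)
```

The tighter the scope and the more concrete the direction, the less the model wanders during its thinking phase.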

For me the thinking part is fine, but the output is never what I am asking for, and sometimes it even has artifacts... I don't know what the problem is.

@abiteddie

What quants are you using?
Sometimes an Imatrix quant (for reasoning models) works better; they are here:

https://huggingface.co/models?other=base_model:quantized:DavidAU/Qwen2.5-QwQ-35B-Eureka-Cubed-abliterated-uncensored

Suggest IQ3_S/IQ3_M min if you can run it.

NOTE:
The uncensored version may not be as strong as the regular version:

https://huggingface.co/DavidAU/Qwen2.5-QwQ-37B-Eureka-Triple-Cubed-GGUF
