I was almost happy. Almost...
As the title says - I was almost happy. Almost...
I'm using Reka-Flash-3-21B-Reasoning-MAX-NEO-D_AU-Q3_K_S-imat.gguf. I was pretty impressed by the reasoning part in the code fixing task I gave it.
Code fixing task: Fix the pong game code.
In the reasoning part there were a couple of small confusions, but it got back on track and eventually came up with a solid plan for a couple of things that needed to be fixed in the code. Up to that point I was pretty excited.
Unfortunately, in the final answer, when it had about half of the code written, it was pretty clear to me that it simply wasn't going to apply all of the fixes it had planned... Perhaps the most painful part is that it omitted the most important fix of them all - adding the player's paddle movement, which was completely missing. It's a simple fix, and it even wrote a short draft for it in the reasoning part, but then it didn't apply it in the final answer. That's a shame.
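For reference, the kind of fix it drafted in its reasoning but never applied would look roughly like the sketch below. This is my own sketch, not the model's output: it assumes the player controls the right paddle (rightPaddleY, as in the snippet quoted further down), that canvas and paddleHeight exist as in the original code, and the key bindings and speed are guesses:

// Track arrow-key state for the player's paddle (assumed bindings).
const playerSpeed = 6;
let upPressed = false, downPressed = false;

document.addEventListener("keydown", (e) => {
  if (e.key === "ArrowUp") upPressed = true;
  if (e.key === "ArrowDown") downPressed = true;
});
document.addEventListener("keyup", (e) => {
  if (e.key === "ArrowUp") upPressed = false;
  if (e.key === "ArrowDown") downPressed = false;
});

// Called once per frame from the game loop.
function movePlayerPaddle() {
  if (upPressed) rightPaddleY -= playerSpeed;
  if (downPressed) rightPaddleY += playerSpeed;
  // Keep the paddle inside the canvas.
  rightPaddleY = Math.max(0, Math.min(canvas.height - paddleHeight, rightPaddleY));
}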
Any ideas as to what could be wrong, or how I could fix the model's outputs to make sure it actually applies the plan from the reasoning in the final answer?
Try Q4KM/IQ4 or Q5 (even if you have to offload some onto RAM/CPU).
This should address any nuance issues (instruction following and/or reasoning/output) that may be part of the problem.
Likewise, ask the model the following about your prompt:
Can the following prompt be made more clear? more direct? What are you guessing at in this prompt?
PROMPT
So I tried Q4KS and Q4KM; I believe that's as high as I can go on my current hardware. Unfortunately, it felt like going to higher quants gave me even worse results.
This is part of the output from Q4KM:
For the computer (left?) paddle collision, wait no, the computer is the left paddle? Wait, the code names leftPaddleY and leftPaddleY, but the computer AI is controlling the left paddle? Wait no, in the drawPaddle function, left paddle is drawn at 10, leftPaddleY. The computer's paddle is the left one? But in the computerAI function, it's moving the leftPaddleY? Wait, no, the computer paddle is the left one? Wait, rightPaddleY is for the computer's paddle? Wait, the code has leftPaddleY and leftPaddleY (duplicate?), but maybe that's a typo. Wait, looking at line 11: let leftPaddleY = (canvas.height - paddleHeight) / 2; line 12: let leftPaddleY = (...). Wait no, line 11 is leftPaddleY, line 12 is leftPaddleY again? No, line 12 is leftPaddleY again? Wait line 11: let leftPaddleY = (canvas.height - paddleHeight) / 2; line 12: let leftPaddleY = (...). That's a duplicate line. Probably line 12 is leftPaddleY for the computer? But no, line 12 is leftPaddleY again. Wait, line 11 is leftPaddleY, line 12 is leftPaddleY again? Let me check:
Line 11: let leftPaddleY = (canvas.height - paddleHeight) / 2;
Line 12: let leftPaddleY = (canvas.height - paddleHeight) / 2; ?
Wait no, line 12 is leftPaddleY for the computer? Wait line 12 is leftPaddleY again. That's a mistake. Probably line 12 should be left
The actual lines of code (from the given prompt) the model is referring to are:
let leftPaddleY = (canvas.height - paddleHeight) / 2;
let rightPaddleY = (canvas.height - paddleHeight) / 2;
So yeah, something is clearly wrong there, and Q3_K_S definitely felt less confused than Q4_K_M...
Hmm; you are testing it with JavaScript. Maybe state in the prompt that this is JavaScript, or something similar?
"let", "var", and other keywords are shared with other programming languages.
Although it should detect this as JavaScript.
Try IQ3s and/or IQ4XS - IQ quants are processed differently, and you may get a better result.
That being said, Qwen/QwQ models are specifically trained on programming code, so you might get better results with one of those, e.g. DeepSeek Distill 14B.
Well, I was avoiding IQs because in general they feel slower, and they are also marked as being slower on the Vulkan backend, which is what I'm using.
In the meantime I tried your Mistral Small that you processed with the same treatment. It's a pretty good model even with standard quants, so I was curious how your quants would perform. I downloaded Q3_K_S and used a very low temperature of 0.15 (I believe that's what's recommended for Mistral), Top-K: 0, Repeat Penalty: 1, Top-P: 0.95, Min-P: 0. I got pretty good results with that. It didn't give the best solution for fixing the player's paddle movement, but at least it was a working solution. A big surprise for me was that the model fixed the wrong paddle dimensions, which most models simply ignore - I think the regular Mistral Small quants ignored it as well. Then again, Mistral is not a thinking model; perhaps that makes a difference and somehow the "MAX" quant trick doesn't work as well with thinking models?
In any case, having a decent thinking model with reasonable generation speed would be nice. I was kind of hoping to get this Reka working somehow because the benchmarks looked promising and it's smaller than QwQ, which would make it a good candidate for my hardware. Could you please share the parameters you use with this Reka model?
RE: Settings:
Temp range: .6, Rep pen: 1.1, Top-K: 40, Top-P: .95, Min-P: .05
Rep pen range: 64-128 (helps keep reasoning on track / quality of output)
I used these settings to generate all examples, and for general testing.
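A rough sketch of how those settings might map onto sampler parameters, assuming you are driving a llama.cpp server (the endpoint, port, and n_predict value here are assumptions, and field names will differ in other frontends such as KoboldCpp or LM Studio):

// Settings above expressed as llama.cpp server sampler parameters (assumed mapping).
const samplerSettings = {
  temperature: 0.6,     // Temp .6
  repeat_penalty: 1.1,  // Rep pen 1.1
  repeat_last_n: 128,   // Rep pen range 64-128
  top_k: 40,
  top_p: 0.95,
  min_p: 0.05,
};

async function complete(prompt) {
  const res = await fetch("http://localhost:8080/completion", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt, n_predict: 2048, ...samplerSettings }),
  });
  return (await res.json()).content; // generated text
}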
RE: Reasoning/thinking models.
Temp plays a key role (as does the quant) in terms of reasoning ability.
Roughly speaking, Q4s / IQ4s are almost as strong as Q6/Q8.
Differences between Q2K, Q3KS, Q3KM, and Q3KL are significant.
It can mean the difference between the model "seeing the issue(s)" or not... and/or realizing it has the right solution.
However, IQ3_M/S are, in my opinion (and in some tests), stronger than Q3KS/Q3KM.
I would start at temp .6 and make MICRO changes -> i.e. .61, .62 ... .59, .58, etc.
Might even want to try temp .1 ... .2 ...
Would temps of 1+, 2+ work? Reka is unusually temp-stable for a reasoning model, so this might be the way to go.