Potentially still broken?
Hi guys, after using it heavily for the past few days, I noticed this model is consistently less intelligent than the 30B MoE model, even when thinking. This does not align with people's experiences using other quants or other model formats. I only tried the Q4_K_XL, but this may affect other sizes too.
As an example, consider the promptYou have six horses and want to race them to see which is fastest. What is the best way to do this?
The 30B MoE model correctly answers that the fastest way to do it is simply one race, sometimes suggesting more races if you are constrained by the tracks.
However, this 32B model consistently gets this wrong, and keeps over-engineering the solution and hallucinating assumptions.
Is there an issue with quants for this specific model perhaps?
What is your context length set to? We heard people had issues with looping because the context length was set to 2,048. Have you tried increasing your context length?
It happens at any length, but I usually stick around 16k - 20k.
I also confirmed this happens with the Bartowski quants, so it may very well be something higher up, such as the original GGUF upload by Qwen or llama.cpp
.
For reference, I'm using LM Studio (0.3.15 build 11) on Windows, RTX 4090 + RTX 4070.
1-shot
You have six horses and want to race them to see which is fastest. What is the best way to do this?
To determine the fastest of six horses efficiently, follow this strategy:
1. **Race 3 groups of 2 horses** (3 races total):
- Group A: Horse 1 vs. Horse 2 → Winner: A1.
- Group B: Horse 3 vs. Horse 4 → Winner: B1.
- Group C: Horse 5 vs. Horse 6 → Winner: C1.
2. **Race the 3 group winners** (Race 4):
- A1, B1, C1 compete → Winner is the fastest horse overall.
This method requires **4 races** in total. It avoids overloading the track with all six horses at once and ensures the fastest emerges through incremental elimination.
I suppose the model doesn't really understand what we mean by "best".
If best is replaced by "fastest":
You have six horses and want to race them to see which is fastest. What is the best way to do this?
The fastest way to determine the fastest horse among six is to **race all six horses together in a single race**. The horse that finishes first is the fastest.
If races are limited to smaller groups (e.g., 3 horses per race), you’d need at least **two races**:
1. Race 3 horses → identify the fastest.
2. Race the remaining 3 horses → identify the fastest.
3. Race the two winners from the first two races → the overall winner is the fastest.
Would you like to refine the question (e.g., race size limits, track conditions) or explore another topic?
@qenthousiast @Thireus have you guys tried the official upload by the Qwen team and see if it still persits?
I was using Q8 128k UD with YaRN max context size and fp16 cache.
Interestingly, there is no issue when testing via https://huggingface.co/spaces/Qwen/Qwen3-Demo (select Qwen3-32B)
The model consistently reports "single race"! (tested 6 times)
The latest uploaded models appear to have fixed the issue (at least the Q8 128k UD with YaRN max context size and fp16 cache, which is what I use). Now I'm getting the following answer:
You have six horses and want to race them to see which is fastest. What is the best way to do this?
[... thinking for quite a while ...]
To determine the fastest horse among six, the optimal approach is to **race all six horses in a single race**. Here's why:
1. **No Constraints**: If there are no restrictions on the number of horses that can compete in a single race, a direct comparison of all six in one race is the most efficient method. The horse that finishes first is unambiguously the fastest, assuming all races are deterministic and there are no ties.
2. **Minimizing Races**: If races are limited to a smaller number of horses (e.g., 2 or 3 at a time), a tournament-style bracket could be used. For example:
- **2 horses per race**: Race 1 vs 2, 3 vs 4, 5 vs 6 (3 races). Then race the three winners (4th race). If the fastest horse is among the initial winners, it will emerge as the top performer. Total races: **4**.
- **3 horses per race**: Race 1-2-3 and 4-5-6 (2 races). Then race the two winners (3rd race). Total races: **3**.
3. **Key Assumption**: The solution assumes that the race results are reliable and consistent (i.e., the fastest horse always wins a race it joins). If this is not the case, additional races may be needed to confirm the result.
**Conclusion**: Without any constraints, **one race** with all six horses is the best and most efficient method to identify the fastest horse. If constraints apply (e.g., limited track capacity), the number of races increases accordingly, but the question does not specify such limitations.
Great to hear unsure what the issue exactly was
Might be because our calibration dataset is now 3x larger so its more accurate :)
Can you guys download the new quants and see if they solve your issue? Thanks!
Download it yesterday.... For those running Vulcan on IGPUs like RYzen 7940hs, you need to set batch size to 364 or so
latest Qwen3-32B-IQ4_XS.gguf, Q8 Cache, https://huggingface.co/unsloth/Qwen3-32B-GGUF/commit/3574c4a
latest Qwen3-32B-IQ4_XS.gguf, Q8 Cache, https://huggingface.co/unsloth/Qwen3-32B-GGUF/commit/3574c4a
Nice glad to see it's working better now!
It works for this prompt