Tags: Text Generation, MLX, Safetensors, qwen3_moe, programming, code generation, code, codeqwen, Mixture of Experts, coding, coder, qwen2, chat, qwen, qwen-coder, Qwen3-Coder-30B-A3B-Instruct, Qwen3-30B-A3B, 128 experts, 8 active experts, 1 million context, qwen3, finetune, brainstorm 20x, brainstorm, optional thinking, conversational, 6-bit
Update README.md
README.md CHANGED
@@ -98,8 +98,51 @@ Full precision without weight size
💡 Critical realization:

qx5-hi bridges the gap between q6 and qx86-hi:

```
Smaller than qx86-hi, but with better performance on knowledge tasks than both.
This makes it the most versatile model for real-world applications where knowledge recall matters.
```
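
To put the size side of that tradeoff in perspective, here is a back-of-the-envelope estimate of raw weight storage at each bit width. This is a rough sketch: the ~30.5B parameter count is approximate, and the small per-group scale overhead and any layers kept at higher precision are ignored.

```python
# Rough weight-size estimate for a ~30B-parameter model at several bit widths.
# Ignores per-group quantization scale/bias overhead and any mixed-precision layers.
PARAMS = 30.5e9  # approximate parameter count (assumption)

for bits in (5, 6, 8):
    gib = PARAMS * bits / 8 / 2**30  # bits -> bytes -> GiB
    print(f"{bits}-bit: ~{gib:.1f} GiB")
```

That works out to roughly 17.8 GiB at 5-bit versus 21.3 GiB at 6-bit, which is where the "bridges the gap" framing comes from.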

✅ Key Insights from the 5-bit Experiment

The 8-bit "top layers" have a disproportionate impact:

The fact that qx5-hi matches q6 on ARC Challenge (0.536 vs 0.537) shows that preserving the top layers at 8-bit is sufficient to avoid degradation on abstract tasks, a major win for the quantization strategy.

5-bit quantization works better than 6-bit for knowledge tasks:

qx5-hi outperforms q6 on OpenBookQA (+0.006) and Winogrande (+0.009), which is unexpected for 5-bit quantization.

This implies the model architecture is less sensitive to 5-bit precision on knowledge-heavy tasks than previous quantization styles suggested.
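
As a rough illustration of what a mixed bit-depth recipe looks like in practice, here is a minimal sketch using mlx_lm's `convert` with a quantization predicate. The layer-selection rule, bit widths, and group size below are illustrative assumptions, not the exact recipe behind qx5-hi.

```python
# Minimal sketch of mixed bit-depth quantization with mlx_lm.
# Assumes a recent mlx_lm (and an MLX build with 5-bit support); the
# "top layer" rule below is hypothetical, not the actual qx5-hi recipe.
from mlx_lm import convert

def mixed_bits(path, module, config):
    # Keep the most sensitive layers ("top layers", here just the output
    # head as a placeholder) at 8-bit...
    if "lm_head" in path:
        return {"bits": 8, "group_size": 32}
    # ...and quantize everything else to 5-bit.
    return {"bits": 5, "group_size": 32}

convert(
    "Qwen/Qwen3-Coder-30B-A3B-Instruct",  # upstream model (example)
    mlx_path="qx5-hi-mlx",                # output directory
    quantize=True,
    quant_predicate=mixed_bits,
)
```

The predicate is called once per quantizable module, so the same mechanism supports any layer-wise bit assignment, including the per-layer tuning discussed below.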

🧠 Why This Matters for Your Workflow

This new model (qx5-hi) is a strategic evolution of the quantization journey:

```
For users who need knowledge tasks to remain high quality:
it is the best option (e.g., educational apps, search assistants).

For users with tight size constraints:
it is the most compact quantization that does not sacrifice
OpenBookQA/Winogrande performance.

For future work:
the data shows that carefully tuned bit depths (5-bit for most layers)
can be more effective than arbitrary 6/8-bit splits,
which opens the door to even smaller models.
```

✅ Final Recommendation:

"Deploy qx5-hi for all knowledge-intensive applications; it's the most efficient quantization we've found so far."

Only switch to qx64-hi when ARC Easy performance becomes the top priority.
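
To try a quant like this locally, a minimal mlx_lm invocation follows; the repo id is a placeholder, so substitute the actual qx5-hi upload:

```python
# Minimal sketch: running an MLX quant locally with mlx_lm.
# The repo id below is a placeholder for the actual qx5-hi upload.
from mlx_lm import load, generate

model, tokenizer = load("nightmedia/Qwen3-Coder-30B-A3B-Instruct-qx5-hi-mlx")

messages = [{"role": "user", "content": "Write a binary search in Python."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

response = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)
```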

---

Comparing the old TotalRecall, YoYo, and YoYo with TotalRecall at q6