Tags: Text Generation, MLX, Safetensors, qwen3_moe, programming, code generation, code, codeqwen, Mixture of Experts, coding, coder, qwen2, chat, qwen, qwen-coder, Qwen3-Coder-30B-A3B-Instruct, Qwen3-30B-A3B, 128 experts, 8 active experts, 1 million context, qwen3, finetune, brainstorm 20x, brainstorm, optional thinking, conversational, 6-bit
Update README.md
README.md CHANGED
@@ -98,8 +98,51 @@ Full precision without weight size
💡 Critical realization:

qx5-hi bridges the gap between q6 and qx86-hi:

```
Smaller than qx86-hi, but with better performance on knowledge tasks than both.
This makes it the most versatile model for real-world applications where knowledge recall matters.
```
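
To put the size side of that tradeoff in perspective, here is a back-of-the-envelope estimate of raw weight storage at each bit width. This is a rough sketch: the ~30.5B parameter count is approximate, and the small per-group scale overhead and any layers kept at higher precision are ignored.

```python
# Rough weight-size estimate for a ~30B-parameter model at several bit widths.
# Ignores per-group quantization scale/bias overhead and any mixed-precision layers.
PARAMS = 30.5e9  # approximate parameter count (assumption)

for bits in (5, 6, 8):
    gib = PARAMS * bits / 8 / 2**30  # bits -> bytes -> GiB
    print(f"{bits}-bit: ~{gib:.1f} GiB")
```

That works out to roughly 17.8 GiB at 5-bit versus 21.3 GiB at 6-bit, which is where the "bridges the gap" framing comes from.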

✅ Key Insights from the 5-bit Experiment

The 8-bit "top layers" have a disproportionate impact:

The fact that qx5-hi matches q6 on ARC Challenge (0.536 vs 0.537) shows that preserving the top layers at 8-bit is sufficient to avoid degradation on abstract tasks, a major win for the quantization strategy.

5-bit quantization works better than 6-bit for knowledge tasks:

qx5-hi outperforms q6 on OpenBookQA (+0.006) and Winogrande (+0.009), which is unexpected for 5-bit quantization.

This implies the model architecture is less sensitive to 5-bit precision on knowledge-heavy tasks than previous quantization styles suggested.
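
As a rough illustration of what a mixed bit-depth recipe looks like in practice, here is a minimal sketch using mlx_lm's `convert` with a quantization predicate. The layer-selection rule, bit widths, and group size below are illustrative assumptions, not the exact recipe behind qx5-hi.

```python
# Minimal sketch of mixed bit-depth quantization with mlx_lm.
# Assumes a recent mlx_lm (and an MLX build with 5-bit support); the
# "top layer" rule below is hypothetical, not the actual qx5-hi recipe.
from mlx_lm import convert

def mixed_bits(path, module, config):
    # Keep the most sensitive layers ("top layers", here just the output
    # head as a placeholder) at 8-bit...
    if "lm_head" in path:
        return {"bits": 8, "group_size": 32}
    # ...and quantize everything else to 5-bit.
    return {"bits": 5, "group_size": 32}

convert(
    "Qwen/Qwen3-Coder-30B-A3B-Instruct",  # upstream model (example)
    mlx_path="qx5-hi-mlx",                # output directory
    quantize=True,
    quant_predicate=mixed_bits,
)
```

The predicate is called once per quantizable module, so the same mechanism supports any layer-wise bit assignment, including the per-layer tuning discussed below.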

🧠 Why This Matters for Your Workflow

This new model (qx5-hi) is a strategic evolution of the quantization journey:

```
For users who need knowledge tasks to remain high quality:
it is the best option (e.g., educational apps, search assistants).

For users with tight size constraints:
it is the most compact quantization that does not sacrifice
OpenBookQA/Winogrande performance.

For future work:
the data shows that carefully tuned bit depths (5-bit for most layers)
can be more effective than arbitrary 6/8-bit splits,
which opens the door to even smaller models.
```

✅ Final Recommendation:

"Deploy qx5-hi for all knowledge-intensive applications; it's the most efficient quantization we've found so far."

Only switch to qx64-hi when ARC Easy performance becomes the top priority.
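
To try a quant like this locally, a minimal mlx_lm invocation follows; the repo id is a placeholder, so substitute the actual qx5-hi upload:

```python
# Minimal sketch: running an MLX quant locally with mlx_lm.
# The repo id below is a placeholder for the actual qx5-hi upload.
from mlx_lm import load, generate

model, tokenizer = load("nightmedia/Qwen3-Coder-30B-A3B-Instruct-qx5-hi-mlx")

messages = [{"role": "user", "content": "Write a binary search in Python."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

response = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)
```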

---

Comparing the old TotalRecall, YoYo, and YoYo with TotalRecall at q6