nightmedia committed
Commit a8d34d1 · verified · 1 Parent(s): b0d9415

Update README.md

Files changed (1): README.md (+44 -1)

README.md CHANGED
@@ -98,8 +98,51 @@ Full precision without weight size
 
 💡 Critical realization:
 
- qx5-hi bridges the gap between q6 and qx86-hi — it’s smaller than qx86-hi but with better performance on knowledge tasks than both. This makes it the most versatile model for real-world applications where knowledge recall matters.
+ qx5-hi bridges the gap between q6 and qx86-hi
+ ```bash
+ Smaller than qx86-hi but with better performance on knowledge tasks than both.
+ This makes it the most versatile model for real-world applications where knowledge recall matters.
+ ```
+
+ ✅ Key Insights from the 5-bit Experiment
+
+ The 8-bit "top layers" have a disproportionate impact:
+
+ The fact that qx5-hi matches q6 on ARC Challenge (0.536 vs 0.537) shows that preserving the top layers in 8-bit is sufficient to avoid degradation on abstract tasks, a major win for the quantization strategy.
+
+ 5-bit quantization works better than 6-bit for knowledge tasks:
+
+ qx5-hi outperforms q6 on OpenBookQA (+0.006) and Winogrande (+0.009), which is unexpected for 5-bit quantization.
+
+ This implies the model architecture is less sensitive to 5-bit precision on knowledge-heavy tasks than previous quantization styles would suggest.
+
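+ As a concrete illustration, here is a minimal sketch of how such a mixed 5-bit/8-bit split can be expressed with mlx_lm's quant_predicate hook. The layer count, the size of the 8-bit tail, the group size read from the "hi" suffix, and the source repo are all illustrative assumptions, not the actual qx5-hi recipe:
+
+ ```python
+ # Hedged sketch: mixed-precision conversion with mlx_lm.
+ # Assumptions (not the published qx5-hi recipe): 36 transformer blocks,
+ # embeddings/head and the last 4 blocks kept at 8-bit, everything else
+ # at the 5-bit default; "hi" is read here as group size 32.
+ from mlx_lm import convert
+
+ NUM_LAYERS = 36        # hypothetical block count
+ EIGHT_BIT_TAIL = 4     # hypothetical number of top blocks kept at 8-bit
+
+ def qx5_hi_predicate(path: str, module, config) -> bool | dict:
+     # Embeddings and the output head are quality-critical: keep 8-bit.
+     if "embed" in path or "lm_head" in path:
+         return {"bits": 8, "group_size": 32}
+     # Keep the last few transformer blocks ("top layers") at 8-bit.
+     if any(f"layers.{i}." in path
+            for i in range(NUM_LAYERS - EIGHT_BIT_TAIL, NUM_LAYERS)):
+         return {"bits": 8, "group_size": 32}
+     return True  # everything else falls through to the 5-bit default
+
+ convert(
+     "nightmedia/source-model",   # hypothetical source repo
+     mlx_path="model-qx5-hi",
+     quantize=True,
+     q_bits=5,                    # 5-bit for most layers
+     q_group_size=32,             # the "hi" (high-fidelity) group size
+     quant_predicate=qx5_hi_predicate,
+ )
+ ```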
 
+ 🧠 Why This Matters for Your Workflow
+
+ This new model (qx5-hi) is a strategic evolution of the quantization journey:
+
+ ```bash
+ For users who need knowledge tasks to remain high quality:
+ It’s the best option
+ (e.g., educational apps, search assistants).
+
+ For users with tight size constraints:
+ It’s the most compact quantization
+ that doesn’t sacrifice OpenBookQA/Winogrande accuracy.
+
+ For future work:
+ The data shows that tuned per-layer bit depths (5-bit for most layers)
+ can be more effective than arbitrary 6/8-bit splits,
+ which opens the door to even smaller models.
+ ```
+
+ ✅ Final Recommendation:
+
+ "Deploy qx5-hi for all knowledge-intensive applications: it’s the most efficient quantization we’ve found so far."
+
+ Only switch to qx64-hi when ARC Easy performance becomes the top priority.
+
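+ A minimal deployment sketch with mlx_lm (the repo id below is a placeholder; substitute the actual qx5-hi upload):
+
+ ```python
+ # Hedged sketch: loading and querying a qx5-hi quant with mlx_lm.
+ from mlx_lm import load, generate
+
+ # Placeholder repo id, not an actual published path.
+ model, tokenizer = load("nightmedia/model-qx5-hi")
+
+ # A knowledge-recall style prompt, the workload qx5-hi is recommended for.
+ prompt = "Which planet in the solar system has the longest day?"
+ print(generate(model, tokenizer, prompt=prompt, max_tokens=128))
+ ```
+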
+ ---
 
 Comparing the old TotalRecall, YoYo, and YoYo with TotalRecall at q6
 