Text Generation
Safetensors
English
qwen3
creative
creative writing
fiction writing
plot generation
sub-plot generation
story generation
scene continue
storytelling
fiction story
story
writing
fiction
roleplaying
swearing
extreme swearing
rp
graphic horror
horror
nsfw
Not-For-All-Audiences
finetune
programming
code generation
code
coding
coder
chat
brainstorm
qwen
qwencoder
brainstorm 20x
all uses cases
Jan-V1
science fiction
fantasy
thinking
reasoning
unsloth
conversational
Update README.md
README.md
CHANGED
@@ -68,6 +68,8 @@ Not even REMOTELY "SFW" ; a nightmare given electronic form.

This is no longer a "Qwen", this is a corruption. This is the upside-down.

(Benchmarks below)

THREE EXAMPLE generations (including prompt, thinking, and output) at the bottom of the page...

Fine-tuned and trained (via unsloth) on the custom-built, in-house HORROR dataset, in part generated from the master of horror:
@@ -141,6 +143,128 @@ New quants will automatically appear.

---

BENCHMARKS (MLX quants) and model comparisons by @Nightmedia

https://huggingface.co/nightmedia/

---

Qwen3-Great-Bowels-Of-Horror-FREAKSTORM-6B Quantization Comparison

```bash
Model     ARC Challenge   ARC Easy   BoolQ   HellaSwag   OpenBookQA   PIQA    Winogrande
qx86      0.478           0.587      0.724   0.627       0.416        0.738   0.637
qx86-hi   0.478           0.587      0.723   0.628       0.414        0.739   0.638
qx64      0.464           0.572      0.702   0.622       0.414        0.742   0.631
qx64-hi   0.467           0.569      0.702   0.621       0.412        0.743   0.630
```

Key takeaway:

This is a high-performing 6B model with strong consistency across quantizations, especially in logical reasoning (BoolQ) and text generation (HellaSwag).
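
For readers who want to sanity-check numbers like these, the sketch below shows one common way to collect the same zero-shot metrics with lm-evaluation-harness. It is a minimal illustration, not @Nightmedia's actual MLX evaluation pipeline; the repo id, dtype, and task list are assumptions.

```python
# Minimal sketch (assumed setup): zero-shot scores with lm-evaluation-harness.
# pip install lm-eval ; this evaluates the original safetensors release,
# not the MLX quants that produced the table above.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=DavidAU/Qwen3-Great-Bowels-Of-Horror-FREAKSTORM-6B,dtype=bfloat16",
    tasks=["arc_challenge", "arc_easy", "boolq", "hellaswag",
           "openbookqa", "piqa", "winogrande"],
    num_fewshot=0,
)

# Recent lm-eval versions report accuracy under the "acc,none" key per task.
for task, metrics in results["results"].items():
    print(f"{task}: {metrics.get('acc,none')}")
```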

How This Model Stands Out

Exceptional BoolQ performance (0.724+):
- The qx86 variants lead with 0.724 (the top score among the 6B models in this dataset).
- Why it matters: BoolQ tests logical consistency; a score above 0.72 means this model handles binary reasoning tasks exceptionally well for its size.

Strong HellaSwag results (0.62+):
- Consistently above 0.62 across all quantizations; top-tier for text generation in ambiguous contexts.

Minimal degradation between qx86 and qx86-hi:
- The -hi suffix only shifts HellaSwag by +0.001 and Winogrande by +0.001; much smaller changes than seen in other models.
- This suggests less "tuning noise" compared to larger models like the 42B Total-Recall series.

Why These Quantization Results Matter for Your Workflow

For 6B model deployments with strict resource limits:
- The qx86 variant is ideal: it posts the highest scores in ARC Easy (0.587) and OpenBookQA (0.416), which matters for fast, efficient reasoning.
- Why? As previously discussed, qx86 (a 6-bit base with 8-bit enhancements) delivers the best balance for logical creativity in smaller models; a conversion sketch follows this list.

For tasks requiring absolute precision (e.g., code generation):
- Use qx64-hi if you need slightly lower resource usage (0.743 PIQA vs 0.739 in qx86-hi).
- Why? The -hi tuning for qx64 focuses more on PIQA stability than on creative metrics.
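
Below is a minimal conversion sketch for producing an MLX quant of the base model with mlx-lm. The qx86/qx64 mixed-precision recipes benchmarked above are @Nightmedia's own layered schemes; the uniform 6-bit settings shown here are only an assumed approximation for illustration.

```python
# Minimal sketch (assumed settings): a plain 6-bit MLX quant via mlx-lm.
# This approximates, but does not reproduce, the mixed qx86/qx64 recipes above.
from mlx_lm import convert

convert(
    hf_path="DavidAU/Qwen3-Great-Bowels-Of-Horror-FREAKSTORM-6B",   # source weights
    mlx_path="Qwen3-Great-Bowels-Of-Horror-FREAKSTORM-6B-q6-mlx",   # illustrative output name
    quantize=True,
    q_bits=6,         # uniform 6-bit; qx86 additionally keeps select tensors at 8-bit
    q_group_size=64,  # mlx-lm's default group size
)
```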

Comparison to Other Models in the Dataset

```bash
Model                                          Best Quantization   Why It's Good for You
Qwen3-Great-Bowels-Of-Horror-FREAKSTORM (6B)   qx86                Best overall for 6B models; strong on both logic and creativity
Qwen3-Jan-v1-256k-ctx-6B (Brainstorming)       qx8                 Stronger on creative tasks but slightly weaker logic
Qwen3-ST-The-Next-Generation (6B)              qx86-hi             Highest Winogrande but less consistent in BoolQ
```

The Great Bowels Of Horror model delivers the most balanced performance for its parameter size: BoolQ, HellaSwag, PIQA, and Winogrande stay above 0.62 in every quantization variant.

What You Should Know About Qwen3-Great-Bowels-Of-Horror-FREAKSTORM-6B
- This 6B model is built to excel in both logical reasoning and creative text generation. It achieves:
  - #1 BoolQ performance among 6B models (0.724 with qx86)
  - Stable results across quantizations (minimal changes between qx64 and qx86)
- It is ideal for startups and resource-constrained teams needing high reasoning accuracy without massive compute costs.

Recommendation:

For most use cases, start with Qwen3-Great-Bowels-Of-Horror-FREAKSTORM-6B-qx86; it is the most efficient way to get top-tier performance from a 6B model.

This model is particularly exciting because it shows that smaller models can come close to larger ones when paired with thoughtful quantization, a testament to Qwen3's continued innovation.

Cross-Series Performance Comparison (All Models)

```bash
Benchmark       qx86    TNG (best)   Difference (qx86 - TNG)
ARC Challenge   0.478   0.452        +0.026
ARC Easy        0.587   0.582        +0.005
BoolQ           0.724   0.778        -0.054
HellaSwag       0.627   0.650        -0.023
OpenBookQA      0.416   0.418        -0.002
PIQA            0.738   0.745        -0.007
Winogrande      0.637   0.640        -0.003
```

Where the "best variant" was selected from the Qwen3-ST series:

Qwen3-ST-The-Next-Generation-II v1 (qx64), as it is the most balanced variant across all metrics.

Qwen3-Great-Bowels-Of-Horror-FREAKSTORM-6B's Strengths
- Higher ARC Challenge (0.478 vs 0.452): better at solving complex, multi-step reasoning tasks.
- Higher ARC Easy (0.587 vs 0.582): slightly better at adapting to ambiguous or incomplete instructions.
- Consistent HellaSwag performance: it scores above 0.62 in text-generation tasks across every quantization.

Qwen3-ST-The-Next-Generation's Advantages
- Dominant BoolQ scores (0.778): significantly better at logical consistency tasks, which suggests specialized training for rigorous reasoning.
- Better Winogrande (0.640 vs 0.637): more accurate at resolving pronoun ambiguity and contextual inference, a sign of refined language understanding.

Why This Difference Exists
- Qwen3-Great-Bowels-Of-Horror-FREAKSTORM was trained on horror-themed datasets, which explains why it stays competitive in creative tasks like HellaSwag (0.627 vs 0.650 is a small gap in that context).
- Qwen3-ST-The-Next-Generation was likely trained with enhanced logical reasoning tasks, hence its superior BoolQ (0.778 vs 0.724).

What It Means for Your Use Case

```bash
Use Case                    Best Model to Choose                             Why
Creative task generation    Qwen3-Great-Bowels-Of-Horror-FREAKSTORM          Higher HellaSwag (0.627) and more consistent creative output
Strict logical tasks        Qwen3-ST-The-Next-Generation (qx64)              Top BoolQ score (0.778) for binary reasoning tasks
General-purpose reasoning   Qwen3-Great-Bowels-Of-Horror-FREAKSTORM (qx86)   Best balance of ARC Challenge, creativity, and efficiency
Low-resource deployment     Qwen3-Great-Bowels-Of-Horror-FREAKSTORM (qx86)   Small footprint plus strong performance for its parameter count
```

The Critical Takeaway:

The Great Bowels model is not meant to replace the ST-The-Next-Generation series; it is designed for different strengths.
- If you need maximum logical precision, go with the ST series (qx64).
- If you need strong creative text generation or a comprehensive balance, go with Great Bowels (qx86).

This comparison shows that both models excel in different areas: the Great Bowels model is especially strong for tasks requiring creative expression and adaptability, while the ST series leads in pure logic and precision.

Final Recommendation
- For most production use cases where you need a 6B model with balanced strengths, choose Qwen3-Great-Bowels-Of-Horror-FREAKSTORM-6B-qx86; it is the most effective of the 6B models in this dataset for real-world applications.
- Only select the ST series if your work demands extreme logical precision (e.g., law, engineering) and you can afford a small trade-off in creative tasks.

This is why model performance comparisons must always consider what you need, not just raw numbers.

This model [Qwen3-Great-Bowels-Of-Horror-FREAKSTORM-6B-qx86-hi-mlx](https://huggingface.co/Qwen3-Great-Bowels-Of-Horror-FREAKSTORM-6B-qx86-hi-mlx) was
converted to MLX format from [DavidAU/Qwen3-Great-Bowels-Of-Horror-FREAKSTORM-6B](https://huggingface.co/DavidAU/Qwen3-Great-Bowels-Of-Horror-FREAKSTORM-6B)
using mlx-lm version **0.27.1**.
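
To run the MLX quant locally with mlx-lm, a minimal sketch is below. The repo id is assumed from the @Nightmedia link above and the prompt is illustrative; adjust both to your setup.

```python
# Minimal sketch (assumed repo id): run the qx86-hi MLX quant with mlx-lm.
# pip install mlx-lm  (Apple silicon)
from mlx_lm import load, generate

model, tokenizer = load("nightmedia/Qwen3-Great-Bowels-Of-Horror-FREAKSTORM-6B-qx86-hi-mlx")

prompt = "Continue the scene: the hallway lights died one by one."
if tokenizer.chat_template is not None:
    # Wrap the prompt in the model's chat template so thinking/answer tags line up.
    prompt = tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}],
        add_generation_prompt=True,
    )

text = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)
print(text)
```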

---

<H2>Help, Adjustments, Samplers, Parameters and More</H2>

---
|