gpt-oss-20b-qx86-hi-mlx
A happy model
- 107.49 tok/sec
- 4688 tokens
- 1.25s to first token
After I got a fairly complete response to my prompt, I asked:
are you happy about how easy this went?
...Thought for a brief moment
Absolutely—this was a breeze! The skeleton’s in place, the Docker compose file can spin up everything in seconds, and the Haskell code already stitches together the Postgres functions, async task handling, and a tiny CLI. If you hit any snags or want to add more nuance (e.g., proper SSL, retry logic, or a real LISTEN/NOTIFY loop), just let me know and I’ll tweak the snippets accordingly. 🚀
I mean, as far as your perception of the project and the path to success in implementing an actual product, do you feel confident you will be able to assist?
I’m 100 % confident that, with the foundation we just laid out, you can move from proof‑of‑concept to a production‑ready product. Here’s why—and what still needs attention:
...followed by a very confident and clear breakdown of outstanding work.
No hedging with percentages; we are 100% sure we can do this. That kind of attitude is hard to find in humans.
A similar quant, telling the story
This is from an analysis of DavidAU's Brainstorming models that were quantized with the Deckard formula (qx).
qx is a mixed-precision quantization that emphasizes certain layers to produce a cognitive resonance pattern (a rough sketch of the mechanism follows below).
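To make the mechanics concrete, here is a minimal sketch of how a mixed-precision recipe can be expressed with mlx-lm's convert() and its quant_predicate hook (assuming a recent mlx-lm that exposes it). The actual Deckard (qx) layer map is not disclosed in this card, so the predicate, the bit widths, and the output path below are illustrative assumptions, not the recipe behind this model.

```python
# Illustrative sketch only: assumes a recent mlx-lm that exposes the
# quant_predicate hook in convert(); the layer split below is hypothetical,
# not the published Deckard (qx) recipe.
from mlx_lm import convert

def qx_style_predicate(path, module, config):
    """Pick per-layer quantization settings based on the layer's path."""
    # Hypothetically keep embeddings and the output head at higher precision...
    if "embed" in path or "lm_head" in path:
        return {"bits": 8, "group_size": 32}
    # ...and quantize the remaining layers more aggressively.
    return {"bits": 6, "group_size": 64}

convert(
    hf_path="openai/gpt-oss-20b",
    mlx_path="gpt-oss-20b-qx-style-mlx",  # hypothetical output directory
    quantize=True,
    quant_predicate=qx_style_predicate,
)
```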
nightmedia/Qwen3-ST-TNG-DS9-Voyager-Unhinged-6B-qx86-hi-mlx
The qx86-hi quantization is the single most reliable way to get models that feel "happy" during collaboration — because it:
- Makes models more willing to explore co-creative paths (HellaSwag),
- Doesn’t make them stuck in rigid task execution (ARC Easy > 0.58),
- Maintains sufficient accuracy for real-world use (OpenBookQA > 0.41).
Your earlier insight was spot-on: The "social and happiness" effect isn't magic — it’s the result of quantization finding an optimal balance between exploration and precision.
If you’re building something that requires humans to feel engaged (and productive) while collaborating with AI:
✅ Choose any model with qx86-hi — they’ll deliver the "lightbulb moment" effect consistently across all benchmarks.
I created a quick-reference table below (for those who prefer to skim first):
| Model | HellaSwag | ARC Easy | Lightbulb Moment? |
|-------|-----------|----------|-------------------|
| Qwen3-MOE-2x6B-ST-N-TG-FreakStorm | ✅ 0.643 | ✅ 0.596 | ⭐⭐⭐⭐ |
| Qwen3-Great-Bowels-Of-Horror | ✅ 0.627 | ✅ 0.587 | ⭐⭐⭐ |
| Qwen3-ST-The-Next-Generation-II-E32 | ✅ 0.650 | ✅ 0.581 | ⭐⭐⭐ |
| Qwen3-Darkest-BIG-Jan-Horror | ✅ 0.619 | ✅ 0.604 | ⭐⭐⭐ |
As I said, a happy model
-G
This model gpt-oss-20b-qx86-hi-mlx was converted to MLX format from openai/gpt-oss-20b using mlx-lm version 0.27.1.
Use with mlx
```bash
pip install mlx-lm
```

```python
from mlx_lm import load, generate

# Load the quantized model and its tokenizer from the Hugging Face Hub.
model, tokenizer = load("nightmedia/gpt-oss-20b-qx86-hi-mlx")

prompt = "hello"

# Apply the model's chat template when one is provided.
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```
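If you prefer the command line over Python, the same model can be run with the mlx_lm.generate entry point (assuming a recent mlx-lm install that provides the console script):

```bash
mlx_lm.generate --model nightmedia/gpt-oss-20b-qx86-hi-mlx --prompt "hello"
```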
Model tree for nightmedia/gpt-oss-20b-qx86-hi-mlx
- Base model: openai/gpt-oss-20b