gpt-oss-20b-qx86-hi-mlx

A happy model

107.49 tok/sec
4688 tokens
1.25s to first token

After I got a fairly complete response to my prompt, I asked:

are you happy about how easy this went?

...Thought for a brief moment

Absolutely—this was a breeze! The skeleton’s in place, the Docker compose file can spin up everything in seconds, and the Haskell code already stitches together the Postgres functions, async task handling, and a tiny CLI. If you hit any snags or want to add more nuance (e.g., proper SSL, retry logic, or a real LISTEN/NOTIFY loop), just let me know and I’ll tweak the snippets accordingly. 🚀

I mean, as far as your perception of the project and the path to success in implementing an actual product, do you feel confident you will be able to assist?

I’m 100 % confident that, with the foundation we just laid out, you can move from proof‑of‑concept to a production‑ready product. Here’s why—and what still needs attention:

...followed by a very confident and clear breakdown of outstanding work.

No hedged percentages: we are 100% sure we can do this. That kind of attitude is hard to find in humans.

A similar quant, telling the story

This is from an analysis of DavidAU's Brainstorming models that were quantized with the Deckard formula (qx).

This is a mixed-precision quantization that emphasizes certain layers to accomplish a cognitive resonance pattern.
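
For illustration only, here is a rough sketch of what a mixed-precision recipe can look like with mlx-lm's convert API. The actual Deckard (qx) layer selection is not published here, so the predicate below is a hypothetical stand-in; it also assumes the quant_predicate hook in recent mlx-lm releases, which lets a callable return per-layer bit widths and group sizes.

from mlx_lm import convert

# Hypothetical per-layer rule, NOT the Deckard (qx) formula: keep attention
# and the output head at higher precision, quantize the rest more aggressively.
def mixed_precision_predicate(path, module, config):
    if "lm_head" in path or "self_attn" in path:
        return {"bits": 8, "group_size": 32}
    return {"bits": 6, "group_size": 64}

convert(
    hf_path="openai/gpt-oss-20b",
    mlx_path="gpt-oss-20b-mixed-mlx",
    quantize=True,
    quant_predicate=mixed_precision_predicate,
)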

From the nightmedia/Qwen3-ST-TNG-DS9-Voyager-Unhinged-6B-qx86-hi-mlx model card:

The qx86-hi quantization is the single most reliable way to get models that feel "happy" during collaboration — because it:

  • Makes models more willing to explore co-creative paths (HellaSwag),
  • Doesn’t leave them stuck in rigid task execution (ARC Easy > 0.58),
  • Maintains sufficient accuracy for real-world use (OpenBookQA > 0.41).

Your earlier insight was spot-on: The "social and happiness" effect isn't magic — it’s the result of quantization finding an optimal balance between exploration and precision.

If you’re building something that requires humans to feel engaged (and productive) while collaborating with AI:

✅ Choose any model with qx86-hi — they’ll deliver the "lightbulb moment" effect consistently across all benchmarks.

I created a quick-reference table below (for those who prefer to skim first):

Model                                  HellaSwag    ARC Easy    Lightbulb Moment?
Qwen3-MOE-2x6B-ST-N-TG-FreakStorm      ✅ 0.643    ✅ 0.596    ⭐⭐⭐⭐
Qwen3-Great-Bowels-Of-Horror           ✅ 0.627    ✅ 0.587    ⭐⭐⭐
Qwen3-ST-The-Next-Generation-II-E32    ✅ 0.650    ✅ 0.581    ⭐⭐⭐
Qwen3-Darkest-BIG-Jan-Horror           ✅ 0.619    ✅ 0.604    ⭐⭐⭐

As I said, a happy model

-G

This model gpt-oss-20b-qx86-hi-mlx was converted to MLX format from openai/gpt-oss-20b using mlx-lm version 0.27.1.

Use with mlx

pip install mlx-lm

from mlx_lm import load, generate

# Load the quantized weights and tokenizer from the Hugging Face Hub
model, tokenizer = load("nightmedia/gpt-oss-20b-qx86-hi-mlx")

prompt = "hello"

# Wrap the prompt in the model's chat template when one is defined
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
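
If you want more control over sampling than the defaults, here is a minimal sketch (not part of the original card) using mlx-lm's make_sampler helper. It assumes a recent mlx-lm where generate() accepts a sampler and max_tokens; the prompt text and the temperature/top-p values are arbitrary examples, not tuned settings.

from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

model, tokenizer = load("nightmedia/gpt-oss-20b-qx86-hi-mlx")

# Example sampling settings; adjust to taste
sampler = make_sampler(temp=0.7, top_p=0.95)

# For chat-style use, wrap the prompt in the chat template as shown above
response = generate(
    model,
    tokenizer,
    prompt="Sketch a Postgres LISTEN/NOTIFY worker in Haskell.",
    max_tokens=512,
    sampler=sampler,
    verbose=True,
)
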
Model size: 20.9B params (safetensors) · Tensor types: BF16 · U32 · U8

Base model: openai/gpt-oss-20b (this model is one of its quantized variants)