Qwen3-Zero-Coder-Reasoning-0.8B

This repo contains the full-precision source weights in "safetensors" format, for generating GGUF, GPTQ, EXL2, AWQ, HQQ and other quantized formats. The source weights can also be used directly.

This is a coder/programming model, with full reasoning, on the Qwen 3 platform that is insanely fast - hitting over 150 t/s on moderate hardware, and 50 t/s+ on CPU only...

This is a generalist coding model, good for code blocks, brainstorming coding ideas, and generating draft code fast.

With reasoning enabled, it can also handle complex code requests.

It contains 42 layers (a merge of TWO 0.6B models) and 464 tensors - a very dense model for its size.

Specialized, NEO Imatrix enhanced quants (3 different NEO sets, including Q8, f16, and bf16) are here:

https://huggingface.co/DavidAU/Qwen3-Zero-Coder-Reasoning-0.8B-NEO-EX-GGUF

(That repo also includes additional usage instructions.)

Special thanks to Team Mradermacher for regular and Imatrix quants here:

https://huggingface.co/mradermacher/Qwen3-Zero-Coder-Reasoning-0.8B-GGUF

https://huggingface.co/mradermacher/Qwen3-Zero-Coder-Reasoning-0.8B-i1-GGUF

SETTINGS:

This model requires:

  • Jinja (embedded) or CHATML template
  • Maximum context of 40k.
  • Suggested minimum context of 8k to 16k.
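Most runners apply the model's embedded Jinja template automatically. If you are calling a raw completion API and need to build the prompt by hand, the generic ChatML layout looks like this (a minimal illustrative sketch; the `build_chatml` helper is not part of the model):

```python
# Illustrative sketch: manually assembling a ChatML-format prompt.
# Most runners (llama.cpp, LM Studio, etc.) apply the embedded Jinja
# template for you, so this is only needed for raw completion APIs.

def build_chatml(messages):
    """Render a list of {role, content} dicts into ChatML text."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    # Leave the assistant turn open so the model continues from here.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = build_chatml([
    {"role": "user", "content": "Write a Python function that reverses a string."},
])
print(prompt)
```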

Settings used for testing (suggested):

  • Temp: .3 to .7
  • Rep pen: 1.05 to 1.1
  • Top-p: .8, min-p: .05
  • Top-k: 20
  • No system prompt.

Settings used for testing #2 (suggested):

  • Temp: .55
  • Rep pen: 1.05
  • Top-p: .95, min-p: .05
  • Top-k: 100
  • No system prompt.
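The two suggested presets above can be expressed as parameter dicts. Key names here follow common llama.cpp-style conventions (`temperature`, `top_p`, `min_p`, `top_k`, `repeat_penalty`) and are assumptions; your runner's names may differ.

```python
# The two suggested test presets as sampler-parameter dicts.
# Key names follow llama.cpp-style conventions; adapt to your runner.
# Both presets are used with no system prompt.

PRESET_1 = {
    "temperature": 0.5,      # anywhere in the suggested .3 to .7 range
    "repeat_penalty": 1.05,  # suggested range: 1.05 to 1.1
    "top_p": 0.8,
    "min_p": 0.05,
    "top_k": 20,
}

PRESET_2 = {
    "temperature": 0.55,
    "repeat_penalty": 1.05,
    "top_p": 0.95,
    "min_p": 0.05,
    "top_k": 100,
}
```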

This model responds well to detailed instructions as well as step-by-step refinement and additions to code.

As this is an instruct model, it will also benefit from a detailed system prompt.

For simpler coding problems, lower quants will work well; for complex/multi-step problem solving, Q6 or Q8 is suggested.

With this model, use explicit statements about what you want and what to disallow; this helps keep the model on track.
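For example, a request with explicit allow/disallow statements might look like the following (the wording is purely illustrative):

```python
# Illustrative example of steering the model with explicit statements
# about what is wanted and what is disallowed.
request = (
    "Write a Python function that parses a CSV line into fields.\n"
    "Do: handle quoted fields containing commas.\n"
    "Do NOT: import any third-party libraries.\n"
    "Do NOT: add example usage or tests; output only the function."
)
print(request)
```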


For more information / other Qwen/Mistral Coders / additional settings see:

[ https://huggingface.co/DavidAU/Qwen2.5-MOE-2x-4x-6x-8x__7B__Power-CODER__19B-30B-42B-53B-gguf ]


Help, Adjustments, Samplers, Parameters and More


CHANGE THE NUMBER OF ACTIVE EXPERTS:

See this document:

https://huggingface.co/DavidAU/How-To-Set-and-Manage-MOE-Mix-of-Experts-Model-Activation-of-Experts

Settings: CHAT / ROLEPLAY and/or SMOOTHER operation of this model:

In "KoboldCpp", "oobabooga/text-generation-webui", or "Silly Tavern":

Set the "Smoothing_factor" to 1.5

  • In KoboldCpp -> Settings -> Samplers -> Advanced -> "Smooth_F"
  • In text-generation-webui -> parameters -> lower right.
  • In Silly Tavern this is called "Smoothing".

NOTE: For "text-generation-webui"

-> if using GGUFs you need to use "llama_HF" (which involves downloading some config files from the SOURCE version of this model)

Source versions (and config files) of my models are here:

https://huggingface.co/collections/DavidAU/d-au-source-files-for-gguf-exl2-awq-gptq-hqq-etc-etc-66b55cb8ba25f914cbf210be

OTHER OPTIONS:

  • Increase rep pen to 1.1 to 1.15 (you don't need to do this if you use "smoothing_factor")

  • If the interface/program you use to run AI models supports "Quadratic Sampling" ("smoothing"), just make the adjustment as noted.

Highest Quality Settings / Optimal Operation Guide / Parameters and Samplers

This is a "Class 1" model:

For all settings used for this model (including specifics for its "class"), example generations, and an advanced settings guide (which often addresses model issues and covers ways to improve performance for all use cases, including chat and roleplay), please see:

[ https://huggingface.co/DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters ]

