This repository contains 3 versions of Qwen3-30B-A3B quantized with IQ4_KS (4.25 bpw quantization). The interesting part is that these models achieve a lower perplexity on wiki.test.raw than the original bf16 model. This is surprising, considering that no QAT has been mentioned in the Qwen3 announcement. Hence I'm putting them out there for anyone interested in evaluating performance by means other than PPL, or simply for local inference. For more details see this discussion.
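If you want to reproduce the PPL comparison yourself, here is a minimal sketch. It assumes ik_llama.cpp is already built and that its perplexity tool uses the same name and flags as mainline llama.cpp (`llama-perplexity`, `-m`, `-f`); the build path and the bf16 GGUF file name below are placeholders.

```bash
# Perplexity of the IQ4_KS quant on wiki.test.raw (paths are placeholders)
./build/bin/llama-perplexity \
  -m Qwen3-30B-A3B-IQ4_KS-IK-Imatrix.gguf \
  -f wiki.test.raw

# Repeat with the original bf16 GGUF and compare the two reported PPL values
./build/bin/llama-perplexity \
  -m Qwen3-30B-A3B-bf16.gguf \
  -f wiki.test.raw
```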

Note: These models will only work with ik_llama.cpp, as the IQ4_KS quantization type is not available in mainline llama.cpp.
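For local inference, a minimal sketch looks like the following, again assuming the ik_llama.cpp build exposes the usual `llama-cli` tool with the standard `-m`/`-p`/`-n` flags; adjust the binary path and model file name to your setup.

```bash
# Generate up to 256 tokens from a short prompt (paths are placeholders)
./build/bin/llama-cli \
  -m Qwen3-30B-A3B-IQ4_KS-IK-Imatrix.gguf \
  -p "Explain what an importance matrix (imatrix) is used for in GGUF quantization." \
  -n 256
```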

The only difference between the three models is the imatrix used (a rough sketch of the imatrix workflow follows the list):

  • Qwen3-30B-A3B-IQ4_KS-IK-Imatrix.gguf: imatrix computed using 500,000 tokens from wiki.train.raw
  • Qwen3-30B-A3B-IQ4_KS-Bartowski-Imatrix.gguf: Bartowski's imatrix
  • Qwen3-30B-A3B-IQ4_KS-Unsloth-Imatrix.gguf: Unsloth's imatrix
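For reference, here is a rough sketch of how an imatrix-based quant like the first variant can be produced with the llama.cpp-style tools (`llama-imatrix`, `llama-quantize`). The availability of the IQ4_KS target type is specific to ik_llama.cpp, and the calibration file name below is a placeholder for the 500,000-token slice of wiki.train.raw.

```bash
# 1. Compute an importance matrix from a full-precision GGUF over calibration text
./build/bin/llama-imatrix \
  -m Qwen3-30B-A3B-bf16.gguf \
  -f wiki.train.500k.txt \
  -o imatrix-wiki.dat

# 2. Quantize to IQ4_KS using that imatrix
./build/bin/llama-quantize \
  --imatrix imatrix-wiki.dat \
  Qwen3-30B-A3B-bf16.gguf \
  Qwen3-30B-A3B-IQ4_KS-IK-Imatrix.gguf \
  IQ4_KS
```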