This repository contains 3 versions of Qwen3-30B-A3B quantized with IQ4_KS (4.25 bpw quantization). The interesting part is that these models achieve a lower perplexity on wiki.test.raw than the original bf16 model. This is surprising, considering that no QAT has been mentioned in the Qwen3 announcement. Hence I'm putting them out there for anyone interested in evaluating performance by means other than PPL, or simply for local inference. For more details see this discussion.
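If you want to reproduce the PPL comparison yourself, here is a minimal sketch. It assumes ik_llama.cpp is already built and that its perplexity tool uses the same name and flags as mainline llama.cpp (`llama-perplexity`, `-m`, `-f`); the build path and the bf16 GGUF file name below are placeholders.

```bash
# Perplexity of the IQ4_KS quant on wiki.test.raw (paths are placeholders)
./build/bin/llama-perplexity \
  -m Qwen3-30B-A3B-IQ4_KS-IK-Imatrix.gguf \
  -f wiki.test.raw

# Repeat with the original bf16 GGUF and compare the two reported PPL values
./build/bin/llama-perplexity \
  -m Qwen3-30B-A3B-bf16.gguf \
  -f wiki.test.raw
```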

Note: These models will only work with ik_llama.cpp, as the IQ4_KS quantization type is not available in mainline llama.cpp.
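For local inference, a minimal sketch looks like the following, again assuming the ik_llama.cpp build exposes the usual `llama-cli` tool with the standard `-m`/`-p`/`-n` flags; adjust the binary path and model file name to your setup.

```bash
# Generate up to 256 tokens from a short prompt (paths are placeholders)
./build/bin/llama-cli \
  -m Qwen3-30B-A3B-IQ4_KS-IK-Imatrix.gguf \
  -p "Explain what an importance matrix (imatrix) is used for in GGUF quantization." \
  -n 256
```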

The only difference between the three models is the imatrix used (a rough sketch of the imatrix workflow follows the list):

  • Qwen3-30B-A3B-IQ4_KS-IK-Imatrix.gguf: imatrix computed using 500,000 tokens from wiki.train.raw
  • Qwen3-30B-A3B-IQ4_KS-Bartowski-Imatrix.gguf: Bartowski's imatrix
  • Qwen3-30B-A3B-IQ4_KS-Unsloth-Imatrix.gguf: Unsloth's imatrix
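For reference, here is a rough sketch of how an imatrix-based quant like the first variant can be produced with the llama.cpp-style tools (`llama-imatrix`, `llama-quantize`). The availability of the IQ4_KS target type is specific to ik_llama.cpp, and the calibration file name below is a placeholder for the 500,000-token slice of wiki.train.raw.

```bash
# 1. Compute an importance matrix from a full-precision GGUF over calibration text
./build/bin/llama-imatrix \
  -m Qwen3-30B-A3B-bf16.gguf \
  -f wiki.train.500k.txt \
  -o imatrix-wiki.dat

# 2. Quantize to IQ4_KS using that imatrix
./build/bin/llama-quantize \
  --imatrix imatrix-wiki.dat \
  Qwen3-30B-A3B-bf16.gguf \
  Qwen3-30B-A3B-IQ4_KS-IK-Imatrix.gguf \
  IQ4_KS
```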