This repository contains 3 versions of Qwen3-30B-A3B quantized with IQ4_KS (4.25 bpw quantization). The interesting part is that these models achieve a lower perplexity on wiki.test.raw than the original bf16 model. This is surprising, considering that no QAT has been mentioned in the Qwen3 announcement. Hence I'm putting them out there for anyone interested in evaluating performance by means other than PPL, or just using them for local inference.
For more details see this discussion.
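For context, the perplexity being compared is the usual exp of the mean negative log-likelihood per token over wiki.test.raw. A minimal Python sketch of that quantity (the log-probabilities below are made up for illustration):

```python
import math

def perplexity(token_logprobs):
    # Perplexity = exp of the mean negative log-likelihood per token.
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# Hypothetical per-token natural-log probabilities from scoring a text file.
logprobs = [-2.1, -0.7, -1.3, -3.0]
print(perplexity(logprobs))  # ~5.9
```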
Note: These models will only work with ik_llama.cpp, as the IQ4_KS quantization type is not available in mainline llama.cpp.
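Below is a minimal sketch of fetching one of the GGUF files and running it with an ik_llama.cpp build. The repo id, file name, and binary path are placeholders, and the `llama-cli` front end and its flags are assumed to match your build; adjust to taste.

```python
import subprocess

from huggingface_hub import hf_hub_download

# Placeholders: substitute the actual repo id and GGUF file name from this page.
gguf_path = hf_hub_download(
    repo_id="<this-repo-id>",
    filename="<model-file>.gguf",
)

# Assumes an ik_llama.cpp checkout built in ./ik_llama.cpp/build; adjust the
# binary path/name and flags to match your build.
subprocess.run([
    "./ik_llama.cpp/build/bin/llama-cli",
    "-m", gguf_path,
    "-p", "Hello, world",
    "-n", "64",
], check=True)
```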
The only difference between the 3 models is the imatrix used: