EXL3 quants of Qwen3-30B-A3B
- 2.25 bits per weight
- 3.00 bits per weight
- 4.00 bits per weight
- 5.00 bits per weight
- 6.00 bits per weight
While I work out a way to meaningfully measure perplexity for such a sparse MoE model, here are some other tests:
| Model | HumanEval pass@1 | KL divergence vs. FP16 (wiki2, 20k tokens) | Top-1 agreement vs. FP16 |
|---|---|---|---|
| 2.25 bpw | 88.41% | 0.1416 | 84.78% |
| 3.00 bpw | 89.63% | 0.0688 | 89.44% |
| 4.00 bpw | 92.07% | 0.0215 | 94.33% |
| 5.00 bpw | 93.29% | 0.0094 | 96.24% |
| 6.00 bpw | 92.68% | 0.0054 | 97.45% |
| FP16 | 91.46% | - | - |
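
For context on the last two columns: metrics of this kind are typically computed by running the FP16 reference and the quantized model over the same token stream, collecting per-position logits from both, and comparing them. Below is a minimal PyTorch sketch of that idea; the function name and tensor shapes are illustrative assumptions, not part of the EXL3/exllamav3 API, and the exact measurement used for the table above may differ in detail.

```python
import torch
import torch.nn.functional as F

def compare_to_reference(ref_logits: torch.Tensor, quant_logits: torch.Tensor):
    """Compare quantized-model logits against an FP16 reference.

    Both tensors are assumed to be [num_tokens, vocab_size] logits
    collected over the same evaluation text (e.g. ~20k wiki2 tokens).
    Returns (mean KL divergence, top-1 agreement rate).
    """
    ref_logprobs = F.log_softmax(ref_logits.float(), dim=-1)
    quant_logprobs = F.log_softmax(quant_logits.float(), dim=-1)

    # KL(ref || quant) per token position, averaged over the stream
    kl = (ref_logprobs.exp() * (ref_logprobs - quant_logprobs)).sum(dim=-1).mean()

    # Fraction of positions where both models pick the same top token
    agreement = (ref_logits.argmax(dim=-1) == quant_logits.argmax(dim=-1)).float().mean()

    return kl.item(), agreement.item()
```

Read this way, the table shows the expected trend: KL divergence shrinks and top-1 agreement climbs monotonically with bitrate, while HumanEval pass@1 is noisier (the 5.00 bpw quant happens to edge out FP16 on that benchmark).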