AWQ quantized model support timeline?
#12 · by hyunw55 · opened
I've been using Qwen models extensively. Are there any plans to support AWQ-quantized models for Qwen3? I missed having a simultaneous AWQ release this time around.
Looking forward to your continued development work.
Thank you
Here is a quantized version, feel free to use it: https://modelscope.cn/models/swift/Qwen3-30B-A3B-AWQ (unofficial version)
Thanks @study-hjt, I just got it working. It takes about 17GB of VRAM just to load the model, plus as much extra VRAM as you have available for additional context and parallel slots:
CUDA_VISIBLE_DEVICES="0" \
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True \
VLLM_USE_MODELSCOPE=True \
vllm serve swift/Qwen3-30B-A3B-AWQ \
--gpu-memory-utilization 0.9 \
--max-model-len 32768 \
--max-num-seqs 64 \
--served-model-name swift/Qwen3-30B-A3B-AWQ \
--host 127.0.0.1 \
--port 8080
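
Once it's up, vLLM exposes an OpenAI-compatible API on the host/port above. A minimal sanity check from Python could look roughly like this (the api_key value is a dummy since the server was started without authentication, and the prompt is just a placeholder):

from openai import OpenAI

# Point the client at the local vLLM server started above.
client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="swift/Qwen3-30B-A3B-AWQ",  # must match --served-model-name
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=64,
)
print(resp.choices[0].message.content)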
If you're trying to make your own AWQ, this thread might be helpful: https://github.com/vllm-project/llm-compressor/issues/1406
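The issue linked above is about doing it through llm-compressor; for comparison, the older AutoAWQ flow is sketched below. The model path, output path, and quant_config values are the usual AutoAWQ defaults rather than anything from this thread, and I haven't verified that AutoAWQ handles the Qwen3 MoE layout, so treat it as a starting point, not a recipe:

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

# Source checkpoint and output directory are placeholders.
model_path = "Qwen/Qwen3-30B-A3B"
quant_path = "Qwen3-30B-A3B-AWQ"

# Typical 4-bit AWQ settings: group size 128, zero-point, GEMM kernels.
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True, use_cache=False)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Calibrate and quantize, then save in a layout vLLM can load directly.
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)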