Is AWQ quantization possible for this model?
#17 · opened by VivekMalipatel23
I am planning to run this on two 3090s with pipeline parallelism, but it looks like the 3090 doesn't support FP8. Could we get an AWQ-quantized version of this model and of the other newer Qwen variants?
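
For context, this is roughly the kind of checkpoint I'd hope to run: a minimal sketch of producing an AWQ version locally with AutoAWQ, assuming the library supports this architecture and starting from the original BF16 weights rather than the FP8 ones (the model path and output directory below are placeholders, not real repo names):

```python
# Hedged sketch: 4-bit weight-only AWQ quantization with AutoAWQ.
# Assumes AutoAWQ supports this model architecture; paths are placeholders.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "Qwen/<model-name>"    # placeholder: original BF16 checkpoint
quant_path = "./<model-name>-awq"   # placeholder: output directory

quant_config = {
    "zero_point": True,   # asymmetric quantization with zero points
    "q_group_size": 128,  # group size for the 4-bit weight groups
    "w_bit": 4,           # AWQ is weight-only 4-bit
    "version": "GEMM",
}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Calibrate and quantize (AutoAWQ uses its default calibration set here).
model.quantize(tokenizer, quant_config=quant_config)

# Save the quantized weights and tokenizer for serving.
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```

The resulting checkpoint could then, in principle, be served across the two 3090s, e.g. with vLLM's `--pipeline-parallel-size 2` (or tensor parallelism instead), though whether it fits depends on the model size and KV-cache headroom.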