Releasing FP8 & FP16 Models
First of all, thank you for the open-source models. Qwen is driving huge growth in the open-source development of LLMs, and now image generation as well.
I hope FP8 and FP16 models will also be released in the future for lower-end GPUs with only 8–16 GB of VRAM.
It would be great to have multiple models, such as one focused on realism and another on animation, similar to the fine-tuned models of SDXL and SD 1.5.
Since these large models are mostly practical for enterprises and very difficult for personal or retail users to run, smaller optimized versions would be a big help.
Again, thank you for the superb model.
It can be converted directly through Diffusers, right?
https://huggingface.co/Qwen/Qwen-Image-Edit/discussions/6#68a39afdf4aa9e784e43afc0
In the process of finding out right now.
Will let you know.
The downloads are killing me, softly.
Why would you use FP16 instead of BF16, though?
If your GPU doesn't support BF16, I don't think you could even run this model.
Wait for an FP8-scaled model from Kijai (smart scaling is way better than a naive truncated FP8).
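To illustrate the difference (a toy sketch, not Kijai's actual method): a naive FP8 conversion just casts each weight to float8, while a scaled FP8 stores a per-tensor scale so the float8 range is actually used. With typical small weight magnitudes, the reconstruction error differs noticeably:

```python
# Toy comparison of naive truncated FP8 vs. per-tensor scaled FP8.
# Illustrative only; real "smart scaling" (e.g. per-block or per-channel scales)
# is more involved than this single per-tensor scale.
import torch

w = torch.randn(4096, 4096, dtype=torch.bfloat16) * 0.01  # weight-sized values

# Naive truncation: small values land in float8's subnormal range and lose precision.
w_naive = w.to(torch.float8_e4m3fn)

# Scaled quantization: rescale into float8's representable range and keep the scale.
f8_max = torch.finfo(torch.float8_e4m3fn).max
scale = w.abs().max().float() / f8_max
w_scaled = (w.float() / scale).to(torch.float8_e4m3fn)

# Dequantize and compare mean absolute reconstruction error.
err_naive = (w_naive.float() - w.float()).abs().mean().item()
err_scaled = (w_scaled.float() * scale - w.float()).abs().mean().item()
print(f"naive FP8 mean abs error:  {err_naive:.6f}")
print(f"scaled FP8 mean abs error: {err_scaled:.6f}")
```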
Both the bitsandbytes code and torchao code are now functional.
They can be found here:
bitsandbytes: ~17GB VRAM
https://huggingface.co/Qwen/Qwen-Image-Edit/discussions/6#68a3f2b63a24e2df78974f5d
torchao: ~23GB VRAM
https://huggingface.co/Qwen/Qwen-Image-Edit/discussions/6#68a4013ec45c7fbadef91472
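For anyone skimming the thread, this is roughly what the bitsandbytes route looks like when driven through Diffusers. It's a sketch based on my reading of the linked post, not the exact code from it; `QwenImageTransformer2DModel` and the NF4 settings assume a diffusers build that already ships Qwen-Image support.

```python
# Hedged sketch: quantize only the large DiT transformer with bitsandbytes (NF4)
# and keep the text encoder / VAE in bf16. See the linked post for the working code.
import torch
from diffusers import BitsAndBytesConfig, DiffusionPipeline, QwenImageTransformer2DModel

model_id = "Qwen/Qwen-Image-Edit"

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

transformer = QwenImageTransformer2DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)

pipe = DiffusionPipeline.from_pretrained(
    model_id,
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # offload idle components to help fit smaller cards
```

The torchao route is analogous, just with diffusers' `TorchAoConfig`; the exact quant-type string below is an assumption on my part, so check the linked post for the one actually used:

```python
import torch
from diffusers import DiffusionPipeline, QwenImageTransformer2DModel, TorchAoConfig

model_id = "Qwen/Qwen-Image-Edit"

# FP8 weight-only quantization of the transformer via torchao.
transformer = QwenImageTransformer2DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=TorchAoConfig("float8wo_e4m3"),  # quant-type string is an assumption
    torch_dtype=torch.bfloat16,
)

pipe = DiffusionPipeline.from_pretrained(
    model_id,
    transformer=transformer,
    torch_dtype=torch.bfloat16,
).to("cuda")
```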
NielsGx: There's a "fast fp16_accumulation" option that makes FP16 faster on some (NVIDIA, as far as I know) cards. It shows up as "fp16_fast", I believe, in some ComfyUI nodes. So that'd be *a* reason.
Found the reference, from the Kijai Wan 2.1 T2V workflow: "fp_16_fast enables the 'Full FP16 Accumulation in FP16 GEMMs' feature available in the very latest pytorch nightly, this is around 20% speed boost."
So that's *a* reason, if you've got VRAM to burn.
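For reference, recent PyTorch builds expose that feature as a backend flag (name to the best of my knowledge; it only exists on sufficiently new torch versions, hence the guard):

```python
import torch

# Enable "Full FP16 Accumulation in FP16 GEMMs" when the torch build exposes it.
# Faster FP16 matmuls on supported NVIDIA cards, at some cost in numerical precision.
if hasattr(torch.backends.cuda.matmul, "allow_fp16_accumulation"):
    torch.backends.cuda.matmul.allow_fp16_accumulation = True
else:
    print("This torch build does not expose allow_fp16_accumulation")
```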