Man, the RTX 20-series cards don't support BF16. My RTX 2080 Ti with 22GB VRAM takes 10 minutes to generate a single image.
The Qwen-Image model is distributed in BF16, but since the RTX 20-series cards only support FP16, it ends up falling back to FP32 for image generation. Waiting over ten minutes for a single image is exhausting. I'm wondering if there's any possibility of technical optimization here. Could you please take some time to help with this? I really need to use this model.
This model is known to have very large activations, larger than fp16 can represent, so bf16 is really the only viable precision. Since a 2080 Ti doesn't have native bf16, you're basically stuck with these long generation times.
Edit: this is basically this answer but rehashed, as you already asked the same question in that thread.
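If anyone wants to see the precision issue for themselves, here's a minimal PyTorch sketch (not from this thread, just an illustration) that checks for native bf16 and shows why a large activation overflows in fp16 but stays finite in bf16:

```python
import torch

if torch.cuda.is_available():
    # Turing (RTX 20-series) has no native bf16 hardware; note that on newer
    # PyTorch versions this check may still return True because bf16 can be
    # emulated in software, just much more slowly.
    print("bf16 supported:", torch.cuda.is_bf16_supported())

# fp16 tops out around 65504, so a large activation overflows to inf,
# while bf16 keeps fp32's exponent range and stays finite (at coarser precision).
x = torch.tensor([70000.0])
print(x.to(torch.float16))   # tensor([inf], dtype=torch.float16)
print(x.to(torch.bfloat16))  # roughly 70144: finite, but with a reduced mantissa
```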
There seems to be a converted fp16 Qwen-Image on Civitai. Idk if quanting that instead would help?
Taking me around an hour using a 5070 ti so I'm jealous of your 10 minutes.
I have it working on an RTX 2080 Super with 8GB VRAM, generating in 46.5 seconds.
I'm using a 4-step Lightning workflow and the smallest GGUF models. The output still looks fine, and I'm experimenting with progressively larger GGUF sizes to find out what else works.
The workflow I used (I have no affiliation with this guy) was as follows:
Patreon (free download, no need to support him or be a Patreon member): The Local Lab AI, August 18, "Free Qwen-Image 4 Step Text to Image and Image to Image - ComfyUI Workflow & Guide"
The GGUF models he links to OOM the 2080 Super, but it works using the following smaller GGUF files (download sketch after the list):
Qwen2.5-VL-7B-Instruct-GGUF replaced with Qwen2.5-VL-7B-Instruct-Q2_K.gguf
qwen-image-Q3_K_S.gguf replaced with qwen-image-Q2_K.gguf
I'm not suggesting it's optimal, but it's a working starting point for 20xx GPU-deficient users like me.
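If it helps anyone replicate this, here's a small download sketch using huggingface_hub. The repo IDs and target folders are assumptions on my part (use whichever repos the workflow guide actually links to); only the filenames come from my setup:

```python
from huggingface_hub import hf_hub_download

# NOTE: repo_id values are guesses; point them at the GGUF repos the guide
# links to. Adjust local_dir to match your ComfyUI install layout.
text_encoder = hf_hub_download(
    repo_id="unsloth/Qwen2.5-VL-7B-Instruct-GGUF",   # assumed repo
    filename="Qwen2.5-VL-7B-Instruct-Q2_K.gguf",
    local_dir="ComfyUI/models/text_encoders",
)
unet = hf_hub_download(
    repo_id="city96/Qwen-Image-gguf",                # assumed repo
    filename="qwen-image-Q2_K.gguf",
    local_dir="ComfyUI/models/unet",
)
print(text_encoder, unet, sep="\n")
```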