Load main text encoder and mmproj
I've downloaded the model from here, and everything else from the Qwen page. It works fine for the first generation, but crashes while trying to load the model on the second generation. I assume I have to pair a smaller text encoder with the 9GB model to avoid going over my 12GB capacity, right?
It might help noobs like me to mention that on the first page. I was following a video that only swapped the model, and skipped right to the table on the right, thinking the table on the left was just technical jargon, since there wasn't any text above it explaining what it is.
Also, an image with an embedded workflow is always a great help, if you care to include one. I can just drag it into ComfyUI and see what's needed for these specific models.
If you want the GGUF… yes. I mean:
- The GGUF files for the UNET are in this repo
- The text encoder (mmproj): I placed it here so you already have it with the correct name. (You can download the GGUF and the mmproj from the Unsloth repo too, but you'd need to rename it. The FP8 version is in the Comfy-Org repo.)
- The VAE is here just to make the download easier, like in other repos where people keep asking for the VAE. (You can download it from the Comfy-Org repo too.)
In short: you can just use the GGUF for the UNET, the FP8 version for the main text encoder (if you don't want to use GGUF for the text encoder), and the VAE.
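If it helps, here's a minimal sketch (assuming the default ComfyUI folder layout; the UNET and VAE filenames are hypothetical examples, so adjust them to whatever quants you actually downloaded) that checks the three files are where the loaders expect them:

```python
from pathlib import Path

# Assumed ComfyUI root and example filenames -- adjust to your install/quants.
ROOT = Path("ComfyUI/models")
expected = {
    "unet":         ROOT / "unet" / "Qwen-Image-Edit-Q4_K_M.gguf",  # hypothetical quant
    "text_encoder": ROOT / "text_encoders" / "qwen_2.5_vl_7b_fp8_scaled.safetensors",
    "vae":          ROOT / "vae" / "qwen_image_vae.safetensors",    # hypothetical name
}

for kind, path in expected.items():
    status = "OK" if path.is_file() else "MISSING"
    print(f"{kind:13s} {status:7s} {path}")
```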
Ah, I see.
I've downloaded the mmproj file and put it in the same folder as the original text encoder. Now I'm wondering how to use it.
It didn't show up in the CLIP list or the UNET loader. Do I need a specific node for it?
Yes. In the model card in this repo I mention where to place both text encoder files, and also which custom node you need to get the loader nodes that can load the GGUF files.
Ah, so that's what the first column means.
Another silly question: Do I need to install a custom node?
I've gone through my CLIP- and GGUF-related nodes, and the only one that could load the model was CLIPLoader (GGUF). But I get this error from the TextEncodeQwenImageEdit node: mat1 and mat2 shapes cannot be multiplied (77x768 and 3072x768)
Couldn't find anything in the custom nodes or a web search either. I even downloaded a couple of other workflows, to no avail.
Can you share a screenshot of the nodes you are using?
BTW: the custom node you should install is ComfyUI-GGUF. If you already have it, just make sure both ComfyUI (for QwenImageEdit support) and ComfyUI-GGUF (for mmproj support) are updated.
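For example, here's a minimal sketch of scripting that install/update (assuming ComfyUI sits in the current directory and the usual city96 GitHub repo for the custom node; cloning by hand or using ComfyUI-Manager works just as well):

```python
import subprocess
from pathlib import Path

# Assumes ComfyUI lives in the current directory -- adjust as needed.
custom_nodes = Path("ComfyUI/custom_nodes")
node_dir = custom_nodes / "ComfyUI-GGUF"

if node_dir.exists():
    # Update an existing install of the custom node.
    subprocess.run(["git", "-C", str(node_dir), "pull"], check=True)
else:
    # Fresh install of the custom node.
    subprocess.run(["git", "clone",
                    "https://github.com/city96/ComfyUI-GGUF",
                    str(node_dir)], check=True)

# Also update ComfyUI itself (for QwenImageEdit support).
subprocess.run(["git", "-C", "ComfyUI", "pull"], check=True)
```

ComfyUI-GGUF also has a small Python dependency (the gguf package); check its README for the exact pip command.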
These are the loaders you should have in the workflow. You can use the official ComfyUI workflow (you can find it in the ComfyUI templates tab); just replace the loader nodes. The mmproj needs to be placed in the same folder as your main text encoder... the node will detect and load the mmproj automatically (just select the main text encoder in the node, not the mmproj).
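Just to illustrate the same-folder convention (this is only a rough sketch of the idea, not the actual ComfyUI-GGUF detection code, and the quant filename is hypothetical):

```python
from pathlib import Path

def find_mmproj(main_encoder: Path) -> Path | None:
    """Look for an mmproj GGUF sitting next to the main text encoder.

    Rough illustration only: ComfyUI-GGUF's real matching logic may differ.
    """
    for sibling in main_encoder.parent.glob("*.gguf"):
        if "mmproj" in sibling.name.lower():
            return sibling
    return None

# Hypothetical quant filename -- the point is only that both files share a folder.
main = Path("ComfyUI/models/text_encoders/Qwen2.5-VL-7B-Instruct-Q4_K_M.gguf")
print(find_mmproj(main))  # -> .../Qwen2.5-VL-7B-Instruct-mmproj-BF16.gguf if present
```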
Sure, I'm using the workflow from the Qwen tutorial page, with the model loaders changed:
and here's the json file: https://filebin.net/qwwj07z760o0foic
Okay, I see what the problem is...
- For Qwen Image Edit, the text encoder part needs 2 files:
  - The main text encoder (Qwen2.5-VL-7B... whatever quant you want)
  - The mmproj: you have it, but with the wrong name... the name should be Qwen2.5-VL-7B-Instruct-mmproj-BF16 (like the one available to download in this same repo).
- The file you select in the CLIP loader should be the main text encoder, not the mmproj; the node is going to auto-detect and load the mmproj part.
Like I explain in the model card:
| Type | Name | Location | Download |
|---|---|---|---|
| Main Model | Qwen-Image | ComfyUI/models/unet | GGUF (this repo) |
| Main Text Encoder | Qwen2.5-VL-7B | ComfyUI/models/text_encoders | Safetensors / GGUF |
| Text_Encoder (mmproj) | Qwen2.5-VL-7B-Instruct-mmproj-BF16 | ComfyUI/models/text_encoders (same folder as your main text encoder) | GGUF (this repo) |
| VAE | Qwen-Image VAE | ComfyUI/models/vae | Safetensors (this repo) |
We include the VAE (to make it easy to find) and the mmproj (so you don't have to rename it) in this same repository.
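If your download did come with another name, the rename is trivial to script (a sketch only; the "downloaded" filename below is a placeholder, use whatever name your file actually has):

```python
from pathlib import Path

folder = Path("ComfyUI/models/text_encoders")
downloaded = folder / "mmproj-BF16.gguf"  # placeholder: whatever name it came with
expected = folder / "Qwen2.5-VL-7B-Instruct-mmproj-BF16.gguf"

# Rename only if the target name isn't already taken.
if downloaded.exists() and not expected.exists():
    downloaded.rename(expected)
```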
I thought the name might be the issue.
Even though I downloaded the file from this repo, it came with a different name. I renamed it, but I still get the same error.
When I select the original encoder in the GGUF loader, I get this error:
Mixing scaled FP8 with GGUF is not supported! Use regular CLIP loader or switch model(s)
(M:\ai\Qwen\models\text_encoders\qwen_2.5_vl_7b_fp8_scaled.safetensors)
The other error is that you didn't download the GGUF version of the main text encoder.
That's why I explained at the beginning: if you're going to use GGUF, you use the GGUF loader... if you're going to use safetensors, you shouldn't use the GGUF loader (the node has the same name, just without "GGUF").
If you use GGUF, you're still missing the main text encoder download (you have the mmproj, but not the main text encoder).
You should read the model card, where I include this information and try to simplify everything. GGUF is not the same as safetensors. GGUF is a format that lets you run models on computers with low resources, or when you don't want a model to consume so much of your PC's resources.
Check this video (not mine), where he explains it; you can listen and watch to understand it better:
https://youtu.be/hYlTteRXX4o?si=ULIMYQZQFMNTUl7s&t=176
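As a rough rule of thumb (my own back-of-the-envelope numbers, not from the video): size ≈ parameters × bits-per-weight ÷ 8. For a 7B text encoder that works out roughly like this:

```python
# Rough size estimates for a 7B-parameter text encoder at different quants.
# Bits-per-weight values are approximate rules of thumb, not exact figures.
PARAMS = 7e9
approx_bpw = {"Q4_K_M": 4.8, "Q8_0": 8.5, "FP8": 8.0, "BF16": 16.0}

for quant, bpw in approx_bpw.items():
    gb = PARAMS * bpw / 8 / 1024**3
    print(f"{quant:7s} ~{gb:.1f} GB")
```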
Ah, I see.
The inclusion of the safetensors in the table threw me for a loop. Now I get it.
Should I match the quant of the text encoder to the one in my UNET loader?
Yes, like I explain in the model card in this repo... there you have the hyperlinks to download the GGUF text encoder, but if you want to use safetensors, you also have the hyperlink for that (if you use safetensors, you don't need the mmproj GGUF):
| Type | Name | Location | Download |
|---|---|---|---|
| Main Model | Qwen-Image | ComfyUI/models/unet | GGUF (this repo) |
| Main Text Encoder | Qwen2.5-VL-7B | ComfyUI/models/text_encoders | Safetensors / GGUF |
| Text_Encoder (mmproj) | Qwen2.5-VL-7B-Instruct-mmproj-BF16 | ComfyUI/models/text_encoders (same folder as your main text encoder) | GGUF (this repo) |
| VAE | Qwen-Image VAE | ComfyUI/models/vae | Safetensors (this repo) |
Last question (hopefully).
Are all of these models loaded into VRAM together?
With the 9GB UNET model, my 12GB of VRAM is maxed out. Would switching to a quantized text encoder be any help? Or should I keep running the safetensors version on the CPU?
From what I know, yes, it uses both VRAM and RAM... but I always try to choose a combination that fits in my GPU.
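One way to sanity-check a combination before loading it (just a sketch; the filenames are hypothetical picks, and on-disk size is only a lower bound, since inference needs extra VRAM on top for activations):

```python
from pathlib import Path

# Hypothetical picks -- point these at the files you actually downloaded.
files = [
    Path("ComfyUI/models/unet/Qwen-Image-Edit-Q4_K_M.gguf"),
    Path("ComfyUI/models/text_encoders/Qwen2.5-VL-7B-Instruct-Q4_K_M.gguf"),
    Path("ComfyUI/models/text_encoders/Qwen2.5-VL-7B-Instruct-mmproj-BF16.gguf"),
    Path("ComfyUI/models/vae/qwen_image_vae.safetensors"),
]

# Sum the sizes of whatever files exist and compare against the VRAM budget.
total_gb = sum(f.stat().st_size for f in files if f.exists()) / 1024**3
print(f"Total on-disk size: {total_gb:.1f} GB (vs 12 GB VRAM budget)")
```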