Qwen2.5-VL-7B-Instruct-mmproj-BF16.gguf: is this a must?

#7
opened by pikkaa

Is Qwen2.5-VL-7B-Instruct-mmproj-BF16.gguf a must for running Qwen Image Edit through ComfyUI, or is the main text encoder for Qwen Image okay on its own?

QuantStack org

You need both: the LLM part and the vision part (mmproj).
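For illustration outside ComfyUI, a minimal sketch with llama-cpp-python shows how the two files pair up at load time. The handler class used here is the generic LLaVA-style one (Qwen2.5-VL may need a dedicated handler depending on your version), and both file names are assumptions:

```python
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

# The mmproj GGUF carries the vision projector weights; the main GGUF
# carries the language model. Both file names below are assumptions.
chat_handler = Llava15ChatHandler(
    clip_model_path="Qwen2.5-VL-7B-Instruct-mmproj-BF16.gguf"
)
llm = Llama(
    model_path="Qwen2.5-VL-7B-Instruct-Q8_0.gguf",  # hypothetical quant name
    chat_handler=chat_handler,
    n_ctx=4096,
)
# With only the main GGUF loaded, text prompts still work, but image
# inputs cannot be encoded; the mmproj is what bridges pixels to tokens.
```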

@YarvixPA Does quantizing the mmproj to Q8_0 make a difference to the final output quality when used with Qwen Image and the Edit version? I guess the vision part is not actually used, but just needs to be present to maintain compatibility with the Qwen2.5-VL architecture, right?

QuantStack org

Not sure, but it is required... I just asked ChatGPT and it gave me this answer:

Short summary

  • Where is mmproj used?
    In the semantic pathway: it takes Qwen2.5-VL’s visual features and projects them into the language embedding space, conditioning the generator (MMDiT) for editing and other image-conditioned tasks (see the sketch after this list).

  • Why is it needed?
    To ensure the model understands the content of the input image and keeps its identity/structure while making edits. The VAE simultaneously preserves the appearance details.
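A minimal PyTorch sketch of what such a projector does, assuming a simple two-layer MLP and assumed dimensions (the real Qwen2.5-VL merger differs in detail):

```python
import torch
import torch.nn as nn

VISION_DIM = 1280  # assumed vision encoder hidden size
TEXT_DIM = 3584    # assumed Qwen2.5-VL-7B language hidden size

class VisionProjector(nn.Module):
    """Sketch of an mmproj-style module: maps visual patch features
    into the language model's embedding space."""
    def __init__(self, vision_dim: int, text_dim: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, text_dim),
            nn.GELU(),
            nn.Linear(text_dim, text_dim),
        )

    def forward(self, visual_features: torch.Tensor) -> torch.Tensor:
        # visual_features: (batch, num_patches, vision_dim)
        # returns:         (batch, num_patches, text_dim),
        # i.e. "image tokens" the LLM can attend to alongside text
        return self.proj(visual_features)

projector = VisionProjector(VISION_DIM, TEXT_DIM)
patches = torch.randn(1, 256, VISION_DIM)  # dummy visual features
image_tokens = projector(patches)
print(image_tokens.shape)  # torch.Size([1, 256, 3584])
```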

For the more detailed response:
https://chatgpt.com/share/68a8269a-4ee8-8004-9415-cdeab13deb70


Ok, it seems my initial idea was wrong: Qwen Image Edit does indeed use the mmproj file, because the input image is passed to the vision LLM (which is acting as the text encoder). I initially thought it worked like HiDream, CogView4, and Lumina Image 2.0, which use text-only LLMs as the text encoder; there, a vision LLM would only have been chosen for its better spatial understanding of text layers. But I was wrong.
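To make the distinction concrete, here is a hedged pseudocode-style sketch of the two conditioning paths; every name in it is hypothetical, not a real API:

```python
# Hypothetical sketch contrasting the two conditioning strategies above.

def condition_text_only(prompt, text_llm):
    # HiDream / CogView4 / Lumina Image 2.0 style:
    # the encoder only ever sees text, so no mmproj is involved.
    return text_llm.encode(prompt)

def condition_vision_llm(prompt, image, vision_llm, mmproj):
    # Qwen Image Edit style: the input image is encoded, projected into
    # the language embedding space by the mmproj, and processed together
    # with the prompt by the vision LLM acting as the text encoder.
    image_tokens = mmproj.project(vision_llm.vision_encoder(image))
    return vision_llm.encode(prompt, image_tokens)
```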

And since people were only asking about the mmproj now, I guess that means only the Edit version specifically requires the mmproj, and the base T2I model does not.
