I’ve tested versions 5.0, 5.3, and 7.1, each with over 30 generated images

#92
by okims - opened

I’ve tested versions 5.0, 5.3, and 7.1, each with over 30 generated images using the i2t method.
Here are the version differences I observed:

Reference (i2t): portrait of an East Asian woman, front-facing.

5.0

Slightly less facial consistency compared to the original 2509 version, but still performs well overall.
Occasionally generates faces that look somewhat Western.

5.3

Slightly less facial consistency than 5.0, but allows for a wider range of expression.
Sometimes shows strange grid-like artifacts in the image (tested with the LCM sampler).
Occasionally produces Western-looking faces.

7.1

Consistency drops significantly.
Frequently generates Western-looking faces (though results vary depending on the reference image).
Grid-like artifacts appear more often and are more noticeable than in 5.3.
The main advantage is that it follows prompts well (poses and scenes are generated as intended),
but facial consistency and overall quality are much lower because of the artifacts.

Was kinda hoping you'd show some examples.
Also, wouldn't it make more sense to test on editing, rather than from-scratch generation? Since there are dedicated t2i models or models known to excel at this (including Wan2.2 as an image generator).

So far I have used v5.0 through v5.3 and v7.0 for image editing. V5.0 has the best consistency; the other versions change the face slightly, to varying degrees.

I tested it, and it seems that indeed version V5.0 has a better effect on preserving facial features. Could it be that the weight of a certain LoRA integrated into the subsequent versions caused too much interference?

Yes, I have also compared V5.0 with later versions. With combinations of LoRAs added to improve consistency, V5.0 is not perfect in clarity or prompt following, but its consistency is much better than that of subsequent versions. Moreover, it seems that after V5.3, NSFW concepts tend to produce images with exposed shoulders. My favorite sampling combination (seed3/beta/4 steps) produces images with darker tones and higher saturation after V5.3, and I speculate this is related to adjustments to the scheduler.
Here is a test example.
Original image:

image

Edited image:

image

Hello everyone, I just want to give a report 🫡 After two days of waiting through Phr00t's upload, reupload, delete, and reupload (long story short), the image is provided below for reference. v8 is beautiful, but it was very hard on my computer to run 😞 I'm waiting for phil2sat to make GGUF quants of v8, and maybe then my computer will be Usain Bolt 😍 Thank you to Phr00t and phil2sat for making mini qwen2509 available to the public. I really appreciate the time and dedication you guys put in 👍

image

Curious to hear reports from others with v8!

I tested it, and it seems that indeed version V5.0 has a better effect on preserving facial features. Could it be that the weight of a certain LoRA integrated into the subsequent versions caused too much interference?

It could be weights. When I am making an NSFW merge, I prioritize support for NSFW material (sexual positions, genitalia, etc.), as "preserving faces" is secondary to actually being an NSFW-capable model. The SFW version is much lighter on LoRAs, so if the problem is due to NSFW LoRAs, the SFW version should do much better. It is tough to have the "best of both worlds". Using prompts that include phrases like "exact facial features" might help.
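
If you want to apply that prompt hint consistently across a batch of edits, something like the following could work. This is only a minimal sketch: the function name `with_face_hint` is made up for illustration, the default phrase is just the one mentioned above, and nothing here is tied to any particular workflow or node.

```python
def with_face_hint(prompt: str, hint: str = "exact facial features") -> str:
    """Append an identity-preserving hint to an edit prompt.

    Hypothetical helper: the hint wording is an assumption taken from the
    suggestion above, not a documented keyword of any model.
    """
    prompt = prompt.strip()
    # Leave the prompt alone if the hint is already present (case-insensitive).
    if hint.lower() in prompt.lower():
        return prompt
    # An empty prompt becomes just the hint.
    if not prompt:
        return hint
    # Drop trailing punctuation, then join with a comma.
    return f"{prompt.rstrip(' ,.;')}, {hint}"


if __name__ == "__main__":
    print(with_face_hint("change the background to a beach"))
```

Whether the extra phrase actually improves face preservation will depend on the version and the reference image, so it is worth comparing results with and without it.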
