The Mega AIO pack changes the I2V result

#86
by Phantex - opened

I never had this issue with version 9 or 10, but the Mega AIO changes the look of the original input image. Why is that?

If you use it like the example workflow in the "mega" folder, that shouldn't happen. If you use it with "reference image" or try to use the old I2V workflow, the starting image won't work right.

Yeah, the output changes the reference image completely.

I used the new workflow that's in the mega folder. I put the VACE first-last image and last image upload nodes on bypass since I didn't need them. I uploaded my own start image. The resulting video was altered from the start image, like a completely different art style (I used an anime image). I never had this issue on version 10 or earlier.

Phantex, I have a lot of issues with mega as well. I do realistic images and the start frame is off (so the whole video is off). The character looks more generic and smoothed out, and it morphs, body proportions included. The worst part is that the emotions become deadpan, just staring straight at something every time. (Pre-VACE, emotions had incredible body language beyond the facial expression.) Even bumping the CFG didn't help. I'm going to blame VACE partially; I guess without a control net it has bad defaults.

I spent at least 12 hours on mega-v1 and I'll probably stop there. v10 is amazing and I'm back to using it. I don't want to discourage the author, but to new users, don't skip out on v10.

In those 12 hours, did you read the note about changing the WanVaceToVideo strength to 1 instead of 0?

If you want to do "image to video", only bypass the last frame and keep the "VaceFirstLastFrame" node. Using the "First frame" is how you do "image to video".

If your starting frame is off, then you are likely not using the workflow correctly. As I said to Phantex, to do "start frame" videos, you only bypass the last frame (so the "first frame" is your starting frame). You keep "VACE FirstToLast Frame". You keep the strength at 1 on WanVaceToVideo. Basically, "I2V" is just supplying a "first frame" in a VACE "first frame to last frame" workflow.
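For anyone still confused by the wiring, here is the I2V setup summarized as a small Python dict. This is just runnable shorthand for the node settings described above, not actual ComfyUI code; the image filename is a placeholder.

```python
# Shorthand summary of the "start frame only" (I2V) setup described above.
# Not executable ComfyUI API code, just the settings in one place; node
# names follow the mega example workflow.

i2v_setup = {
    "LoadImage (first frame)": {
        "image": "my_start_image.png",  # placeholder: your own start image
        "bypassed": False,              # keep this node active
    },
    "LoadImage (last frame)": {
        "bypassed": True,               # ONLY the end-frame upload is bypassed
    },
    "VACE FirstToLast Frame": {
        "bypassed": False,              # keep this node; I2V = first frame, no last frame
    },
    "WanVaceToVideo": {
        "strength": 1.0,                # per the note: 1, not 0
    },
}

if __name__ == "__main__":
    for node, settings in i2v_setup.items():
        print(node, settings)
```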

I updated the model card to better describe how to use the different modes of the workflow.

Great author with after-sales service. I'm using the code you provided for free. Thank you very much.
Also, could you please take a look at the questions I posted? Thank you.

I just retried and ONLY bypassed the "End frame" node. The result is closer to the starting image than before, but still different compared to version 10 and prior. I tried different seeds.

Here is an example: Starting image: https://postimg.cc/QFbjjtWG

With the Mega pack (SFW model): https://streamable.com/kty07w

With version 10 (SFW model): https://streamable.com/j0r5gn

I used the same resolution in both, 768x432. Same frame count, 144. Same frame rate, 24 fps. Same prompt. Everything is the same except the model being used.

Clearly the mega pack changes the initial design of the image compared to version 10 and earlier. Look at the faces.

Ahhh, I see what you are saying. It is using the exact same starting frame; the first frame does look exactly like your starting image. However, the video quickly changes the face in later frames. v10 actually uses the I2V models, but I have to mix them in a not-ideal way that generates noise. You might be able to improve the face consistency by also setting the "reference image" to your starting image (then you will need to use "TrimVideoLatents" after the KSampler output, because doing so generates 4 extra "junk" latents). Finally, you might get some improvement by tweaking your prompt... it looks like her face ends up looking more realistic... perhaps prompting for more anime style would keep it closer?
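To make the trimming part concrete: conceptually, it just drops the first few latent frames along the temporal axis. Here is a minimal sketch of the idea, assuming a [batch, channels, frames, height, width] latent layout and 4 junk latents (both assumptions on my part; check your actual tensor shape).

```python
import torch

# Minimal sketch of what trimming the junk latents amounts to, assuming the
# video latent is laid out as [batch, channels, frames, height, width] and
# that 4 junk latent frames sit at the start (assumed, not verified here).
def trim_junk_latents(latent: torch.Tensor, junk_frames: int = 4) -> torch.Tensor:
    # Drop the first `junk_frames` entries along the temporal axis.
    return latent[:, :, junk_frames:, :, :]

# Dummy example: a fake 36-frame latent becomes 32 frames after trimming.
dummy = torch.randn(1, 16, 36, 54, 96)
print(trim_junk_latents(dummy).shape)  # torch.Size([1, 16, 32, 54, 96])
```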

Yes, with the VACE method, we do get 4 initial junk frames (it's a shame to lose any, but this seems to be a thing with VACE), but then the 5th frame does not resolve to the image supplied. (I've always had strength of 1.) It's doing some i2i generation. Not only does that modify the image, but it also makes it extra hard to stitch together video sequences if the last frame doesn't match the 1st-5th frames of the next segment. With the real i2v models, I can take the last frame and continue the video. There are big challenges with that because there is a color shift and the frames get blurrier. I attempt color correction and it helps. I push it to only 4 segments, as a 5th segment just gets too degraded.
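If anyone wants to try the color correction, a simple version is per-channel mean/std matching of the continued segment's frames against the original start frame. A rough sketch of the idea (not my exact script):

```python
import numpy as np

def match_color(frame: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Shift/scale each RGB channel of `frame` so its mean/std match `reference`.

    Both inputs are float arrays in [0, 1] with shape (H, W, 3).
    """
    out = frame.astype(np.float32)
    ref = reference.astype(np.float32)
    for c in range(3):
        f_mean, f_std = out[..., c].mean(), out[..., c].std() + 1e-6
        r_mean, r_std = ref[..., c].mean(), ref[..., c].std() + 1e-6
        out[..., c] = (out[..., c] - f_mean) * (r_std / f_std) + r_mean
    return np.clip(out, 0.0, 1.0)

# Dummy usage: pull a color-shifted frame back toward the start frame's palette.
start_frame = np.random.rand(432, 768, 3).astype(np.float32)
drifted = np.clip(start_frame * 0.9 + 0.08, 0, 1)  # fake color shift
corrected = match_color(drifted, start_frame)
```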

v10 i2v is king, but the v10 t2v is also so fun. I do think for people who want to use VACE for motion control, the mega is probably awesome.

I think I'll use V10 for now and hope that future mega packs are better for i2v. But I did want you to be aware of this. Thanks for all of your hard work.
