Mega V3 reference image faces

#103
by kheiri - opened

Using VACE to video at strength 1, the output does not stay faithful to the original features. Any tips?

Do you have an example to share along with the workflow?

I'm using the workflow JSON file included in the MegaV3 folder.

So, I got much better results after I noticed the WanVaceToVideo node had nothing in the reference slot, so I plugged the starting image into it. I also changed the sampler and scheduler to sa_solver and beta respectively, and the strength to 1.15.
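For anyone who wants to compare against the stock graph, here is a rough sketch of how those three changes could look in ComfyUI's API-format JSON (written as Python dicts; the node IDs and every input I don't list are placeholders, not the actual MegaV3 values):

```python
# Minimal sketch (not the full MegaV3 workflow) of the changes described above.
# Node IDs and unlisted inputs are placeholders; check them against the JSON
# file shipped in the MegaV3 folder.
vace_node = {
    "class_type": "WanVaceToVideo",
    "inputs": {
        "strength": 1.15,                          # raised from 1.0
        "reference_image": ["<LoadImage id>", 0],  # same picture as the start frame
        # positive / negative / vae / control_video / control_masks stay wired
        # exactly as in the stock workflow
    },
}

ksampler_node = {
    "class_type": "KSampler",
    "inputs": {
        "sampler_name": "sa_solver",               # sampler
        "scheduler": "beta",                       # scheduler
        # seed / steps / cfg / denoise left at the workflow defaults
    },
}
```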

If you set the starting image(s) and add the same picture as the reference, it will keep the character more consistent but removes much of the dynamics. So it kind of works, but it is not how this node was intended to be used. It works like a magnet: the AI is always trying to follow your prompt, but at the same time it always tries to go back to the reference image. Those VACE nodes are very complicated to use the right way and to understand. Here are some good examples of how to use them; there is false masking, real masking and much more involved:
https://www.reddit.com/r/StableDiffusion/comments/1m04uv6/wan_21_vace_howto_guide_for_masked_inpaint_and/

I'm working on something that could keep at least the face more consistent. It is basically a two-step generation and would work like this:
Step 1: Generate your video and don't worry about the inconsistent face.
Step 2: Take the video and auto-replace (extra workflow still to be made) the face in every frame with white. Use that video as the input images. For the reference image, extract the face with a white background and use that.
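A very rough sketch of what step 2 could look like outside of ComfyUI, assuming a plain OpenCV Haar-cascade face detector stands in for the "auto replace" workflow that is still to be made (a real version would want a stronger detector and per-frame tracking):

```python
import cv2
import numpy as np

# Step 2 sketch: white out the face in every frame, build a matching mask,
# and cut a face-on-white reference image from the first detected face.
face_det = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

def whiteout_faces(frames):
    """frames: list of BGR uint8 arrays -> (control_frames, masks, reference)."""
    control_frames, masks, reference = [], [], None
    for frame in frames:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        boxes = face_det.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        out = frame.copy()
        mask = np.zeros(frame.shape[:2], dtype=np.uint8)
        for (x, y, w, h) in boxes:
            out[y:y + h, x:x + w] = 255            # replace the face region with white
            mask[y:y + h, x:x + w] = 255           # mark the same region in the mask
            if reference is None:                  # face on a white background,
                canvas = np.full_like(frame, 255)  # taken from the first detection
                canvas[y:y + h, x:x + w] = frame[y:y + h, x:x + w]
                reference = canvas
        control_frames.append(out)
        masks.append(mask)
    return control_frames, masks, reference
```

The control frames and masks would then go into the control video and mask inputs of the second pass, and the face-on-white image becomes the reference image.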

Wow, that's a lot of work. Feels like a standard image-to-video model would be better.

> I'm working on something that could keep at least the face more consistent. It is basically a two-step generation and would work like this:
> Step 1: Generate your video and don't worry about the inconsistent face.
> Step 2: Take the video and auto-replace (extra workflow still to be made) the face in every frame with white. Use that video as the input images. For the reference image, extract the face with a white background and use that.

So which solution are you adopting for the replacement step? If it's the ReActor approach, it is still insufficient for handling some complex scenarios. If it's the Animate approach (or a method similar to VACE), then VRAM consumption will be particularly large and the running time will also increase significantly.

> Wow, that's a lot of work. Feels like a standard image-to-video model would be better.

That is why you make workflows ^^. If I manage to do what I described AND the results are better than without the "faceswap", I will post the workflow.

> I'm working on something that could keep at least the face more consistent. It is basically a two-step generation and would work like this:
> Step 1: Generate your video and don't worry about the inconsistent face.
> Step 2: Take the video and auto-replace (extra workflow still to be made) the face in every frame with white. Use that video as the input images. For the reference image, extract the face with a white background and use that.

> So which solution are you adopting for the replacement step? If it's the ReActor approach, it is still insufficient for handling some complex scenarios. If it's the Animate approach (or a method similar to VACE), then VRAM consumption will be particularly large and the running time will also increase significantly.

I think you did not understand what I meant. Very simply put: you create a video with MEGA by giving it the first frame. You take this video and replace the face in all frames with white (and maybe a mask along with each frame). Now you run the "mega" workflow again. The first frame is now not a single frame but all of the frames. As the reference image you use the first frame with only the face left on a white background.
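To make that second pass a bit more concrete, here is a small sketch of how the outputs of the helper sketched earlier could be packed into the tensor layouts ComfyUI expects (IMAGE as [batch, height, width, channels] floats in 0-1, MASK as [batch, height, width]); the `frames` variable is assumed to be the first-pass video loaded as BGR uint8 arrays:

```python
import numpy as np
import torch

# Pack the step-2 outputs for the second MEGA/VACE pass. Assumes the
# whiteout_faces() helper sketched earlier and a `frames` list holding the
# first-pass video as BGR uint8 arrays (loaded elsewhere).
control_frames, masks, reference = whiteout_faces(frames)

# ComfyUI IMAGE tensors: [batch, H, W, C] in 0-1; MASK tensors: [batch, H, W].
control_video   = torch.from_numpy(np.stack(control_frames)[..., ::-1].copy()).float() / 255.0
control_masks   = torch.from_numpy(np.stack(masks)).float() / 255.0
reference_image = torch.from_numpy(reference[None, ..., ::-1].copy()).float() / 255.0
```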

> So, I got much better results after I noticed the WanVaceToVideo node had nothing in the reference slot, so I plugged the starting image into it. I also changed the sampler and scheduler to sa_solver and beta respectively, and the strength to 1.15.

FYI, using the "reference image" slows down generation, as it creates 4 extra "junk" latents, and then you need the "trim latents" node to cut them out after the KSampler. I found that just using the control video & mask got the job done well enough without needing a reference image, but your mileage may vary.
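For the record, a sketch of the extra wiring the trim step implies; the node and socket names below are what I recall from recent ComfyUI builds, so treat them as assumptions and check them against your own graph:

```python
# Extra wiring when a reference image IS used (names recalled from recent
# ComfyUI builds -- verify against your own graph before relying on them):
wiring_with_reference = [
    ("WanVaceToVideo.latent",      "KSampler.latent_image"),
    ("WanVaceToVideo.trim_latent", "TrimVideoLatent.trim_amount"),
    ("KSampler.LATENT",            "TrimVideoLatent.samples"),
    ("TrimVideoLatent.LATENT",     "VAEDecode.samples"),
]
# Skip the reference image and you can drop the trim step entirely,
# going straight from the KSampler into VAEDecode.
```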
