High_noise_model and low_noise_model are distilled simultaneously, on the 81x720x1280, 81x1280x720, 81x480x832, and 81x832x480 size options.
Does this mean the two models are trained together, rather than each being trained separately?
Wan2.2 is 24 fps, so there should be 121 frames. Is it okay to train with only 81 frames?
These are for the 14B models, so 81 frames at 16 fps; the 5B TI2V model is 121 frames at 24 fps (both correspond to about 5 seconds of video).
Thank you for your reply. I was thinking that if the two models are trained together, along with a generator, a real net, and a fake net, the GPU memory usage would be enormous. Additionally, what is the timestep boundary between low and high during training? Will the training code be made open-source?
@yuduan
The boundary is the same as the original model, i.e., 0.875 / 875 for T2V, 0.9 / 900 for I2V.
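As a rough illustration (not the actual training code; the function name and the 0-1000 timestep scale are assumptions), the boundary simply decides which expert handles a given timestep:

```python
def select_expert(timestep: float, task: str = "t2v") -> str:
    # Boundary from this thread: 0.875 (875/1000) for T2V, 0.9 (900/1000) for I2V.
    boundary = 875.0 if task == "t2v" else 900.0
    # The high-noise model covers the early, noisier part of the trajectory.
    return "high_noise_model" if timestep >= boundary else "low_noise_model"

assert select_expert(937.5001) == "high_noise_model"
assert select_expert(833.3333) == "low_noise_model"
```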
As for the memory, you are correct. There are 6 14B models on GPU, and 4 of them are trainable.
We have worked a lot to optimize the GPU memory so that 81x720p videos can be trained on this setting.
Unfortunately, I can't share the training code due to company restrictions. You may try other open-source training codes such as https://github.com/modelscope/DiffSynth-Studio and https://github.com/GoatWu/Self-Forcing-Plus
Thank you very, very much.
Regarding the denoising_step_list for low and high: I noticed that in Self-Forcing Plus, the denoising_step_list is [1000, 757, 522, 225]. For Wan2.2, should there be 8 values? Could you share the correct list?
@yuduan
Hi, as said in https://github.com/ModelTC/Wan2.2-Lightning/issues/3, we use [1000.0000, 937.5001, 833.3333, 625.0000], with [1000.0000, 937.5001] for high_noise_model and [833.3333, 625.0000] for low_noise_model.
Please refer to this repo https://github.com/ModelTC/Wan2.2-Lightning/ to see the scheduler we adopt.
The reason we wrote a new EulerScheduler: I think it's better if the timestep list for 4 steps is a subset of the timestep list for 8 steps. The scheduler in https://github.com/ModelTC/Wan2.2-Lightning/ has this property, while the EulerScheduler in diffusers does not.
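For intuition only, here is a small sketch of one way to get that subset property: sample timesteps on a uniform sigma grid and apply the usual flow-matching timestep shift. This is not necessarily the scheduler in the repo, and shift=5.0 is an assumption, but it reproduces the 4-step list above, and those four values reappear inside the 8-step list:

```python
import numpy as np

def shifted_timesteps(num_steps: int, shift: float = 5.0, num_train_timesteps: int = 1000):
    # Uniform sigma grid in (0, 1]; the 4-step grid points are a subset of the 8-step grid.
    sigmas = np.linspace(1.0, 0.0, num_steps + 1)[:-1]
    # Flow-matching timestep shift; it is a pointwise map, so the subset property is preserved.
    sigmas = shift * sigmas / (1.0 + (shift - 1.0) * sigmas)
    return sigmas * num_train_timesteps

print(shifted_timesteps(4))  # approx. [1000.0, 937.5, 833.3333, 625.0]
print(shifted_timesteps(8))  # the 4 values above appear at every other position
```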
For the Wan2.2 high-noise 14B T2V model, the timestep range for training is minimum = 0.875 and maximum = 1, and for the low-noise T2V model it is minimum = 0 and maximum = 0.875. I think you might have trained with the wrong configuration, and that's why people are getting bad results with your LoRA; for me, the old LoRA is somehow working better than this one.
Thank you, I mainly want to train I2V.
I looked at the loss function in Self-Forcing Plus. Could you tell me which function implements the improvements added on top of DMD2? Is it the backward simulation in the generator_loss function? In DMD2, gradients are backpropagated at all five timesteps, while in Self-Forcing, gradients are only computed at the last timestep. Is that correct?
@yuduan It's a trick to save GPU memory. Please refer to the Self-Forcing paper to understand it.
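A minimal sketch of the trick as described above (illustrative names only, not the Self-Forcing Plus code): roll out the few-step trajectory without gradients and backpropagate only through the final denoising step.

```python
import torch

def generator_rollout(generator, scheduler, noise, denoising_step_list):
    x = noise
    for i, t in enumerate(denoising_step_list):
        last_step = (i == len(denoising_step_list) - 1)
        # Only the final step keeps activations for backprop, which is what
        # lets the multi-step rollout fit in GPU memory.
        with torch.set_grad_enabled(last_step):
            pred = generator(x, t)
            x = scheduler.step(pred, t, x)
    return x  # gradients flow only through the last generator call
```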
Besides, we just released the I2V A14B 4-step LoRA, please give it a try :)
Hope it helps.
The 2.2 Lightning LoRAs, all versions, can't do dim lighting; they can't do dark scenes at all. No matter the prompt, you always get some variation of fully bright lighting.
Please also focus on GGUF optimization, especially multi-GPU, as Kijai's workflow is a pain on a 4070 12 GB. With multi-GPU I generate a 5-second video using the Q8 checkpoints in 300 seconds: 10 steps (5 high / 5 low), Lightning 0.6 on high and 0.95 on low, resolution 480x832. Results so far are very good, but I wanted to test a comparison against Kijai's parameters.