High_noise_model and low_noise_model are distilled simultaneously, on the 81x720x1280, 81x1280x720, 81x480x832, and 81x832x480 size options.
Does this mean the two models are trained together, rather than each being trained separately?
Wan2.2 is 24 fps, so there should be 121 frames. Is it okay to train with only 81 frames?
These are for the 14B models, so 81 frames at 16 fps; the 5B TI2V model is 121 frames at 24 fps (both correspond to about 5 seconds of video).
Thank you for your reply. I was thinking that if the two models are trained together, along with a generator, a real net, and a fake net, the GPU memory usage would be enormous. Additionally, what is the timestep boundary between low and high during training? Will the training code be made open-source?
@yuduan
The boundary is the same as the original model, i.e., 0.875 / 875 for T2V, 0.9 / 900 for I2V.
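As a rough illustration (not the actual training code; the function name and the 0-1000 timestep scale are assumptions), the boundary simply decides which expert handles a given timestep:

```python
def select_expert(timestep: float, task: str = "t2v") -> str:
    # Boundary from this thread: 0.875 (875/1000) for T2V, 0.9 (900/1000) for I2V.
    boundary = 875.0 if task == "t2v" else 900.0
    # The high-noise model covers the early, noisier part of the trajectory.
    return "high_noise_model" if timestep >= boundary else "low_noise_model"

assert select_expert(937.5001) == "high_noise_model"
assert select_expert(833.3333) == "low_noise_model"
```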
As for the memory, you are correct. There are 6 14B models on GPU, and 4 of them are trainable.
We have worked a lot to optimize the GPU memory so that 81x720p videos can be trained on this setting.
Unfortunately, I can't share the training code due to company restrictions. You may try other open-source training codes such as https://github.com/modelscope/DiffSynth-Studio and https://github.com/GoatWu/Self-Forcing-Plus
Thank you very, very much.
Regarding the denoising_step_list for low and high: I noticed that in Self-Forcing Plus, the denoising_step_list is [1000, 757, 522, 225]. For Wan2.2, should there be 8 values? Could you share the correct list?
@yuduan
Hi, as said in https://github.com/ModelTC/Wan2.2-Lightning/issues/3, we use [1000.0000, 937.5001, 833.3333, 625.0000], with [1000.0000, 937.5001] for high_noise_model and [833.3333, 625.0000] for low_noise_model.
Please refer to this repo https://github.com/ModelTC/Wan2.2-Lightning/ to see the scheduler we adopt.
The reason we wrote a new EulerScheduler: I think it's better if the timestep list for 4 steps is a subset of the timestep list for 8 steps. The scheduler in https://github.com/ModelTC/Wan2.2-Lightning/ has this property, while the EulerScheduler in diffusers does not.
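For intuition only, here is a small sketch of one way to get that subset property: sample timesteps on a uniform sigma grid and apply the usual flow-matching timestep shift. This is not necessarily the scheduler in the repo, and shift=5.0 is an assumption, but it reproduces the 4-step list above, and those four values reappear inside the 8-step list:

```python
import numpy as np

def shifted_timesteps(num_steps: int, shift: float = 5.0, num_train_timesteps: int = 1000):
    # Uniform sigma grid in (0, 1]; the 4-step grid points are a subset of the 8-step grid.
    sigmas = np.linspace(1.0, 0.0, num_steps + 1)[:-1]
    # Flow-matching timestep shift; it is a pointwise map, so the subset property is preserved.
    sigmas = shift * sigmas / (1.0 + (shift - 1.0) * sigmas)
    return sigmas * num_train_timesteps

print(shifted_timesteps(4))  # approx. [1000.0, 937.5, 833.3333, 625.0]
print(shifted_timesteps(8))  # the 4 values above appear at every other position
```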
For the Wan2.2 high-noise 14B T2V model, the timestep range for training is minimum = 0.875 and maximum = 1, and for the low-noise T2V model it is minimum = 0 and maximum = 0.875. I think you might have trained with the wrong configuration, and that's why people are getting bad results with your LoRA; for me, the old LoRA is somehow working better than this one.
Thank you, I mainly want to train I2V.
I looked at the loss function in Self-Forcing Plus. Could you tell me which function implements the improvements added on top of DMD2? Is it the backward simulation in the generator_loss function? In DMD2, gradients are backpropagated at all five timesteps, while in Self-Forcing, gradients are only computed at the last timestep. Is that correct?
@yuduan It's a trick to save GPU memory. Please refer to the Self-Forcing paper to understand it.
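A minimal sketch of the trick as described above (illustrative names only, not the Self-Forcing Plus code): roll out the few-step trajectory without gradients and backpropagate only through the final denoising step.

```python
import torch

def generator_rollout(generator, scheduler, noise, denoising_step_list):
    x = noise
    for i, t in enumerate(denoising_step_list):
        last_step = (i == len(denoising_step_list) - 1)
        # Only the final step keeps activations for backprop, which is what
        # lets the multi-step rollout fit in GPU memory.
        with torch.set_grad_enabled(last_step):
            pred = generator(x, t)
            x = scheduler.step(pred, t, x)
    return x  # gradients flow only through the last generator call
```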
Besides, we just released the I2V A14B 4-step LoRA, please give it a try :)
Hope it helps.
The 2.2 Lightning LoRAs, all versions, can't do dim lighting; they can't do dark scenes at all. No matter the prompt, you always get some variation of fully bright lighting.
Please also focus on GGUF optimization, especially multi-GPU, as Kijai's workflow is a pain on a 4070 12 GB. With multi-GPU I generate a 5-second video using the Q8 checkpoints in 300 seconds: 10 steps (5 high / 5 low), Lightning 0.6 on high and 0.95 on low, resolution 480x832. Results so far are very good, but I wanted to test a comparison against Kijai's parameters.