metadata
tags:
  - image-to-video
  - lora
  - musubi-tuner
  - template:diffusion-lora
widget:
  - text: >-
      A woman in a shimmering red dress and dangling earrings stands beside a
      man in a dark blue shirt with a bear logo. She glances at him, their eyes
      lock, and they share a tender kiss.
    output:
      url: test1.mp4
  - text: >-
      A man in a rolled-sleeve blue shirt and a woman in a white dress with a
      diamond necklace step onto a balcony. Holding hands, they gaze into each
      other’s eyes and share a romantic kiss.
    output:
      url: test2.mp4
  - text: >-
      The woman is in a black tactical suit and the man in scratched armor stand
      together. She steps closer, and he sets his hammer down, and they
      share a passionate kiss under the twilight sky.
    output:
      url: test3.mp4
base_model: Wan-AI/Wan2.1-I2V-14B-480P
instance_prompt: null

Wan-I2V-LoRA-Kiss

Training details:

I tried a two-stage method: I first trained the LoRA on images only, then, after a certain number of steps, continued training on videos only.
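The two-stage idea can be sketched as a simple schedule that picks the data type by step. This is only an illustration: `two_stage_schedule` and `switch_step` are hypothetical names, the numbers are made up, and the actual switch point used for this LoRA is not stated here.

```python
from typing import Iterator, Tuple


def two_stage_schedule(total_steps: int, switch_step: int) -> Iterator[Tuple[int, str]]:
    """Yield (step, data_kind) pairs: image batches first, then video batches."""
    if not 0 <= switch_step <= total_steps:
        raise ValueError("switch_step must lie within [0, total_steps]")
    for step in range(total_steps):
        # Stage 1 trains on images; stage 2 continues the same LoRA on videos.
        yield step, ("image" if step < switch_step else "video")


# Hypothetical numbers, chosen only to show the shape of the schedule.
schedule = list(two_stage_schedule(total_steps=6, switch_step=2))
print(schedule)
```

In a real run, each stage would typically be a separate training invocation that resumes from the previous stage's LoRA weights.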

  • LR: 1e-4
  • Optimizer: adamw_optimi
  • Max train epochs: 5
  • Steps: 30
  • Dataset: 81 videos in total (6 used)
  • Rank: 32
  • Batch size: 1
  • Gradient accumulation steps: 1
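As a rough sanity check, the settings above can be related to each other with a little arithmetic. The sketch below assumes that the 6 videos used are each seen once per epoch (no dataset repeats or bucketing effects) and that the step count refers to this video stage; `optimizer_steps` is a hypothetical helper, not part of Musubi-Tuner.

```python
import math


def optimizer_steps(num_samples: int, epochs: int, batch_size: int,
                    grad_accum_steps: int, repeats: int = 1) -> int:
    """Estimate the number of optimizer steps for a simple training run."""
    # Batches needed to see every sample once (times any repeats).
    batches_per_epoch = math.ceil(num_samples * repeats / batch_size)
    # One optimizer step happens every `grad_accum_steps` batches.
    steps_per_epoch = math.ceil(batches_per_epoch / grad_accum_steps)
    return steps_per_epoch * epochs


steps = optimizer_steps(num_samples=6, epochs=5, batch_size=1, grad_accum_steps=1)
print(steps)  # 30
```

Under these assumptions, 6 videos × 5 epochs at an effective batch size of 1 gives 30 steps, which is consistent with the listed step count.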

For training I used the Musubi-Tuner repo.

Inference:

The model seems to prefer long, descriptive prompts; see the attached videos for prompt examples. In my tests, short prompts produced only a weak LoRA effect.

Strength: 0.9-1.0
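For intuition about what the strength value does, here is a minimal sketch of the standard LoRA merge formula, W' = W + strength · (alpha / rank) · (up @ down). It uses tiny toy matrices and plain Python; `apply_lora` and the `alpha` value are assumptions for illustration, and real inference tools apply this internally to the model's layers.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product of a (m x k) and b (k x n)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_lora(weight: Matrix, lora_down: Matrix, lora_up: Matrix,
               strength: float, alpha: float, rank: int) -> Matrix:
    """Return weight + strength * (alpha / rank) * (lora_up @ lora_down)."""
    scale = strength * alpha / rank
    delta = matmul(lora_up, lora_down)  # (out x rank) @ (rank x in)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


# Toy rank-1 example (the LoRA in this card uses rank 32; alpha here is assumed).
weight = [[1.0, 0.0], [0.0, 1.0]]
lora_up = [[1.0], [2.0]]      # out x rank
lora_down = [[0.5, 0.5]]      # rank x in

full = apply_lora(weight, lora_down, lora_up, strength=1.0, alpha=1.0, rank=1)
off = apply_lora(weight, lora_down, lora_up, strength=0.0, alpha=1.0, rank=1)
print(full)
print(off)
```

A strength of 0 leaves the base weights untouched, while values near 1.0 apply the full learned update, which is why the recommended 0.9-1.0 range keeps the effect strong.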