TempoFunk/makeavid-sd-jax

lopho committed on May 9, 2023

Commit 371d561 · 1 Parent(s): fc5cfe4

Update README.md

Files changed (1)

README.md +2 -2

@@ -78,7 +78,7 @@ This model used weights pretrained by [lxj616](https://huggingface.co/lxj616/mak
 * Each video latent is encoded into latent representations of the shape 4 x 24 x H/8 x W/8
 * The latent of the first frame from each video is repeated along the frame dimension as additional guidance (referred to as hint image)
 * Hint latent and video latent are stacked to produce a shape of 8 x 24 x H/8 x W/8
-* The last input channel is preserved for maskin purposes (not used during training, set to zero)
+* The last input channel is preserved for masking purposes (not used during training, set to zero)
 * Text prompts are encoded by the CLIP text encoder
 * Video latents with added noise and clip encoded text prompts are fed into the UNet to predict the added noise
 * Loss is the reconstruction objective between the added noise and the predicted noise via mean squared error (mse/l2)
@@ -114,7 +114,7 @@ Trainig statistics are available at [Weights and Biases](https://wandb.ai/tempof
 ```bibtext
 @misc{TempoFunk2023,
       author = {Lopho, Carlos Chavez},
-      title = {TempoFunk: Extending LDM models to Video},
+      title = {TempoFunk: Extending latent diffusion image models to Video},
       url = {https://github.com/lopho/makeavid-sd-tpu},
       month = {5},
       year = {2023}
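
The training steps listed in the first README hunk can be sketched in JAX as follows. This is a minimal illustration under stated assumptions, not the project's actual training code: the function names (`add_noise`, `build_unet_input`, `mse_noise_loss`), the channel order (noisy video first, hint second), the mask being an extra ninth channel, and the fixed `alpha_bar` value are all hypothetical choices made for the example. The README only states that the stacked latent has shape 8 x 24 x H/8 x W/8 and that the last input channel is reserved for masking and set to zero.

```python
import jax
import jax.numpy as jnp


def add_noise(latent, noise, alpha_bar):
    """DDPM-style forward step (assumed schedule): blend clean latent with noise."""
    return jnp.sqrt(alpha_bar) * latent + jnp.sqrt(1.0 - alpha_bar) * noise


def build_unet_input(noisy_video, clean_video):
    """Stack noisy video latent, hint latent and a zeroed mask channel.

    Both inputs have shape (4, frames, H/8, W/8).
    """
    frames = noisy_video.shape[1]
    # Hint: latent of the first frame, repeated along the frame dimension.
    hint = jnp.repeat(clean_video[:, :1], frames, axis=1)
    # Assumed order: video channels first, hint channels second -> 8 channels.
    stacked = jnp.concatenate([noisy_video, hint], axis=0)
    # Assumption: the mask is one extra trailing channel, all zeros in training.
    mask = jnp.zeros((1,) + stacked.shape[1:], dtype=stacked.dtype)
    return jnp.concatenate([stacked, mask], axis=0)


def mse_noise_loss(predicted_noise, noise):
    """Mean squared error between the predicted and the added noise."""
    return jnp.mean((predicted_noise - noise) ** 2)


# Toy example: 24 frames at H = W = 64, so each latent frame is 8 x 8.
key_latent, key_noise = jax.random.split(jax.random.PRNGKey(0))
video_latent = jax.random.normal(key_latent, (4, 24, 8, 8))
noise = jax.random.normal(key_noise, video_latent.shape)
noisy_video = add_noise(video_latent, noise, alpha_bar=0.5)
unet_input = build_unet_input(noisy_video, video_latent)
```

In a real training step, `unet_input` and the CLIP text embeddings would be passed to the UNet, and `mse_noise_loss` would compare its output against `noise`.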