Update README.md
README.md
CHANGED
@@ -78,7 +78,7 @@ This model used weights pretrained by [lxj616](https://huggingface.co/lxj616/mak
 * Each video latent is encoded into latent representations of the shape 4 x 24 x H/8 x W/8
 * The latent of the first frame from each video is repeated along the frame dimension as additional guidance (referred to as hint image)
 * Hint latent and video latent are stacked to produce a shape of 8 x 24 x H/8 x W/8
-* The last input channel is preserved for
+* The last input channel is preserved for masking purposes (not used during training, set to zero)
 * Text prompts are encoded by the CLIP text encoder
 * Video latents with added noise and clip encoded text prompts are fed into the UNet to predict the added noise
 * Loss is the reconstruction objective between the added noise and the predicted noise via mean squared error (mse/l2)
@@ -114,7 +114,7 @@ Trainig statistics are available at [Weights and Biases](https://wandb.ai/tempof
 ```bibtext
 @misc{TempoFunk2023,
 author = {Lopho, Carlos Chavez},
-title = {TempoFunk: Extending
+title = {TempoFunk: Extending latent diffusion image models to Video},
 url = {https://github.com/lopho/makeavid-sd-tpu},
 month = {5},
 year = {2023}
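
The training steps listed in the first hunk (README.md lines 78 to 84) can be illustrated with a small NumPy sketch. It is not taken from the repository. The function names `make_training_example` and `mse_loss` are hypothetical, the DDPM-style noising with a single `alpha_bar` value is an assumption (the README does not name the noise schedule), and so are the total of 9 input channels and the channel order (noisy video latent, hint latent, zeroed mask), which are inferred from the README calling the mask "the last input channel".

```python
import numpy as np


def make_training_example(video_latent, alpha_bar, rng):
    """Build one UNet input and its noise target from a video latent.

    video_latent: array of shape (4, F, H/8, W/8).
    alpha_bar:    cumulative signal fraction at the sampled timestep
                  (assumed DDPM-style forward process).
    """
    _, frames, height, width = video_latent.shape

    # Hint image: latent of the first frame, repeated along the frame axis.
    hint = np.repeat(video_latent[:, :1], frames, axis=1)

    # Add noise to the video latent only; the hint stays clean.
    noise = rng.standard_normal(video_latent.shape)
    noisy = np.sqrt(alpha_bar) * video_latent + np.sqrt(1.0 - alpha_bar) * noise

    # Mask channel: reserved, unused during training, so set to zero.
    mask = np.zeros((1, frames, height, width), dtype=video_latent.dtype)

    # Assumed channel order: 4 noisy + 4 hint + 1 mask.
    unet_input = np.concatenate([noisy, hint, mask], axis=0)
    return unet_input, noise


def mse_loss(predicted_noise, true_noise):
    """Mean squared error between predicted and added noise."""
    return float(np.mean((predicted_noise - true_noise) ** 2))


rng = np.random.default_rng(0)
latent = rng.standard_normal((4, 24, 8, 8))  # 24 frames of a 64 x 64 video
unet_input, noise = make_training_example(latent, alpha_bar=0.5, rng=rng)
print(unet_input.shape)  # (9, 24, 8, 8)
```

In actual training, a UNet conditioned on the CLIP text embedding would produce `predicted_noise` from `unet_input`, and `mse_loss(predicted_noise, noise)` would be minimized.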