Update README.md (#23)
- Update README.md (ab0b68296e5235090026c6bab19471537fc71547)
Co-authored-by: Zhicheng Sun <[email protected]>
README.md CHANGED
````diff
@@ -7,13 +7,15 @@ base_model:
 pipeline_tag: text-to-video
 tags:
 - image-to-video
 ---
 
-# ⚡️Pyramid Flow⚡️
 
-[[Paper]](https://arxiv.org/abs/2410.05954) [[Project Page ✨]](https://pyramid-flow.github.io) [[Code 🚀]](https://github.com/jy0205/Pyramid-Flow) [[demo 🤗](https://huggingface.co/spaces/Pyramid-Flow/pyramid-flow)]
 
-This is the …
 
 <table class="center" border="0" style="width: 100%; text-align: left;">
 <tr>
````
````diff
@@ -28,10 +30,15 @@ This is the official repository for Pyramid Flow, a training-efficient **Autoreg
 </tr>
 </table>
 
 ## News
 
-* `…
 * `2024.10.11` 🤗🤗🤗 [Hugging Face demo](https://huggingface.co/spaces/Pyramid-Flow/pyramid-flow) is available. Thanks [@multimodalart](https://huggingface.co/multimodalart) for the commit!
 * `2024.10.10` 🚀🚀🚀 We release the [technical report](https://arxiv.org/abs/2410.05954), [project page](https://pyramid-flow.github.io) and [model checkpoint](https://huggingface.co/rain1011/pyramid-flow-sd3) of Pyramid Flow.
 
 ## Installation
````
````diff
@@ -48,7 +55,7 @@ conda activate pyramid
 pip install -r requirements.txt
 ```
 
-Then, …
 
 ```python
 from huggingface_hub import snapshot_download
````
````diff
@@ -59,7 +66,9 @@ snapshot_download("rain1011/pyramid-flow-sd3", local_dir=model_path, local_dir_u
 
 ## Usage
 
-
 
 ```python
 import torch
````
````diff
@@ -76,10 +85,13 @@ model = PyramidDiTForVideoGeneration(
     model_variant='diffusion_transformer_768p', # 'diffusion_transformer_384p'
 )
 
-model.vae.to("cuda")
-model.dit.to("cuda")
-model.text_encoder.to("cuda")
 model.vae.enable_tiling()
 ```
 
 Then, you can try text-to-video generation on your own prompts:
````
````diff
@@ -124,8 +136,6 @@ with torch.no_grad(), torch.cuda.amp.autocast(enabled=True, dtype=torch_dtype):
 export_to_video(frames, "./image_to_video_sample.mp4", fps=24)
 ```
 
-We also support CPU offloading to allow inference with **less than 12GB** of GPU memory by adding a `cpu_offloading=True` parameter. This feature was contributed by [@Ednaordinary](https://github.com/Ednaordinary), see [#23](https://github.com/jy0205/Pyramid-Flow/pull/23) for details.
-
 ## Usage tips
 
 * The `guidance_scale` parameter controls the visual quality. We suggest using a guidance within [7, 9] for the 768p checkpoint during text-to-video generation, and 7 for the 384p checkpoint.
````
````diff
@@ -147,6 +157,7 @@ The following video examples are generated at 5s, 768p, 24fps. For more results,
 </tr>
 </table>
 
 ## Acknowledgement
 
 We are grateful for the following awesome projects when implementing Pyramid Flow:
````
````diff
@@ -160,6 +171,7 @@ We are grateful for the following awesome projects when implementing Pyramid Flo
 ## Citation
 
 Consider giving this repository a star and cite Pyramid Flow in your publications if it helps your research.
 ```
 @article{jin2024pyramidal,
 title={Pyramidal Flow Matching for Efficient Video Generative Modeling},
````
````diff
@@ -7,13 +7,15 @@ base_model:
 pipeline_tag: text-to-video
 tags:
 - image-to-video
+- sd3
+
 ---
 
+# ⚡️Pyramid Flow SD3⚡️
 
+[[Paper]](https://arxiv.org/abs/2410.05954) [[Project Page ✨]](https://pyramid-flow.github.io) [[Code 🚀]](https://github.com/jy0205/Pyramid-Flow) [[miniFLUX Model ⚡️]](https://huggingface.co/rain1011/pyramid-flow-miniflux) [[demo 🤗](https://huggingface.co/spaces/Pyramid-Flow/pyramid-flow)]
 
+This is the model repository for Pyramid Flow, a training-efficient **Autoregressive Video Generation** method based on **Flow Matching**. By training only on open-source datasets, it generates high-quality 10-second videos at 768p resolution and 24 FPS, and naturally supports image-to-video generation.
 
 <table class="center" border="0" style="width: 100%; text-align: left;">
 <tr>
````
````diff
@@ -28,10 +30,15 @@ This is the official repository for Pyramid Flow, a training-efficient **Autoreg
 </tr>
 </table>
 
+
 ## News
 
+* `2024.10.29` ⚡️⚡️⚡️ We release [training code](https://github.com/jy0205/Pyramid-Flow?tab=readme-ov-file#training) and [new model checkpoints](https://huggingface.co/rain1011/pyramid-flow-miniflux) with FLUX structure trained from scratch.
+
+> We have switched the model structure from SD3 to a mini FLUX to fix human structure issues, please try our 1024p image checkpoint and 384p video checkpoint. We will release 768p video checkpoint in a few days.
+
 * `2024.10.11` 🤗🤗🤗 [Hugging Face demo](https://huggingface.co/spaces/Pyramid-Flow/pyramid-flow) is available. Thanks [@multimodalart](https://huggingface.co/multimodalart) for the commit!
+
 * `2024.10.10` 🚀🚀🚀 We release the [technical report](https://arxiv.org/abs/2410.05954), [project page](https://pyramid-flow.github.io) and [model checkpoint](https://huggingface.co/rain1011/pyramid-flow-sd3) of Pyramid Flow.
 
 ## Installation
````
````diff
@@ -48,7 +55,7 @@ conda activate pyramid
 pip install -r requirements.txt
 ```
 
+Then, download the model from [Huggingface](https://huggingface.co/rain1011) (there are two variants: [miniFLUX](https://huggingface.co/rain1011/pyramid-flow-miniflux) or [SD3](https://huggingface.co/rain1011/pyramid-flow-sd3)). The miniFLUX models support 1024p image and 384p video generation, and the SD3-based models support 768p and 384p video generation. The 384p checkpoint generates 5-second video at 24FPS, while the 768p checkpoint generates up to 10-second video at 24FPS.
 
 ```python
 from huggingface_hub import snapshot_download
````
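The added download paragraph maps output types and resolutions to two checkpoint repositories. Below is a minimal sketch of that mapping, assuming only the repository IDs and capabilities stated in the paragraph. The names `SUPPORTED`, `MAX_SECONDS`, `candidate_repos`, and `download` are hypothetical and used for illustration only; they are not part of Pyramid Flow. The only real library call is `huggingface_hub.snapshot_download`, which is imported lazily so the lookup works offline.

```python
from typing import Dict, List, Set, Tuple

MINIFLUX = "rain1011/pyramid-flow-miniflux"
SD3 = "rain1011/pyramid-flow-sd3"

# Hypothetical capability table, transcribed from the README paragraph:
# miniFLUX -> 1024p image and 384p video; SD3 -> 768p and 384p video.
SUPPORTED: Dict[str, Set[Tuple[str, str]]] = {
    MINIFLUX: {("image", "1024p"), ("video", "384p")},
    SD3: {("video", "768p"), ("video", "384p")},
}

# Video length per resolution at 24 FPS, as stated in the README
# (384p: 5 seconds; 768p: up to 10 seconds).
MAX_SECONDS: Dict[str, int] = {"384p": 5, "768p": 10}


def candidate_repos(kind: str, resolution: str) -> List[str]:
    """Return the repo IDs whose stated capabilities include (kind, resolution)."""
    return sorted(repo for repo, caps in SUPPORTED.items() if (kind, resolution) in caps)


def download(repo_id: str, model_path: str) -> str:
    """Fetch a checkpoint into model_path; needs network access and huggingface_hub."""
    from huggingface_hub import snapshot_download  # lazy import: lookups work without it

    return snapshot_download(repo_id, local_dir=model_path, repo_type="model")


if __name__ == "__main__":
    print(candidate_repos("video", "768p"))  # ['rain1011/pyramid-flow-sd3']
    # download(candidate_repos("video", "768p")[0], "PATH")  # would fetch the weights
```

When both families serve a request (384p video), the helper returns both IDs and leaves the choice to the caller. The News entry in this commit says the miniFLUX structure was adopted to fix human-structure issues.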
````diff
@@ -59,7 +66,9 @@ snapshot_download("rain1011/pyramid-flow-sd3", local_dir=model_path, local_dir_u
 
 ## Usage
 
+For inference, we provide Gradio demo, single-GPU, multi-GPU, and Apple Silicon inference code, as well as VRAM-efficient features such as CPU offloading. Please check our [code repository](https://github.com/jy0205/Pyramid-Flow?tab=readme-ov-file#inference) for usage.
+
+Below is a simplified two-step usage procedure. First, load the downloaded model:
 
 ```python
 import torch
````
````diff
@@ -76,10 +85,13 @@ model = PyramidDiTForVideoGeneration(
     model_variant='diffusion_transformer_768p', # 'diffusion_transformer_384p'
 )
 
 model.vae.enable_tiling()
+# model.vae.to("cuda")
+# model.dit.to("cuda")
+# model.text_encoder.to("cuda")
+
+# if you're not using sequential offloading bellow uncomment the lines above ^
+model.enable_sequential_cpu_offload()
 ```
 
 Then, you can try text-to-video generation on your own prompts:
````
````diff
@@ -124,8 +136,6 @@ with torch.no_grad(), torch.cuda.amp.autocast(enabled=True, dtype=torch_dtype):
 export_to_video(frames, "./image_to_video_sample.mp4", fps=24)
 ```
 
 ## Usage tips
 
 * The `guidance_scale` parameter controls the visual quality. We suggest using a guidance within [7, 9] for the 768p checkpoint during text-to-video generation, and 7 for the 384p checkpoint.
````
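The usage tip gives checkpoint-specific values for `guidance_scale` in text-to-video generation. Below is a small sketch that assumes only the numbers quoted in the tip: a range of [7, 9] for 768p and a single value of 7 for 384p. The hypothetical `check_guidance` helper flags values outside the suggested range before they are passed to the model. The helper is not part of the Pyramid Flow API.

```python
from typing import Dict, Tuple

# Suggested text-to-video guidance per checkpoint, transcribed from the usage tip.
# Inclusive (low, high) bounds; 384p collapses to the single value 7.
SUGGESTED_GUIDANCE: Dict[str, Tuple[float, float]] = {
    "768p": (7.0, 9.0),
    "384p": (7.0, 7.0),
}


def check_guidance(checkpoint: str, guidance_scale: float) -> bool:
    """Return True if guidance_scale lies inside the suggested range (hypothetical helper)."""
    if checkpoint not in SUGGESTED_GUIDANCE:
        raise KeyError(f"no suggestion recorded for checkpoint {checkpoint!r}")
    low, high = SUGGESTED_GUIDANCE[checkpoint]
    return low <= guidance_scale <= high


if __name__ == "__main__":
    print(check_guidance("768p", 8.0))  # True
    print(check_guidance("384p", 9.0))  # False
```

The helper returns a boolean instead of clamping the value. The tip is phrased as a suggestion, not a hard limit, so the final choice stays with the caller.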
````diff
@@ -147,6 +157,7 @@ The following video examples are generated at 5s, 768p, 24fps. For more results,
 </tr>
 </table>
 
+
 ## Acknowledgement
 
 We are grateful for the following awesome projects when implementing Pyramid Flow:
````
````diff
@@ -160,6 +171,7 @@ We are grateful for the following awesome projects when implementing Pyramid Flo
 ## Citation
 
 Consider giving this repository a star and cite Pyramid Flow in your publications if it helps your research.
+
 ```
 @article{jin2024pyramidal,
 title={Pyramidal Flow Matching for Efficient Video Generative Modeling},
````