Update README.md (#27)
Update README.md (a9476a60303425fd230e64d79c6f4f9f2d43c427)
README.md
CHANGED
@@ -37,11 +37,13 @@ library_name: MAGI-1

# MAGI-1: Autoregressive Video Generation at Scale

-This repository contains the code for the MAGI-1 model, pre-trained weights and inference code. You can find more information on our [technical report](https://static.magi.world/static/files/MAGI_1.pdf) or directly create magic with MAGI-1 [here](http://sand.ai) . 🚀✨

## 🔥🔥🔥 Latest News

- Apr 21, 2025: MAGI-1 is here 🎉. We've released the model weights and inference code — check it out!
@@ -79,34 +81,41 @@ We adopt a shortcut distillation approach that trains a single velocity-based mo

We provide the pre-trained weights for MAGI-1, including the 24B and 4.5B models, as well as the corresponding distill and distill+quant models. The model weight links are shown in the table.

-| Model | Link
-| T5
-| MAGI-1-VAE
-| MAGI-1-24B
-| MAGI-1-24B-distill
-| MAGI-1-24B-distill+fp8_quant
-| MAGI-1-4.5B

## 4. Evaluation

### In-house Human Evaluation

-MAGI-1 achieves state-of-the-art performance among open-source models

*(figure: in-house human evaluation results)*

### Physical Evaluation

-Thanks to the natural advantages of autoregressive architecture, Magi achieves far superior precision in predicting physical behavior through video continuation—significantly outperforming all existing models.

| Model | Phys. IQ Score ↑ | Spatial IoU ↑ | Spatio Temporal ↑ | Weighted Spatial IoU ↑ | MSE ↓ |
|----------------|------------------|---------------|-------------------|-------------------------|--------|
| **V2V Models** | | | | | |
-| **Magi (V2V)** | **56.02** | **0.367** | **0.270** | **0.304** | **0.005** |
| VideoPoet (V2V)| 29.50 | 0.204 | 0.164 | 0.137 | 0.010 |
| **I2V Models** | | | | | |
-| **Magi (I2V)** | **30.23** | **0.203** | **0.151** | **0.154** | **0.012** |
| Kling1.6 (I2V) | 23.64 | 0.197 | 0.086 | 0.144 | 0.025 |
| VideoPoet (I2V)| 20.30 | 0.141 | 0.126 | 0.087 | 0.012 |
| Gen 3 (I2V) | 22.80 | 0.201 | 0.115 | 0.116 | 0.015 |
@@ -144,7 +153,7 @@ pip install -r requirements.txt

# Install ffmpeg
conda install -c conda-forge ffmpeg=4.4

-#
git clone git@github.com:SandAI-org/MagiAttention.git
cd MagiAttention
git submodule update --init --recursive
@@ -198,6 +207,12 @@ By adjusting these parameters, you can flexibly control the input and output to

### Some Useful Configs (for config.json)

| Config | Help |
| -------------- | ------------------------------------------------------------ |
| seed | Random seed used for video generation |
@@ -205,7 +220,7 @@ By adjusting these parameters, you can flexibly control the input and output to

| video_size_w | Width of the video |
| num_frames | Controls the duration of generated video |
| fps | Frames per second, 4 video frames correspond to 1 latent_frame |
-| cfg_number | Base model uses cfg_number==
| load | Directory containing a model checkpoint. |
| t5_pretrained | Path to load pretrained T5 model |
| vae_pretrained | Path to load pretrained VAE model |
@@ -230,4 +245,4 @@ If you find our code or model useful in your research, please cite:

## 8. Contact

-If you have any questions, please feel free to raise an issue or contact us at [
# MAGI-1: Autoregressive Video Generation at Scale

+This repository contains the [code](https://github.com/SandAI-org/MAGI-1) for the MAGI-1 model, pre-trained weights and inference code. You can find more information in our [technical report](https://static.magi.world/static/files/MAGI_1.pdf) or directly create magic with MAGI-1 [here](http://sand.ai). 🚀✨

## 🔥🔥🔥 Latest News

+- Apr 30, 2025: MAGI-1 4.5B distill and distill+quant models are coming soon 🎉 — we're putting on the final touches, stay tuned!
+- Apr 30, 2025: The MAGI-1 4.5B model has been released 🎉. We've updated the model weights — check it out!
- Apr 21, 2025: MAGI-1 is here 🎉. We've released the model weights and inference code — check it out!
We provide the pre-trained weights for MAGI-1, including the 24B and 4.5B models, as well as the corresponding distill and distill+quant models. The model weight links are shown in the table.

+| Model | Link | Recommended Machine |
+| ------------------------------ | ---------------------------------------------------------------------- | ------------------------------- |
+| T5 | [T5](https://huggingface.co/sand-ai/MAGI-1/tree/main/ckpt/t5) | - |
+| MAGI-1-VAE | [MAGI-1-VAE](https://huggingface.co/sand-ai/MAGI-1/tree/main/ckpt/vae) | - |
+| MAGI-1-24B | [MAGI-1-24B](https://huggingface.co/sand-ai/MAGI-1/tree/main/ckpt/magi/24B_base) | H100/H800 × 8 |
+| MAGI-1-24B-distill | [MAGI-1-24B-distill](https://huggingface.co/sand-ai/MAGI-1/tree/main/ckpt/magi/24B_distill) | H100/H800 × 8 |
+| MAGI-1-24B-distill+fp8_quant | [MAGI-1-24B-distill+quant](https://huggingface.co/sand-ai/MAGI-1/tree/main/ckpt/magi/24B_distill_quant) | H100/H800 × 4 or RTX 4090 × 8 |
+| MAGI-1-4.5B | [MAGI-1-4.5B](https://huggingface.co/sand-ai/MAGI-1/tree/main/ckpt/magi/4.5B_base) | RTX 4090 × 1 |
+| MAGI-1-4.5B-distill | Coming soon | RTX 4090 × 1 |
+| MAGI-1-4.5B-distill+fp8_quant | Coming soon | RTX 4090 × 1 |
+
+> [!NOTE]
+>
+> For 4.5B models, any machine with at least 24GB of GPU memory is sufficient.
## 4. Evaluation

### In-house Human Evaluation

+MAGI-1 achieves state-of-the-art performance among open-source models like Wan-2.1 and HunyuanVideo and closed-source models like Hailuo (i2v-01), particularly excelling in instruction following and motion quality, positioning it as a strong potential competitor to closed-source commercial models such as Kling.

*(figure: in-house human evaluation results)*

### Physical Evaluation

+Thanks to the natural advantages of autoregressive architecture, Magi achieves far superior precision in predicting physical behavior on the [Physics-IQ benchmark](https://github.com/google-deepmind/physics-IQ-benchmark) through video continuation, significantly outperforming all existing models.

| Model | Phys. IQ Score ↑ | Spatial IoU ↑ | Spatio Temporal ↑ | Weighted Spatial IoU ↑ | MSE ↓ |
|----------------|------------------|---------------|-------------------|-------------------------|--------|
| **V2V Models** | | | | | |
+| **Magi-24B (V2V)** | **56.02** | **0.367** | **0.270** | **0.304** | **0.005** |
+| **Magi-4.5B (V2V)** | **42.44** | **0.234** | **0.285** | **0.188** | **0.007** |
| VideoPoet (V2V)| 29.50 | 0.204 | 0.164 | 0.137 | 0.010 |
| **I2V Models** | | | | | |
+| **Magi-24B (I2V)** | **30.23** | **0.203** | **0.151** | **0.154** | **0.012** |
| Kling1.6 (I2V) | 23.64 | 0.197 | 0.086 | 0.144 | 0.025 |
| VideoPoet (I2V)| 20.30 | 0.141 | 0.126 | 0.087 | 0.012 |
| Gen 3 (I2V) | 22.80 | 0.201 | 0.115 | 0.116 | 0.015 |
# Install ffmpeg
conda install -c conda-forge ffmpeg=4.4

+# For GPUs based on the Hopper architecture (e.g., H100/H800), we recommend installing MagiAttention (https://github.com/SandAI-org/MagiAttention) for acceleration. For non-Hopper GPUs, installing MagiAttention is not necessary.
git clone git@github.com:SandAI-org/MagiAttention.git
cd MagiAttention
git submodule update --init --recursive
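A small runtime check can decide whether the MagiAttention step applies to your machine. Mapping "Hopper (H100/H800)" to CUDA compute capability 9.x is our assumption here, as is the use of `torch.cuda.get_device_capability`; this is a sketch, not part of the MAGI-1 setup scripts.

```python
# Sketch: should this machine bother installing MagiAttention?
def is_hopper(capability: tuple[int, int]) -> bool:
    """Hopper GPUs (H100/H800) report CUDA compute capability 9.x."""
    major, _minor = capability
    return major == 9

def want_magi_attention() -> bool:
    """True only if a Hopper GPU is visible; False otherwise (or without torch)."""
    try:
        import torch
        if not torch.cuda.is_available():
            return False
        return is_hopper(torch.cuda.get_device_capability(0))
    except ImportError:
        return False
```

An RTX 4090 reports capability 8.9, so `want_magi_attention()` returns False there, matching the note above.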
### Some Useful Configs (for config.json)

+> [!NOTE]
+>
+> - If you are running the 24B model with RTX 4090 × 8, please set `pp_size: 2` and `cp_size: 4`.
+>
+> - Our model supports arbitrary resolutions. To accelerate the inference process, the default resolution for the 4.5B model is set to 720×720 in `4.5B_config.json`.

| Config | Help |
| -------------- | ------------------------------------------------------------ |
| seed | Random seed used for video generation |
| video_size_w | Width of the video |
| num_frames | Controls the duration of generated video |
| fps | Frames per second, 4 video frames correspond to 1 latent_frame |
+| cfg_number | Base model uses cfg_number=3; distill and quant models use cfg_number=1 |
| load | Directory containing a model checkpoint. |
| t5_pretrained | Path to load pretrained T5 model |
| vae_pretrained | Path to load pretrained VAE model |
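The table's two arithmetic rules (4 video frames per latent frame; cfg_number 3 for the base model, 1 for distill/quant variants) can be sketched as helpers. The function names are ours for illustration, not from the MAGI-1 codebase.

```python
# Sketch of the config rules stated in the table above.
FRAMES_PER_LATENT = 4  # 4 video frames correspond to 1 latent_frame

def num_latent_frames(num_frames: int) -> int:
    """Latent frames needed for `num_frames` video frames (rounded up)."""
    # Ceiling division: a partial group of 4 frames still needs a latent frame.
    return -(-num_frames // FRAMES_PER_LATENT)

def cfg_number_for(model_name: str) -> int:
    """Base model uses cfg_number=3; distill and quant models use cfg_number=1."""
    name = model_name.lower()
    return 1 if ("distill" in name or "quant" in name) else 3
```

For example, a 96-frame clip (4 seconds at 24 fps) corresponds to 24 latent frames.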
## 8. Contact

+If you have any questions, please feel free to raise an issue or contact us at [research@sand.ai](mailto:research@sand.ai).