Update README.md (#27)
Update README.md (a9476a60303425fd230e64d79c6f4f9f2d43c427)
README.md
CHANGED
@@ -37,11 +37,13 @@ library_name: MAGI-1

# MAGI-1: Autoregressive Video Generation at Scale

-This repository contains the code for the MAGI-1 model, pre-trained weights and inference code. You can find more information on our [technical report](https://static.magi.world/static/files/MAGI_1.pdf) or directly create magic with MAGI-1 [here](http://sand.ai) . 🚀✨

## 🔥🔥🔥 Latest News

- Apr 21, 2025: MAGI-1 is here 🎉. We've released the model weights and inference code — check it out!
@@ -79,34 +81,41 @@ We adopt a shortcut distillation approach that trains a single velocity-based mo

We provide the pre-trained weights for MAGI-1, including the 24B and 4.5B models, as well as the corresponding distill and distill+quant models. The model weight links are shown in the table.

-| Model | Link
-| T5
-| MAGI-1-VAE
-| MAGI-1-24B
-| MAGI-1-24B-distill
-| MAGI-1-24B-distill+fp8_quant
-| MAGI-1-4.5B

## 4. Evaluation

### In-house Human Evaluation

-MAGI-1 achieves state-of-the-art performance among open-source models

*(figure: in-house human evaluation results)*

### Physical Evaluation

-Thanks to the natural advantages of autoregressive architecture, Magi achieves far superior precision in predicting physical behavior through video continuation—significantly outperforming all existing models.

| Model | Phys. IQ Score ↑ | Spatial IoU ↑ | Spatio Temporal ↑ | Weighted Spatial IoU ↑ | MSE ↓ |
|----------------|------------------|---------------|-------------------|-------------------------|--------|
| **V2V Models** | | | | | |
-| **Magi (V2V)** | **56.02** | **0.367** | **0.270** | **0.304** | **0.005** |
| VideoPoet (V2V)| 29.50 | 0.204 | 0.164 | 0.137 | 0.010 |
| **I2V Models** | | | | | |
-| **Magi (I2V)** | **30.23** | **0.203** | **0.151** | **0.154** | **0.012** |
| Kling1.6 (I2V) | 23.64 | 0.197 | 0.086 | 0.144 | 0.025 |
| VideoPoet (I2V)| 20.30 | 0.141 | 0.126 | 0.087 | 0.012 |
| Gen 3 (I2V) | 22.80 | 0.201 | 0.115 | 0.116 | 0.015 |
@@ -144,7 +153,7 @@ pip install -r requirements.txt

# Install ffmpeg
conda install -c conda-forge ffmpeg=4.4

-#
git clone git@github.com:SandAI-org/MagiAttention.git
cd MagiAttention
git submodule update --init --recursive
@@ -198,6 +207,12 @@ By adjusting these parameters, you can flexibly control the input and output to

### Some Useful Configs (for config.json)

| Config | Help |
| -------------- | ------------------------------------------------------------ |
| seed | Random seed used for video generation |
@@ -205,7 +220,7 @@ By adjusting these parameters, you can flexibly control the input and output to

| video_size_w | Width of the video |
| num_frames | Controls the duration of generated video |
| fps | Frames per second, 4 video frames correspond to 1 latent_frame |
-| cfg_number | Base model uses cfg_number==
| load | Directory containing a model checkpoint. |
| t5_pretrained | Path to load pretrained T5 model |
| vae_pretrained | Path to load pretrained VAE model |
@@ -230,4 +245,4 @@ If you find our code or model useful in your research, please cite:

## 8. Contact

-If you have any questions, please feel free to raise an issue or contact us at [
# MAGI-1: Autoregressive Video Generation at Scale

+This repository contains the [code](https://github.com/SandAI-org/MAGI-1) for the MAGI-1 model, pre-trained weights and inference code. You can find more information in our [technical report](https://static.magi.world/static/files/MAGI_1.pdf) or directly create magic with MAGI-1 [here](http://sand.ai). 🚀✨

## 🔥🔥🔥 Latest News

+- Apr 30, 2025: MAGI-1 4.5B distill and distill+quant models are coming soon 🎉 — we're putting on the final touches, stay tuned!
+- Apr 30, 2025: The MAGI-1 4.5B model has been released 🎉. We've updated the model weights — check it out!
- Apr 21, 2025: MAGI-1 is here 🎉. We've released the model weights and inference code — check it out!
We provide the pre-trained weights for MAGI-1, including the 24B and 4.5B models, as well as the corresponding distill and distill+quant models. The model weight links are shown in the table.

+| Model | Link | Recommended Machine |
+| ------------------------------ | ---------------------------------------------------------------------- | ------------------------------- |
+| T5 | [T5](https://huggingface.co/sand-ai/MAGI-1/tree/main/ckpt/t5) | - |
+| MAGI-1-VAE | [MAGI-1-VAE](https://huggingface.co/sand-ai/MAGI-1/tree/main/ckpt/vae) | - |
+| MAGI-1-24B | [MAGI-1-24B](https://huggingface.co/sand-ai/MAGI-1/tree/main/ckpt/magi/24B_base) | H100/H800 × 8 |
+| MAGI-1-24B-distill | [MAGI-1-24B-distill](https://huggingface.co/sand-ai/MAGI-1/tree/main/ckpt/magi/24B_distill) | H100/H800 × 8 |
+| MAGI-1-24B-distill+fp8_quant | [MAGI-1-24B-distill+quant](https://huggingface.co/sand-ai/MAGI-1/tree/main/ckpt/magi/24B_distill_quant) | H100/H800 × 4 or RTX 4090 × 8 |
+| MAGI-1-4.5B | [MAGI-1-4.5B](https://huggingface.co/sand-ai/MAGI-1/tree/main/ckpt/magi/4.5B_base) | RTX 4090 × 1 |
+| MAGI-1-4.5B-distill | Coming soon | RTX 4090 × 1 |
+| MAGI-1-4.5B-distill+fp8_quant | Coming soon | RTX 4090 × 1 |
+
+> [!NOTE]
+>
+> For 4.5B models, any machine with at least 24GB of GPU memory is sufficient.
## 4. Evaluation

### In-house Human Evaluation

+MAGI-1 achieves state-of-the-art performance among open-source models like Wan-2.1 and HunyuanVideo and closed-source models like Hailuo (i2v-01), particularly excelling in instruction following and motion quality, positioning it as a strong potential competitor to closed-source commercial models such as Kling.

*(figure: in-house human evaluation results)*

### Physical Evaluation

+Thanks to the natural advantages of autoregressive architecture, Magi achieves far superior precision in predicting physical behavior on the [Physics-IQ benchmark](https://github.com/google-deepmind/physics-IQ-benchmark) through video continuation, significantly outperforming all existing models.

| Model | Phys. IQ Score ↑ | Spatial IoU ↑ | Spatio Temporal ↑ | Weighted Spatial IoU ↑ | MSE ↓ |
|----------------|------------------|---------------|-------------------|-------------------------|--------|
| **V2V Models** | | | | | |
+| **Magi-24B (V2V)** | **56.02** | **0.367** | **0.270** | **0.304** | **0.005** |
+| **Magi-4.5B (V2V)** | **42.44** | **0.234** | **0.285** | **0.188** | **0.007** |
| VideoPoet (V2V)| 29.50 | 0.204 | 0.164 | 0.137 | 0.010 |
| **I2V Models** | | | | | |
+| **Magi-24B (I2V)** | **30.23** | **0.203** | **0.151** | **0.154** | **0.012** |
| Kling1.6 (I2V) | 23.64 | 0.197 | 0.086 | 0.144 | 0.025 |
| VideoPoet (I2V)| 20.30 | 0.141 | 0.126 | 0.087 | 0.012 |
| Gen 3 (I2V) | 22.80 | 0.201 | 0.115 | 0.116 | 0.015 |
# Install ffmpeg
conda install -c conda-forge ffmpeg=4.4

+# For GPUs based on the Hopper architecture (e.g., H100/H800), we recommend installing MagiAttention (https://github.com/SandAI-org/MagiAttention) for acceleration. For non-Hopper GPUs, installing MagiAttention is not necessary.
git clone git@github.com:SandAI-org/MagiAttention.git
cd MagiAttention
git submodule update --init --recursive
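A small runtime check can decide whether the MagiAttention step applies to your machine. Mapping "Hopper (H100/H800)" to CUDA compute capability 9.x is our assumption here, as is the use of `torch.cuda.get_device_capability`; this is a sketch, not part of the MAGI-1 setup scripts.

```python
# Sketch: should this machine bother installing MagiAttention?
def is_hopper(capability: tuple[int, int]) -> bool:
    """Hopper GPUs (H100/H800) report CUDA compute capability 9.x."""
    major, _minor = capability
    return major == 9

def want_magi_attention() -> bool:
    """True only if a Hopper GPU is visible; False otherwise (or without torch)."""
    try:
        import torch
        if not torch.cuda.is_available():
            return False
        return is_hopper(torch.cuda.get_device_capability(0))
    except ImportError:
        return False
```

An RTX 4090 reports capability 8.9, so `want_magi_attention()` returns False there, matching the note above.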
### Some Useful Configs (for config.json)

+> [!NOTE]
+>
+> - If you are running the 24B model with RTX 4090 × 8, please set `pp_size: 2` and `cp_size: 4`.
+>
+> - Our model supports arbitrary resolutions. To accelerate the inference process, the default resolution for the 4.5B model is set to 720×720 in `4.5B_config.json`.

| Config | Help |
| -------------- | ------------------------------------------------------------ |
| seed | Random seed used for video generation |
| video_size_w | Width of the video |
| num_frames | Controls the duration of generated video |
| fps | Frames per second, 4 video frames correspond to 1 latent_frame |
+| cfg_number | Base model uses cfg_number=3; distill and quant models use cfg_number=1 |
| load | Directory containing a model checkpoint. |
| t5_pretrained | Path to load pretrained T5 model |
| vae_pretrained | Path to load pretrained VAE model |
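The table's two arithmetic rules (4 video frames per latent frame; cfg_number 3 for the base model, 1 for distill/quant variants) can be sketched as helpers. The function names are ours for illustration, not from the MAGI-1 codebase.

```python
# Sketch of the config rules stated in the table above.
FRAMES_PER_LATENT = 4  # 4 video frames correspond to 1 latent_frame

def num_latent_frames(num_frames: int) -> int:
    """Latent frames needed for `num_frames` video frames (rounded up)."""
    # Ceiling division: a partial group of 4 frames still needs a latent frame.
    return -(-num_frames // FRAMES_PER_LATENT)

def cfg_number_for(model_name: str) -> int:
    """Base model uses cfg_number=3; distill and quant models use cfg_number=1."""
    name = model_name.lower()
    return 1 if ("distill" in name or "quant" in name) else 3
```

For example, a 96-frame clip (4 seconds at 24 fps) corresponds to 24 latent frames.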
## 8. Contact

+If you have any questions, please feel free to raise an issue or contact us at [research@sand.ai](mailto:research@sand.ai).