Add model card for LPD

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +94 -0
README.md ADDED
---
pipeline_tag: unconditional-image-generation
library_name: transformers
license: apache-2.0
---

# Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation

This repository contains the models presented in the paper [Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation](https://huggingface.co/papers/2507.01957).

Code: [https://github.com/mit-han-lab/lpd](https://github.com/mit-han-lab/lpd)

<p align="left">
<img src="https://github.com/mit-han-lab/lpd/raw/main/assets/vis.png" width="1200" alt="LPD Visualizations">
</p>

## Abstract

We present *Locality-aware Parallel Decoding* (LPD) to accelerate autoregressive image generation. Traditional autoregressive image generation relies on next-patch prediction, a memory-bound process that leads to high latency. Existing works have tried to parallelize next-patch prediction by shifting to multi-patch prediction, but only achieved limited parallelization. To achieve high parallelization while maintaining generation quality, we introduce two key techniques: (1) **Flexible Parallelized Autoregressive Modeling**, a novel architecture that enables arbitrary generation ordering and degrees of parallelization. It uses learnable position query tokens to guide generation at target positions while ensuring mutual visibility among concurrently generated tokens for consistent parallel decoding. (2) **Locality-aware Generation Ordering**, a novel schedule that forms groups to minimize intra-group dependencies and maximize contextual support, enhancing generation quality. With these designs, we reduce the generation steps from 256 to 20 (256$\times$256 res.) and from 1024 to 48 (512$\times$512 res.) without compromising quality on ImageNet class-conditional generation, while achieving at least 3.4$\times$ lower latency than previous parallelized autoregressive models.

<p align="left">
<img src="https://github.com/mit-han-lab/lpd/raw/main/assets/speedup.png" width="450" alt="LPD Speedup">
</p>
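To give a concrete feel for the second idea, the toy sketch below builds a grouped generation order for a 16×16 token grid (256 tokens in 20 steps) with a naive farthest-point heuristic that spreads the positions within each group apart. It only illustrates the principle of low intra-group dependency; the actual LPD schedule (which also maximizes contextual support) and its code differ, and every name and heuristic here is an assumption.

```python
import random

# Toy illustration of locality-aware grouping (NOT the actual LPD schedule):
# positions decoded in the same step are pushed far apart on the token grid,
# so tokens generated in parallel depend on each other as little as possible.

GRID = 16    # 16x16 = 256 tokens for 256x256 images
STEPS = 20   # number of parallel decoding steps

def chebyshev(a, b):
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))

def toy_locality_aware_order(grid=GRID, steps=STEPS, seed=0):
    rng = random.Random(seed)
    remaining = [(r, c) for r in range(grid) for c in range(grid)]
    rng.shuffle(remaining)
    group_size = -(-grid * grid // steps)   # ceil(256 / 20) = 13
    groups = []
    while remaining:
        group = [remaining.pop()]           # seed the group with any position
        while remaining and len(group) < group_size:
            # greedily pick the position farthest from everything already in the group
            far = max(remaining, key=lambda p: min(chebyshev(p, q) for q in group))
            remaining.remove(far)
            group.append(far)
        groups.append(group)
    return groups

if __name__ == "__main__":
    groups = toy_locality_aware_order()
    print(len(groups), "steps,", sum(len(g) for g in groups), "tokens")  # 20 steps, 256 tokens
```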
## News

* **[2025/07] 🔥** We release the code and [models](https://huggingface.co/collections/mit-han-lab/lpd-68658dde87750bacd791e91c) for LPD!

## Preparation

### Environment Setup

```bash
git clone https://github.com/mit-han-lab/lpd
cd lpd
bash environment_setup.sh lpd
```
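As a quick sanity check after setup (assuming `environment_setup.sh` creates a conda environment named `lpd`, as its argument suggests), activate the environment and run:

```python
# Minimal environment check: confirms PyTorch is installed and a GPU is visible.
import torch

print("torch", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```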
### Models

Download the [LlamaGen tokenizer](https://huggingface.co/FoundationVision/LlamaGen/resolve/main/vq_ds16_c2i.pt) and place it in `tokenizers`. Download the LPD [models](https://huggingface.co/collections/mit-han-lab/lpd-68658dde87750bacd791e91c) from Hugging Face.

| Model | #Params | #Steps | FID-50K | IS | Latency (s) | Throughput (img/s) |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| [`LPD-L-256`](https://huggingface.co/mit-han-lab/lpd_l_256/tree/main) | 337M | 20 | 2.40 | 284.5 | 0.28 | 139.11 |
| [`LPD-XL-256`](https://huggingface.co/mit-han-lab/lpd_xl_256/tree/main) | 752M | 20 | 2.10 | 326.7 | 0.41 | 75.20 |
| [`LPD-XXL-256`](https://huggingface.co/mit-han-lab/lpd_xxl_256/tree/main) | 1.4B | 20 | 2.00 | 337.6 | 0.55 | 45.07 |
| [`LPD-L-256`](https://huggingface.co/mit-han-lab/lpd_l_256/tree/main) | 337M | 32 | 2.29 | 282.7 | 0.46 | 110.34 |
| [`LPD-XL-256`](https://huggingface.co/mit-han-lab/lpd_xl_256/tree/main) | 752M | 32 | 1.92 | 319.4 | 0.66 | 61.24 |
| [`LPD-L-512`](https://huggingface.co/mit-han-lab/lpd_l_512/tree/main) | 337M | 48 | 2.54 | 292.2 | 0.69 | 35.16 |
| [`LPD-XL-512`](https://huggingface.co/mit-han-lab/lpd_xl_512/tree/main) | 752M | 48 | 2.10 | 326.0 | 1.01 | 18.18 |
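If you prefer scripted downloads, the same files can be fetched with `huggingface_hub`; the checkpoint directory below is only a suggestion, so point the evaluation/training scripts at wherever you actually store the weights.

```python
from huggingface_hub import hf_hub_download, snapshot_download

# LlamaGen tokenizer -> tokenizers/vq_ds16_c2i.pt
hf_hub_download(
    repo_id="FoundationVision/LlamaGen",
    filename="vq_ds16_c2i.pt",
    local_dir="tokenizers",
)

# One LPD checkpoint, e.g. LPD-L-256 (pick any repo from the table above)
snapshot_download(repo_id="mit-han-lab/lpd_l_256", local_dir="checkpoints/lpd_l_256")
```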
### Dataset

If you plan to train, download the [ImageNet](http://image-net.org/download) dataset and place it at `IMAGENET_PATH`. To accelerate training, we recommend precomputing the tokenizer latents and saving them to `CACHED_PATH`. Set `--img_size` to either 256 or 512.

```bash
torchrun --nproc_per_node=8 --nnodes=1 \
main_cache.py \
--img_size 256 --vqgan_path tokenizers/vq_ds16_c2i.pt \
--data_path ${IMAGENET_PATH} --cached_path ${CACHED_PATH}
```
## Usage

### Evaluation

First, generate the LPD orders. Alternatively, you may [download](https://huggingface.co/mit-han-lab/lpd_orders/tree/main) the pre-generated orders and place them in `orders/lpd_orders_generated`.

```bash
bash orders/run_lpd_order.sh
```
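If you use the pre-generated orders instead of generating them, one way to place them in the expected directory is:

```python
from huggingface_hub import snapshot_download

# Pre-generated LPD orders -> orders/lpd_orders_generated
snapshot_download(repo_id="mit-han-lab/lpd_orders", local_dir="orders/lpd_orders_generated")
```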
Then, run the evaluation scripts located in `scripts/eval`. For example, to evaluate LPD-L-256 using 20 steps:

```bash
bash scripts/eval/lpd_l_res256_steps20.sh
```

Note: Please set `--pretrained_ckpt` to the path of the downloaded LPD model, and specify `--output_dir`.
### Training

Run the training scripts located in `scripts/train`. For example, to train LPD-L-256:

```bash
python scripts/cli/run.py -J lpd_l_256 -p your_slurm_partition -A your_slurm_account -N 4 bash scripts/train/lpd_l_256.sh
```
## Acknowledgements

Thanks to [MAR](https://github.com/LTH14/mar/tree/main) for the wonderful open-source codebase.

We thank MIT-IBM Watson AI Lab, the National Science Foundation, Hyundai, and Amazon for supporting this research.