Image-to-Image
Diffusers

Improve model card with metadata, links, and structure

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +55 -3
README.md CHANGED
@@ -1,3 +1,55 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ pipeline_tag: image-to-image
+ library_name: diffusers
+ ---
+
+ # REPA-E: Unlocking VAE for End-to-End Tuning of Latent Diffusion Transformers
+
+ <p align="center">
+ <a href="https://scholar.google.com.au/citations?user=GQzvqS4AAAAJ" target="_blank">Xingjian&nbsp;Leng</a><sup>1*</sup> &ensp; <b>&middot;</b> &ensp;
+ <a href="https://1jsingh.github.io/" target="_blank">Jaskirat&nbsp;Singh</a><sup>1*</sup> &ensp; <b>&middot;</b> &ensp;
+ <a href="https://hou-yz.github.io/" target="_blank">Yunzhong&nbsp;Hou</a><sup>1</sup> &ensp; <b>&middot;</b> &ensp;
+ <a href="https://people.csiro.au/X/Z/Zhenchang-Xing/" target="_blank">Zhenchang&nbsp;Xing</a><sup>2</sup> &ensp; <b>&middot;</b> &ensp;
+ <a href="https://www.sainingxie.com/" target="_blank">Saining&nbsp;Xie</a><sup>3</sup> &ensp; <b>&middot;</b> &ensp;
+ <a href="https://zheng-lab-anu.github.io/" target="_blank">Liang&nbsp;Zheng</a><sup>1</sup>
+ </p>
+
+ <p align="center">
+ <sup>1</sup> Australian National University &emsp; <sup>2</sup> Data61-CSIRO &emsp; <sup>3</sup> New York University <br>
+ <sub><sup>*</sup> Project Leads</sub>
+ </p>
+
+ Paper: [REPA-E: Unlocking VAE for End-to-End Tuning of Latent Diffusion Transformers](https://arxiv.org/abs/2504.10483)
+
+ Project Page: [https://end2end-diffusion.github.io](https://end2end-diffusion.github.io)
+
+ Code: [https://github.com/REPA-E/REPA-E](https://github.com/REPA-E/REPA-E)
+
+ ## Model Description
+
+ We address the question of whether latent diffusion models and their VAE tokenizer can be trained end-to-end. REPA-E uses a representation-alignment (REPA) loss to enable stable and effective joint training of both components, yielding significant training speedups (17x over REPA and 45x over vanilla training). End-to-end tuning also improves the VAE itself: the resulting latent space is better structured, and the tuned VAE can serve as a drop-in replacement for existing VAEs (e.g., SD-VAE). Our method achieves state-of-the-art FID scores on ImageNet 256×256: 1.26 with classifier-free guidance (CFG) and 1.83 without.
+
+ ## Usage
+
+ See the GitHub repository for detailed instructions on environment setup, training, and evaluation. Pre-trained checkpoints are available on Hugging Face.
+
+ ## Example Results
+
+ ![Example results](assets/vis-examples.jpg)
+
+ ## Limitations and Bias
+
+ As with all diffusion models, REPA-E may exhibit biases present in the training data.
+
+ ## Citation
+
+ ```bibtex
+ @article{leng2025repae,
+   title={REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers},
+   author={Xingjian Leng and Jaskirat Singh and Yunzhong Hou and Zhenchang Xing and Saining Xie and Liang Zheng},
+   year={2025},
+   journal={arXiv preprint arXiv:2504.10483},
+ }
+ ```