REPA-E/e2e-invae

Improve model card with metadata, links, and structure

by nielsr HF Staff - opened Apr 15

base: refs/heads/main ← from: refs/pr/1

README.md +55 -3

@@ -1,3 +1,55 @@
----
-license: mit
----

+---
+license: mit
+pipeline_tag: image-to-image
+library_name: diffusers
+---
+# REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers
+<p align="center">
+  <a href="https://scholar.google.com.au/citations?user=GQzvqS4AAAAJ" target="_blank">Xingjian&nbsp;Leng</a><sup>1*</sup> &ensp; <b>&middot;</b> &ensp;
+  <a href="https://1jsingh.github.io/" target="_blank">Jaskirat&nbsp;Singh</a><sup>1*</sup> &ensp; <b>&middot;</b> &ensp;
+  <a href="https://hou-yz.github.io/" target="_blank">Yunzhong&nbsp;Hou</a><sup>1</sup> &ensp; <b>&middot;</b> &ensp;
+  <a href="https://people.csiro.au/X/Z/Zhenchang-Xing/" target="_blank">Zhenchang&nbsp;Xing</a><sup>2</sup>&ensp; <b>&middot;</b> &ensp;
+  <a href="https://www.sainingxie.com/" target="_blank">Saining&nbsp;Xie</a><sup>3</sup>&ensp; <b>&middot;</b> &ensp;
+  <a href="https://zheng-lab-anu.github.io/" target="_blank">Liang&nbsp;Zheng</a><sup>1</sup>&ensp;
+</p>
+<p align="center">
+  <sup>1</sup> Australian National University &emsp; <sup>2</sup>Data61-CSIRO &emsp; <sup>3</sup>New York University &emsp; <br>
+  <sub><sup>*</sup>Project Leads &emsp;</sub>
+</p>
+Paper: [REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers](https://arxiv.org/abs/2504.10483)
+Project Page: [https://end2end-diffusion.github.io](https://end2end-diffusion.github.io)
+Code: [https://github.com/REPA-E/REPA-E](https://github.com/REPA-E/REPA-E)
+## Model Description
+We address whether latent diffusion models and their VAE tokenizer can be trained end-to-end. REPA-E uses a representation-alignment (REPA) loss to make joint training of both components stable and effective. This speeds up training by 17x compared to REPA and 45x compared to vanilla training. End-to-end tuning also improves the VAE itself: its latent space becomes better structured, and the tuned VAE can serve as a drop-in replacement for existing VAEs (e.g., SD-VAE). Our method achieves state-of-the-art FID scores on ImageNet 256×256: 1.26 with classifier-free guidance (CFG) and 1.83 without.
+## Usage
+See the [GitHub repository](https://github.com/REPA-E/REPA-E) for detailed instructions on environment setup, training, and evaluation. Pre-trained checkpoints are available on Hugging Face.
+## Example Results
+![](assets/vis-examples.jpg)
+## Limitations and Bias
+As with all diffusion models, REPA-E may reproduce biases present in its training data.
+## Citation
+```bibtex
+@article{leng2025repae,
+  title={REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers},
+  author={Xingjian Leng and Jaskirat Singh and Yunzhong Hou and Zhenchang Xing and Saining Xie and Liang Zheng},
+  year={2025},
+  journal={arXiv preprint arXiv:2504.10483},
+}
+```
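
As a rough illustration of the representation-alignment (REPA) loss named in the Model Description, the sketch below computes a negative mean cosine similarity between two sets of per-patch feature vectors. This is only a minimal sketch: the model card does not spell out the exact form of the loss, so the cosine-similarity form is an assumption, and the names `cosine_similarity`, `alignment_loss`, `projected_tokens`, and `target_tokens` are hypothetical and not taken from the REPA-E code. In practice the two inputs would be projected features from the diffusion model and features from a frozen pretrained encoder; here they are plain Python lists.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def alignment_loss(projected_tokens, target_tokens):
    """Negative mean cosine similarity over paired patch tokens.

    Assumed form of an alignment loss (hypothetical helper):
    the loss is -1.0 when every pair points in the same direction
    and 0.0 when every pair is orthogonal.
    """
    sims = [
        cosine_similarity(p, t)
        for p, t in zip(projected_tokens, target_tokens)
    ]
    return -sum(sims) / len(sims)


if __name__ == "__main__":
    # Two patch tokens whose directions match their targets exactly.
    aligned = alignment_loss([[1.0, 0.0], [0.0, 2.0]],
                             [[2.0, 0.0], [0.0, 1.0]])
    print(aligned)
```

Minimizing a loss of this shape pushes each projected token toward the direction of its target feature while ignoring vector magnitude, which is why the sketch normalizes both vectors before comparing them.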