REPA-E/sit-repae-invae


xingjianleng committed on Apr 22

Commit f55b1dc (verified) · 1 Parent(s): 4747927

Update README.md


Files changed (1)

README.md +45 -24


@@ -4,36 +4,57 @@ library_name: diffusers
 pipeline_tag: image-to-image
 ---
----
-license: mit
-library_name: diffusers
-pipeline_tag: image-to-image
----
-# REPA-E: Unlocking VAE for End-to-End Tuning of Latent Diffusion Transformers
-This model implements the REPA-E approach for end-to-end tuning of latent diffusion transformers, as described in the paper [REPA-E: Unlocking VAE for End-to-End Tuning of Latent Diffusion Transformers](https://huggingface.co/papers/2504.10483). REPA-E enables stable and effective joint training of both the VAE and the diffusion model, leading to faster training and improved generation quality.
-For more information, please refer to the following resources:
-*   **Project Page:** https://end2end-diffusion.github.io
-*   **GitHub Repository:** https://github.com/REPA-E/REPA-E
-## Usage
-You can use this model with the `diffusers` library. Here's a basic example:
-```python
-from diffusers import DiffusionPipeline
-# Load the pipeline
-pipeline = DiffusionPipeline.from_pretrained("REPA-E/your-model-name") # Replace "REPA-E/your-model-name"
-# Generate an image
-image = pipeline().images[0]
-# Save the image
-image.save("generated_image.png")
 ```
-Please refer to the GitHub repository for detailed instructions and more advanced usage examples.

 pipeline_tag: image-to-image
 ---
+<h1 align="center"> REPA-E: Unlocking VAE for End-to-End Tuning of Latent Diffusion Transformers </h1>
+<p align="center">
+  <a href="https://scholar.google.com.au/citations?user=GQzvqS4AAAAJ" target="_blank">Xingjian&nbsp;Leng</a><sup>1*</sup> &ensp; <b>&middot;</b> &ensp;
+  <a href="https://1jsingh.github.io/" target="_blank">Jaskirat&nbsp;Singh</a><sup>1*</sup> &ensp; <b>&middot;</b> &ensp;
+  <a href="https://hou-yz.github.io/" target="_blank">Yunzhong&nbsp;Hou</a><sup>1</sup> &ensp; <b>&middot;</b> &ensp;
+  <a href="https://people.csiro.au/X/Z/Zhenchang-Xing/" target="_blank">Zhenchang&nbsp;Xing</a><sup>2</sup>&ensp; <b>&middot;</b> &ensp;
+  <a href="https://www.sainingxie.com/" target="_blank">Saining&nbsp;Xie</a><sup>3</sup>&ensp; <b>&middot;</b> &ensp;
+  <a href="https://zheng-lab-anu.github.io/" target="_blank">Liang&nbsp;Zheng</a><sup>1</sup>&ensp;
+</p>
+<p align="center">
+  <sup>1</sup> Australian National University &emsp; <sup>2</sup>Data61-CSIRO &emsp; <sup>3</sup>New York University &emsp; <br>
+  <sub><sup>*</sup>Project Leads&emsp;</sub>
+</p>
+<p align="center">
+  <a href="https://End2End-Diffusion.github.io">🌐 Project Page</a> &ensp;
+  <a href="https://huggingface.co/REPA-E">🤗 Models</a> &ensp;
+  <a href="https://arxiv.org/abs/2504.10483">📃 Paper</a> &ensp;
+  <br>
+  <a href="https://paperswithcode.com/sota/image-generation-on-imagenet-256x256?p=repa-e-unlocking-vae-for-end-to-end-tuning-of"><img src="https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/repa-e-unlocking-vae-for-end-to-end-tuning-of/image-generation-on-imagenet-256x256" alt="PWC"></a>
+</p>
+<p align="center">
+  <img src="https://github.com/End2End-Diffusion/REPA-E/raw/main/assets/vis-examples.jpg" width="100%" alt="teaser">
+</p>
+---
+We address a fundamental question: ***Can latent diffusion models and their VAE tokenizer be trained end-to-end?*** While training both components jointly with standard diffusion loss is observed to be ineffective — often degrading final performance — we show that this limitation can be overcome using a simple representation-alignment (REPA) loss. Our proposed method, **REPA-E**, enables stable and effective joint training of both the VAE and the diffusion model.
+<p align="center">
+  <img src="https://github.com/End2End-Diffusion/REPA-E/raw/main/assets/overview.jpg" width="100%" alt="teaser">
+</p>
+**REPA-E** significantly accelerates training — achieving over **17×** speedup compared to REPA and **45×** over the vanilla training recipe. Interestingly, end-to-end tuning also improves the VAE itself: the resulting **E2E-VAE** provides better latent structure and serves as a **drop-in replacement** for existing VAEs (e.g., SD-VAE), improving convergence and generation quality across diverse LDM architectures. Our method achieves state-of-the-art FID scores on ImageNet 256×256: **1.26** with CFG and **1.83** without CFG.
+## Usage and Training
+Please refer to our [GitHub repo](https://github.com/End2End-Diffusion/REPA-E) for detailed notes on end-to-end training and inference using REPA-E.
+## 📚 Citation
+```bibtex
+@article{leng2025repae,
+  title={REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers},
+  author={Xingjian Leng and Jaskirat Singh and Yunzhong Hou and Zhenchang Xing and Saining Xie and Liang Zheng},
+  year={2025},
+  journal={arXiv preprint arXiv:2504.10483},
+}
 ```
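
The new README text credits a "simple representation-alignment (REPA) loss" for making joint VAE and diffusion training work, but the diff does not define that loss. The sketch below is one plausible reading, assuming the loss is the negative mean cosine similarity between projected diffusion-model tokens and the features of a frozen pretrained encoder. The function names, the toy vectors, and the pure-Python formulation are all illustrative assumptions, not code from the REPA-E repository.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def alignment_loss(projected_tokens, target_tokens):
    """Negative mean cosine similarity over patch tokens (assumed REPA form).

    projected_tokens: per-patch diffusion hidden states after a projection head
    target_tokens:    per-patch features from a frozen pretrained encoder
    """
    sims = [cosine_similarity(p, t) for p, t in zip(projected_tokens, target_tokens)]
    return -sum(sims) / len(sims)


# Toy example with two patch tokens of dimension 2 (hypothetical values).
projected = [[1.0, 0.0], [0.0, 2.0]]
aligned_targets = [[2.0, 0.0], [0.0, 1.0]]      # same directions as `projected`
orthogonal_targets = [[0.0, 1.0], [1.0, 0.0]]   # perpendicular to `projected`

loss_aligned = alignment_loss(projected, aligned_targets)
loss_orthogonal = alignment_loss(projected, orthogonal_targets)
print(loss_aligned, loss_orthogonal)
```

Under this assumed form the loss is lowest when every projected token points in the same direction as its target, regardless of magnitude. In end-to-end training as the README describes it, gradients from such a term would reach the VAE as well as the diffusion model; the exact gradient routing is documented in the GitHub repository rather than in this diff.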