nielsr (HF Staff) committed · Commit 4a5b8a0 · verified · 1 Parent(s): 24857c0

Improve model card with metadata, links, and structure


This PR improves the model card by adding essential metadata (`pipeline_tag` and `library_name`) and confirming the license. It also structures the content for better readability and adds links to the paper and project page, improving the card's completeness and user experience.

Files changed (1):
  1. README.md (+55 -3)
README.md CHANGED
@@ -1,3 +1,55 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ pipeline_tag: image-to-image
+ library_name: diffusers
+ ---
+
+ # REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers
+
+ <p align="center">
+ <a href="https://scholar.google.com.au/citations?user=GQzvqS4AAAAJ" target="_blank">Xingjian&nbsp;Leng</a><sup>1*</sup> &ensp; <b>&middot;</b> &ensp;
+ <a href="https://1jsingh.github.io/" target="_blank">Jaskirat&nbsp;Singh</a><sup>1*</sup> &ensp; <b>&middot;</b> &ensp;
+ <a href="https://hou-yz.github.io/" target="_blank">Yunzhong&nbsp;Hou</a><sup>1</sup> &ensp; <b>&middot;</b> &ensp;
+ <a href="https://people.csiro.au/X/Z/Zhenchang-Xing/" target="_blank">Zhenchang&nbsp;Xing</a><sup>2</sup> &ensp; <b>&middot;</b> &ensp;
+ <a href="https://www.sainingxie.com/" target="_blank">Saining&nbsp;Xie</a><sup>3</sup> &ensp; <b>&middot;</b> &ensp;
+ <a href="https://zheng-lab-anu.github.io/" target="_blank">Liang&nbsp;Zheng</a><sup>1</sup>
+ </p>
+
+ <p align="center">
+ <sup>1</sup> Australian National University &emsp; <sup>2</sup> Data61-CSIRO &emsp; <sup>3</sup> New York University <br>
+ <sub><sup>*</sup> Project Leads</sub>
+ </p>
+
+ Paper: [REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers](https://arxiv.org/abs/2504.10483)
+
+ Project Page: [https://end2end-diffusion.github.io](https://end2end-diffusion.github.io)
+
+ Code: [https://github.com/REPA-E/REPA-E](https://github.com/REPA-E/REPA-E)
+
+ ## Model Description
+
+ We address the question of whether latent diffusion models and their VAE tokenizer can be trained end-to-end. REPA-E uses a representation-alignment (REPA) loss to enable stable and effective joint training of both components, yielding significant training speedups (17x over REPA and 45x over vanilla training). End-to-end tuning also improves the VAE itself: the tuned VAE has a better-structured latent space and serves as a drop-in replacement for existing VAEs such as SD-VAE. Our method achieves state-of-the-art FID scores on ImageNet 256×256: 1.26 with CFG and 1.83 without CFG.
+
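The representation-alignment idea can be illustrated with a toy sketch: the REPA loss encourages projected diffusion-transformer features to match features from a frozen pretrained visual encoder (e.g., DINOv2) by maximizing their cosine similarity. The shapes, the single linear projection, and the function name below are hypothetical simplifications for illustration, not the paper's actual architecture:

```python
import numpy as np

def repa_alignment_loss(h, z, W):
    """Toy REPA-style loss: negative mean cosine similarity between
    projected diffusion features (h @ W) and frozen encoder features z.
    h: (N, d_model) diffusion-transformer features
    z: (N, d_enc)   frozen visual-encoder features (e.g., DINOv2)
    W: (d_model, d_enc) learnable projection
    """
    p = h @ W
    p = p / np.linalg.norm(p, axis=1, keepdims=True)
    zn = z / np.linalg.norm(z, axis=1, keepdims=True)
    return -np.mean(np.sum(p * zn, axis=1))

rng = np.random.default_rng(0)
h = rng.normal(size=(8, 16))   # hypothetical transformer features
z = rng.normal(size=(8, 32))   # hypothetical encoder targets
W = rng.normal(size=(16, 32))  # hypothetical projection

loss = repa_alignment_loss(h, z, W)
# Perfectly aligned features (identity projection onto themselves) give -1.0.
print(repa_alignment_loss(h, h, np.eye(16)))
```

Minimizing this loss pulls the projected features toward the frozen encoder's representation, which is what stabilizes joint VAE/diffusion training in the method described above.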
+ ## Usage
+
+ See the GitHub repository for detailed instructions on environment setup, training, and evaluation. Pre-trained checkpoints are available on Hugging Face.
+
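Since the card describes the end-to-end tuned VAE as a drop-in replacement for SD-VAE, loading it presumably follows the standard diffusers `AutoencoderKL` pattern. A sketch under that assumption; the repository id below is illustrative, so check the official checkpoints on the project page for the actual name:

```python
import torch
from diffusers import AutoencoderKL

# Hypothetical checkpoint id -- see the project's Hugging Face page for
# the actually released checkpoints.
vae = AutoencoderKL.from_pretrained("REPA-E/e2e-vae")

# Round-trip an image tensor (B, 3, H, W) scaled to [-1, 1].
x = torch.randn(1, 3, 256, 256)
with torch.no_grad():
    latents = vae.encode(x).latent_dist.sample()
    recon = vae.decode(latents).sample
```

Because the interface matches `AutoencoderKL`, the loaded VAE can be swapped into an existing latent-diffusion pipeline wherever SD-VAE is used.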
+ ## Example Results
+
+ ![](assets/vis-examples.jpg)
+
+ ## Limitations and Bias
+
+ As with all diffusion models, REPA-E may exhibit biases present in the training data.
+
+ ## Citation
+
+ ```bibtex
+ @article{leng2025repae,
+   title={REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers},
+   author={Xingjian Leng and Jaskirat Singh and Yunzhong Hou and Zhenchang Xing and Saining Xie and Liang Zheng},
+   year={2025},
+   journal={arXiv preprint arXiv:2504.10483},
+ }
+ ```