nielsr (HF Staff) committed · Commit 4a5b8a0 · verified · 1 Parent(s): 24857c0

Improve model card with metadata, links, and structure


This PR improves the model card by adding essential metadata (`pipeline_tag` and `library_name`) and confirming the license. It also structures the content for better readability and adds links to the paper and project page, improving the card's completeness and user experience.

Files changed (1):
  1. README.md (+55 -3)
README.md CHANGED
@@ -1,3 +1,55 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ pipeline_tag: image-to-image
+ library_name: diffusers
+ ---
+
+ # REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers
+
+ <p align="center">
+ <a href="https://scholar.google.com.au/citations?user=GQzvqS4AAAAJ" target="_blank">Xingjian&nbsp;Leng</a><sup>1*</sup> &ensp; <b>&middot;</b> &ensp;
+ <a href="https://1jsingh.github.io/" target="_blank">Jaskirat&nbsp;Singh</a><sup>1*</sup> &ensp; <b>&middot;</b> &ensp;
+ <a href="https://hou-yz.github.io/" target="_blank">Yunzhong&nbsp;Hou</a><sup>1</sup> &ensp; <b>&middot;</b> &ensp;
+ <a href="https://people.csiro.au/X/Z/Zhenchang-Xing/" target="_blank">Zhenchang&nbsp;Xing</a><sup>2</sup> &ensp; <b>&middot;</b> &ensp;
+ <a href="https://www.sainingxie.com/" target="_blank">Saining&nbsp;Xie</a><sup>3</sup> &ensp; <b>&middot;</b> &ensp;
+ <a href="https://zheng-lab-anu.github.io/" target="_blank">Liang&nbsp;Zheng</a><sup>1</sup>
+ </p>
+
+ <p align="center">
+ <sup>1</sup> Australian National University &emsp; <sup>2</sup> Data61-CSIRO &emsp; <sup>3</sup> New York University <br>
+ <sub><sup>*</sup> Project Leads</sub>
+ </p>
+
+ Paper: [REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers](https://arxiv.org/abs/2504.10483)
+
+ Project Page: [https://end2end-diffusion.github.io](https://end2end-diffusion.github.io)
+
+ Code: [https://github.com/REPA-E/REPA-E](https://github.com/REPA-E/REPA-E)
+
+ ## Model Description
+
+ We address the question of whether latent diffusion models and their VAE tokenizer can be trained end-to-end. REPA-E uses a representation-alignment (REPA) loss to enable stable and effective joint training of both components, yielding significant training speedups (17x over REPA and 45x over vanilla training). End-to-end tuning also improves the VAE itself: the tuned VAE has a better-structured latent space and serves as a drop-in replacement for existing VAEs such as SD-VAE. Our method achieves state-of-the-art FID scores on ImageNet 256×256: 1.26 with CFG and 1.83 without CFG.
+
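The representation-alignment idea can be illustrated with a toy sketch: the REPA loss encourages projected diffusion-transformer features to match features from a frozen pretrained visual encoder (e.g., DINOv2) by maximizing their cosine similarity. The shapes, the single linear projection, and the function name below are hypothetical simplifications for illustration, not the paper's actual architecture:

```python
import numpy as np

def repa_alignment_loss(h, z, W):
    """Toy REPA-style loss: negative mean cosine similarity between
    projected diffusion features (h @ W) and frozen encoder features z.
    h: (N, d_model) diffusion-transformer features
    z: (N, d_enc)   frozen visual-encoder features (e.g., DINOv2)
    W: (d_model, d_enc) learnable projection
    """
    p = h @ W
    p = p / np.linalg.norm(p, axis=1, keepdims=True)
    zn = z / np.linalg.norm(z, axis=1, keepdims=True)
    return -np.mean(np.sum(p * zn, axis=1))

rng = np.random.default_rng(0)
h = rng.normal(size=(8, 16))   # hypothetical transformer features
z = rng.normal(size=(8, 32))   # hypothetical encoder targets
W = rng.normal(size=(16, 32))  # hypothetical projection

loss = repa_alignment_loss(h, z, W)
# Perfectly aligned features (identity projection onto themselves) give -1.0.
print(repa_alignment_loss(h, h, np.eye(16)))
```

Minimizing this loss pulls the projected features toward the frozen encoder's representation, which is what stabilizes joint VAE/diffusion training in the method described above.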
+ ## Usage
+
+ See the GitHub repository for detailed instructions on environment setup, training, and evaluation. Pre-trained checkpoints are available on Hugging Face.
+
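Since the card describes the end-to-end tuned VAE as a drop-in replacement for SD-VAE, loading it presumably follows the standard diffusers `AutoencoderKL` pattern. A sketch under that assumption; the repository id below is illustrative, so check the official checkpoints on the project page for the actual name:

```python
import torch
from diffusers import AutoencoderKL

# Hypothetical checkpoint id -- see the project's Hugging Face page for
# the actually released checkpoints.
vae = AutoencoderKL.from_pretrained("REPA-E/e2e-vae")

# Round-trip an image tensor (B, 3, H, W) scaled to [-1, 1].
x = torch.randn(1, 3, 256, 256)
with torch.no_grad():
    latents = vae.encode(x).latent_dist.sample()
    recon = vae.decode(latents).sample
```

Because the interface matches `AutoencoderKL`, the loaded VAE can be swapped into an existing latent-diffusion pipeline wherever SD-VAE is used.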
+ ## Example Results
+
+ ![](assets/vis-examples.jpg)
+
+ ## Limitations and Bias
+
+ As with all diffusion models, REPA-E may exhibit biases present in the training data.
+
+ ## Citation
+
+ ```bibtex
+ @article{leng2025repae,
+   title={REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers},
+   author={Xingjian Leng and Jaskirat Singh and Yunzhong Hou and Zhenchang Xing and Saining Xie and Liang Zheng},
+   year={2025},
+   journal={arXiv preprint arXiv:2504.10483},
+ }
+ ```