Add pipeline tag and library name
This PR adds the `pipeline_tag` and `library_name` to the model card metadata. The `pipeline_tag` is set to `image-to-image` as the model generates images from images. The `library_name` is set to `diffusers`, based on the training scripts and code examples provided.
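As a quick sanity check of what these tags enable, here is a minimal sketch of filtering the Hub by the new metadata with `huggingface_hub`. The `pipeline_tag` and `library` keyword filters exist in recent `huggingface_hub` releases; treating them as available here is an assumption:

```python
# Minimal sketch: the new metadata makes the model discoverable via Hub search filters.
# Assumes a recent huggingface_hub release exposing these keyword filters on list_models.
from huggingface_hub import HfApi

api = HfApi()
for model in api.list_models(pipeline_tag="image-to-image", library="diffusers", limit=5):
    print(model.id)
```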
README.md
CHANGED
@@ -1,3 +1,42 @@
```diff
@@ -1,3 +1,42 @@
----
-license: mit
----
+---
+license: mit
+pipeline_tag: image-to-image
+library_name: diffusers
+---
+
+<h1 align="center"> REPA-E: Unlocking VAE for End-to-End Tuning of Latent Diffusion Transformers </h1>
+
+<p align="center">
+  <a href="https://scholar.google.com.au/citations?user=GQzvqS4AAAAJ" target="_blank">Xingjian Leng</a><sup>1*</sup> <b>·</b>
+  <a href="https://1jsingh.github.io/" target="_blank">Jaskirat Singh</a><sup>1*</sup> <b>·</b>
+  <a href="https://hou-yz.github.io/" target="_blank">Yunzhong Hou</a><sup>1</sup> <b>·</b>
+  <a href="https://people.csiro.au/X/Z/Zhenchang-Xing/" target="_blank">Zhenchang Xing</a><sup>2</sup> <b>·</b>
+  <a href="https://www.sainingxie.com/" target="_blank">Saining Xie</a><sup>3</sup> <b>·</b>
+  <a href="https://zheng-lab-anu.github.io/" target="_blank">Liang Zheng</a><sup>1</sup>
+</p>
+
+<p align="center">
+  <sup>1</sup>Australian National University  <sup>2</sup>Data61-CSIRO  <sup>3</sup>New York University <br>
+  <sub><sup>*</sup>Project Leads</sub>
+</p>
+
+<p align="center">
+  <a href="https://End2End-Diffusion.github.io">Project Page</a> 
+  <a href="https://huggingface.co/REPA-E">🤗 Models</a> 
+  <a href="https://arxiv.org/abs/2504.10483">Paper</a>
+  <br><br>
+  <a href="https://paperswithcode.com/sota/image-generation-on-imagenet-256x256?p=repa-e-unlocking-vae-for-end-to-end-tuning-of"><img src="https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/repa-e-unlocking-vae-for-end-to-end-tuning-of/image-generation-on-imagenet-256x256" alt="PWC"></a>
+</p>
+
+## Overview
+
+We address a fundamental question: ***Can latent diffusion models and their VAE tokenizer be trained end-to-end?*** While training both components jointly with the standard diffusion loss is observed to be ineffective, often degrading final performance, we show that this limitation can be overcome with a simple representation-alignment (REPA) loss. Our proposed method, **REPA-E**, enables stable and effective joint training of both the VAE and the diffusion model.
+
+**REPA-E** significantly accelerates training, achieving over **17×** speedup compared to REPA and **45×** over the vanilla training recipe. Interestingly, end-to-end tuning also improves the VAE itself: the resulting **E2E-VAE** provides better latent structure and serves as a **drop-in replacement** for existing VAEs (e.g., SD-VAE), improving convergence and generation quality across diverse LDM architectures. Our method achieves state-of-the-art FID scores on ImageNet 256×256: **1.26** with CFG and **1.83** without CFG.
+
+## News and Updates
+**[2025-04-15]** Initial release with pre-trained models and codebase.
```

... (rest of the content remains unchanged)
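Since the card describes **E2E-VAE** as a drop-in replacement for SD-VAE, a minimal sketch of such a swap with `diffusers` may help readers. The repo id `REPA-E/e2e-vae` is a placeholder, not a confirmed checkpoint name (see the Models link in the card), and compatible latent shapes between the two VAEs are assumed:

```python
import torch
from diffusers import AutoencoderKL, StableDiffusionPipeline

# Placeholder repo id -- check https://huggingface.co/REPA-E for the actual checkpoint.
vae = AutoencoderKL.from_pretrained("REPA-E/e2e-vae", torch_dtype=torch.float16)

# Swap the stock SD-VAE for the end-to-end tuned VAE; latent shapes are assumed to match.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.vae = vae.to("cuda")

image = pipe("a corgi wearing sunglasses on the beach").images[0]
image.save("corgi.png")
```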