Update README.md
README.md CHANGED
@@ -9,10 +9,9 @@ base_model:
   - stabilityai/sdxl-vae
 library_name: diffusers
 ---
-
 # EQ-SDXL-VAE: open sourced reproduction of EQ-VAE on SDXL-VAE
 
-**
+**Adv-FT is done and achieves better performance than the original SDXL-VAE!**
 
 original paper: https://arxiv.org/abs/2502.09509 <br>
 source code of the reproduction: https://github.com/KohakuBlueleaf/HakuLatent
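The next hunk's header quotes the loading line `vae = AutoencoderKL.from_pretrained("KBlueLeaf/EQ-SDXL-VAE").cuda().half()`. Since the model is a drop-in SDXL-VAE replacement, a minimal encode/decode round trip with diffusers might look like the sketch below; the input file name and the 256x256 resize are illustrative assumptions, not from the card.

```python
import numpy as np
import torch
from PIL import Image
from diffusers import AutoencoderKL

# Load the released checkpoint in fp16 on GPU, as in the model card.
vae = AutoencoderKL.from_pretrained("KBlueLeaf/EQ-SDXL-VAE").cuda().half()
vae.eval()

# "input.png" is a placeholder; 256x256 matches the evaluation resolution.
img = Image.open("input.png").convert("RGB").resize((256, 256))
x = torch.from_numpy(np.array(img)).half().div(127.5).sub(1.0)  # uint8 -> [-1, 1]
x = x.permute(2, 0, 1).unsqueeze(0).cuda()  # HWC -> 1CHW

with torch.no_grad():
    # encode() returns a posterior; sample() (or mode()) gives the latents.
    z = vae.encode(x).latent_dist.sample()
    # decode() maps latents back to pixel space.
    recon = vae.decode(z).sample

out = (recon.squeeze(0).permute(1, 2, 0).float().cpu().clamp(-1, 1) + 1) * 127.5
Image.fromarray(out.byte().numpy()).save("recon.png")
```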
@@ -74,21 +73,31 @@ vae = AutoencoderKL.from_pretrained("KBlueLeaf/EQ-SDXL-VAE").cuda().half()
 * recon loss: 1.0
 * adv(disc) loss: 0.5
 * kl div loss: 1e-7
+* For Adv FT
+  * recon loss: 1.0
+    * MSE Loss: 1.5
+    * LPIPS Loss: 0.5
+    * ConvNeXt perceptual Loss: 2.0
+  * adv loss: 1.0
+  * kl div loss: 0.0
+  * Encoder frozen
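Reading the Adv FT weights above as a weighted sum (my interpretation; the card only lists the weights), fine-tuning would optimize a composite reconstruction term plus the adversarial term while the frozen encoder receives no gradients. A rough sketch, where `lpips_loss`, `convnext_loss`, and `discriminator` are hypothetical stand-ins for HakuLatent's actual modules:

```python
import torch
import torch.nn.functional as F

def adv_ft_loss(vae, discriminator, lpips_loss, convnext_loss, x):
    # Encoder frozen: no gradients flow into the encoder during Adv FT.
    with torch.no_grad():
        z = vae.encode(x).latent_dist.sample()
    recon = vae.decode(z).sample

    # recon loss (weight 1.0) read as a weighted sum of three terms.
    rec = (
        1.5 * F.mse_loss(recon, x)
        + 0.5 * lpips_loss(recon, x)
        + 2.0 * convnext_loss(recon, x)
    )
    # Non-saturating generator-side adversarial loss (weight 1.0);
    # the exact GAN formulation used in training is an assumption here.
    adv = F.softplus(-discriminator(recon)).mean()
    # kl div weight is 0.0 during Adv FT, so no KL term appears.
    return 1.0 * rec + 1.0 * adv
```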
 
 ## Evaluation Results
 
-We use the validation split of imagenet in 256x256 resolution and use MSE loss, PSNR, LPIPS and ConvNeXt perceptual loss as our metric.
+We use the validation and test splits of ImageNet (150k images in total) at 256x256 resolution, and use MSE loss, PSNR, LPIPS, and ConvNeXt perceptual loss as our metrics.
 
-| Metrics  | SDXL-VAE  | EQ-SDXL-VAE |
-| -------- | --------- | ----------- |
-| MSE Loss | 3.681e-3  | 3.720e-3    |
-| PSNR     | 24.6602   | 24.5649     |
-| LPIPS    | 0.1314    | 0.1407      |
-| ConvNeXt | 1.303e-03 | 1.546e-03   |
+| Metrics  | SDXL-VAE  | EQ-SDXL-VAE | EQ-SDXL-VAE Adv FT |
+| -------- | --------- | ----------- | ------------------ |
+| MSE Loss | 3.683e-3  | 3.723e-3    | 3.532e-3           |
+| PSNR     | 24.4698   | 24.4030     | 24.6364            |
+| LPIPS    | 0.1316    | 0.1409      | 0.1299             |
+| ConvNeXt | 1.305e-3  | 1.548e-3    | 1.322e-3           |
 
-
+We can see that after EQ-VAE training without the adversarial loss, EQ-SDXL-VAE is slightly worse than the original VAE.
 
-
+After fine-tuning with the adversarial loss enabled and the encoder frozen, PSNR and LPIPS even improve beyond the original VAE!
 
+**Note**: This repo contains the weights of EQ-SDXL-VAE Adv FT.
 
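For reproducing the table's standard metrics, a sketch follows; the LPIPS backbone and the PSNR convention are assumptions (the card does not state them), and the ConvNeXt perceptual loss would follow the same pattern using features from a pretrained ConvNeXt.

```python
import torch
import torch.nn.functional as F
import lpips  # pip install lpips

lpips_net = lpips.LPIPS(net="vgg").cuda().eval()  # backbone choice is an assumption

@torch.no_grad()
def eval_metrics(x: torch.Tensor, recon: torch.Tensor) -> dict:
    """x, recon: (B, 3, H, W) tensors in [-1, 1]."""
    mse = F.mse_loss(recon, x)
    # One common PSNR convention (peak = 1); the card's exact setup is not stated.
    psnr = 10 * torch.log10(1.0 / mse)
    # lpips.LPIPS expects inputs scaled to [-1, 1], which matches our tensors.
    lp = lpips_net(recon, x).mean()
    return {"mse": mse.item(), "psnr": psnr.item(), "lpips": lp.item()}
```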
 ## Next step
 
@@ -120,4 +129,7 @@ Also, I will try to train a simple approximation decoder which have only 2x upsc
 ```
 
 ## Acknowledgement
-
+
+* [xiaoqianWX](https://huggingface.co/xiaoqianWX): Provided the compute resources.
+
+* [AmericanPresidentJimmyCarter](https://huggingface.co/AmericanPresidentJimmyCarter): Provided the implementation of the random affine transformation.
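On the random affine transformation credited above: in the EQ-VAE recipe (arXiv:2502.09509, as I read it), the regularizer asks the decoder to map a transformed latent to the correspondingly transformed image. A conceptual sketch using torchvision's affine op, with illustrative parameter ranges rather than the ones used in training:

```python
import random
import torch.nn.functional as F
import torchvision.transforms.functional as TF

def equivariance_loss(vae, x):
    """EQ-VAE-style regularizer: decode(T(encode(x))) should match T(x)."""
    # Sample one random affine transform; these ranges are illustrative only.
    angle = random.uniform(-30.0, 30.0)
    scale = random.uniform(0.75, 1.25)

    z = vae.encode(x).latent_dist.sample()
    # Apply the same affine in latent space and in pixel space.
    z_t = TF.affine(z, angle=angle, translate=[0, 0], scale=scale, shear=[0.0])
    x_t = TF.affine(x, angle=angle, translate=[0, 0], scale=scale, shear=[0.0])

    recon_t = vae.decode(z_t).sample
    return F.mse_loss(recon_t, x_t)
```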
|