KBlueLeaf committed (verified) · commit f8721d8 · 1 parent: 8bedb98

Update README.md

Files changed (1): README.md (+24 -12)
README.md CHANGED
@@ -9,10 +9,9 @@ base_model:
   - stabilityai/sdxl-vae
 library_name: diffusers
 ---
-
 # EQ-SDXL-VAE: open sourced reproduction of EQ-VAE on SDXL-VAE
 
-**Training is still in progress, there may have more updates in the future**
+**Adv-FT is done and achieves better performance than the original SDXL-VAE!**
 
 original paper: https://arxiv.org/abs/2502.09509 <br>
 source code of the reproduction: https://github.com/KohakuBlueleaf/HakuLatent
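
For context on how the released weights are meant to be used, here is a minimal round-trip sketch built around the README's own loading line (visible as context in the next hunk). The test image path and the `VaeImageProcessor` pre/post-processing are illustrative assumptions, not part of the README.

```python
import torch
from PIL import Image
from diffusers import AutoencoderKL
from diffusers.image_processor import VaeImageProcessor

# Loading line as it appears in the README below.
vae = AutoencoderKL.from_pretrained("KBlueLeaf/EQ-SDXL-VAE").cuda().half()
processor = VaeImageProcessor()  # assumption: standard [-1, 1] normalization

image = Image.open("test.png").convert("RGB")  # placeholder input image
pixels = processor.preprocess(image).cuda().half()  # [1, 3, H, W] in [-1, 1]

with torch.no_grad():
    latents = vae.encode(pixels).latent_dist.sample()  # [1, 4, H/8, W/8]
    recon = vae.decode(latents).sample

recon_image = processor.postprocess(recon)[0]  # back to a PIL image
```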
@@ -74,21 +73,31 @@ vae = AutoencoderKL.from_pretrained("KBlueLeaf/EQ-SDXL-VAE").cuda().half()
 * recon loss: 1.0
 * adv(disc) loss: 0.5
 * kl div loss: 1e-7
+* For Adv FT
+  * recon loss: 1.0
+    * MSE Loss: 1.5
+    * LPIPS Loss: 0.5
+    * ConvNeXt perceptual Loss: 2.0
+  * adv loss: 1.0
+  * kl div loss: 0.0
+  * Encoder frozen
 
 ## Evaluation Results
 
-We use the validation split of imagenet in 256x256 resolution and use MSE loss, PSNR, LPIPS and ConvNeXt perceptual loss as our metric.
+We use the validation and test splits of ImageNet (150k images in total) at 256x256 resolution, with MSE loss, PSNR, LPIPS, and ConvNeXt perceptual loss as our metrics.
 
-| Metrics  | SDXL-VAE  | EQ-SDXL-VAE |
-| -------- | --------- | ----------- |
-| MSE Loss | 3.681e-3  | 3.720e-3    |
-| PSNR     | 24.6602   | 24.5649     |
-| LPIPS    | 0.1314    | 0.1407      |
-| ConvNeXt | 1.303e-03 | 1.546e-03   |
+| Metrics  | SDXL-VAE  | EQ-SDXL-VAE | EQ-SDXL-VAE Adv FT |
+| -------- | --------- | ----------- | ------------------ |
+| MSE Loss | 3.683e-3  | 3.723e-3    | 3.532e-03          |
+| PSNR     | 24.4698   | 24.4030     | 24.6364            |
+| LPIPS    | 0.1316    | 0.1409      | 0.1299             |
+| ConvNeXt | 1.305e-03 | 1.548e-03   | 1.322e-03          |
 
-Based on the result of original paper, the degradation of performance is somehow expected. Since EQ should be seen as kind of reguarlization, which require model to have more capacity to achieve same performance while maintain some specific property.
+We can see that after EQ-VAE training without the adversarial loss, EQ-SDXL-VAE scores slightly worse than the original VAE; since EQ acts as a regularizer, some degradation is expected.
 
-If we count the "quality of Latent" into account, the EQ-SDXL-VAE definitely overtake the original SDXL-VAE
+After fine-tuning with the adversarial loss enabled and the encoder frozen, PSNR and LPIPS even improve beyond the original VAE.
+
+**Note**: This repo contains the weights of EQ-SDXL-VAE Adv FT.
 
 ## Next step
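
The Adv FT weights added in the hunk above compose naturally into a single objective. A minimal sketch of how that composition could look, assuming hypothetical `lpips_fn`, `convnext_fn`, and `discriminator` modules (the actual HakuLatent implementation may structure this differently):

```python
import torch
import torch.nn.functional as F

W_MSE, W_LPIPS, W_CONVNEXT = 1.5, 0.5, 2.0  # sub-weights of the recon term
W_RECON, W_ADV = 1.0, 1.0                   # kl weight is 0.0, so no KL term

def adv_ft_generator_loss(x, recon, lpips_fn, convnext_fn, discriminator):
    # Weighted composite reconstruction loss, matching the listed weights.
    recon_loss = (
        W_MSE * F.mse_loss(recon, x)
        + W_LPIPS * lpips_fn(recon, x).mean()
        + W_CONVNEXT * convnext_fn(recon, x)
    )
    # Non-saturating adversarial term (assumption: the exact GAN loss
    # variant used in HakuLatent may be hinge or another formulation).
    adv_loss = -discriminator(recon).mean()
    return W_RECON * recon_loss + W_ADV * adv_loss

# "Encoder frozen" from the config above corresponds to something like:
# vae.encoder.requires_grad_(False)  # only the decoder is fine-tuned
```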
@@ -120,4 +129,7 @@ Also, I will try to train a simple approximation decoder which have only 2x upsc
 ```
 
 ## Acknowledgement
-* [xiaoqianWX](https://huggingface.co/xiaoqianWX): Provide the compute resource.
+
+* [xiaoqianWX](https://huggingface.co/xiaoqianWX): Provided the compute resources.
+
+* [AmericanPresidentJimmyCarter](https://huggingface.co/AmericanPresidentJimmyCarter): Provided the implementation of the random affine transformation.
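
The random affine transformation credited above is the augmentation at the heart of the EQ-VAE recipe. As a rough illustration of the idea (arXiv:2502.09509): the same transform is applied to the latent and to the input image, and the decoded result of the transformed latent is matched to the transformed image. Transform ranges and loss details here are assumptions; the reference implementation lives in HakuLatent.

```python
import random

import torch
import torch.nn.functional as F
import torchvision.transforms.functional as TF

def eq_regularization_loss(vae, x):
    # Sample one random affine transform (ranges are illustrative).
    angle = random.uniform(-30.0, 30.0)
    scale = random.uniform(0.8, 1.2)

    z = vae.encode(x).latent_dist.sample()
    # Apply the same transform in latent space and in pixel space.
    z_t = TF.affine(z, angle=angle, translate=[0, 0], scale=scale, shear=[0.0])
    x_t = TF.affine(x, angle=angle, translate=[0, 0], scale=scale, shear=[0.0])

    # Decoding the transformed latent should reproduce the transformed image.
    recon_t = vae.decode(z_t).sample
    return F.mse_loss(recon_t, x_t)
```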
 
 
 
 
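
For reference, a sketch of how the metrics in the evaluation table could be computed per batch. The `lpips` package and the PSNR range convention are assumptions; a ConvNeXt feature distance would slot in the same way but is omitted here since its exact definition is project-specific.

```python
import torch
import torch.nn.functional as F
import lpips  # pip install lpips

lpips_fn = lpips.LPIPS(net="vgg").cuda().eval()

@torch.no_grad()
def eval_metrics(x, recon):
    # x, recon: [B, 3, H, W] in [-1, 1]
    mse = F.mse_loss(recon, x)
    # PSNR computed over images rescaled to [0, 1]; the reference
    # implementation may average per-image instead of per-batch.
    x01, r01 = (x + 1) / 2, (recon + 1) / 2
    psnr = 10 * torch.log10(1.0 / F.mse_loss(r01, x01))
    lp = lpips_fn(recon, x).mean()  # LPIPS expects inputs in [-1, 1]
    return mse.item(), psnr.item(), lp.item()
```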