Update README.md
README.md CHANGED
@@ -9,10 +9,9 @@ base_model:
   - stabilityai/sdxl-vae
 library_name: diffusers
 ---
-
 # EQ-SDXL-VAE: open sourced reproduction of EQ-VAE on SDXL-VAE
 
-**
+**Adv-FT is done and achieves better performance than the original SDXL-VAE!**
 
 original paper: https://arxiv.org/abs/2502.09509 <br>
 source code of the reproduction: https://github.com/KohakuBlueleaf/HakuLatent
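The next hunk's header quotes the loading line `vae = AutoencoderKL.from_pretrained("KBlueLeaf/EQ-SDXL-VAE").cuda().half()`. Since the model is a drop-in SDXL-VAE replacement, a minimal encode/decode round trip with diffusers might look like the sketch below; the input file name and the 256x256 resize are illustrative assumptions, not from the card.

```python
import numpy as np
import torch
from PIL import Image
from diffusers import AutoencoderKL

# Load the released checkpoint in fp16 on GPU, as in the model card.
vae = AutoencoderKL.from_pretrained("KBlueLeaf/EQ-SDXL-VAE").cuda().half()
vae.eval()

# "input.png" is a placeholder; 256x256 matches the evaluation resolution.
img = Image.open("input.png").convert("RGB").resize((256, 256))
x = torch.from_numpy(np.array(img)).half().div(127.5).sub(1.0)  # uint8 -> [-1, 1]
x = x.permute(2, 0, 1).unsqueeze(0).cuda()  # HWC -> 1CHW

with torch.no_grad():
    # encode() returns a posterior; sample() (or mode()) gives the latents.
    z = vae.encode(x).latent_dist.sample()
    # decode() maps latents back to pixel space.
    recon = vae.decode(z).sample

out = (recon.squeeze(0).permute(1, 2, 0).float().cpu().clamp(-1, 1) + 1) * 127.5
Image.fromarray(out.byte().numpy()).save("recon.png")
```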
@@ -74,21 +73,31 @@ vae = AutoencoderKL.from_pretrained("KBlueLeaf/EQ-SDXL-VAE").cuda().half()
 * recon loss: 1.0
 * adv(disc) loss: 0.5
 * kl div loss: 1e-7
+* For Adv FT
+  * recon loss: 1.0
+    * MSE Loss: 1.5
+    * LPIPS Loss: 0.5
+    * ConvNeXt perceptual Loss: 2.0
+  * adv loss: 1.0
+  * kl div loss: 0.0
+  * Encoder frozen
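Reading the Adv FT weights above as a weighted sum (my interpretation; the card only lists the weights), fine-tuning would optimize a composite reconstruction term plus the adversarial term while the frozen encoder receives no gradients. A rough sketch, where `lpips_loss`, `convnext_loss`, and `discriminator` are hypothetical stand-ins for HakuLatent's actual modules:

```python
import torch
import torch.nn.functional as F

def adv_ft_loss(vae, discriminator, lpips_loss, convnext_loss, x):
    # Encoder frozen: no gradients flow into the encoder during Adv FT.
    with torch.no_grad():
        z = vae.encode(x).latent_dist.sample()
    recon = vae.decode(z).sample

    # recon loss (weight 1.0) read as a weighted sum of three terms.
    rec = (
        1.5 * F.mse_loss(recon, x)
        + 0.5 * lpips_loss(recon, x)
        + 2.0 * convnext_loss(recon, x)
    )
    # Non-saturating generator-side adversarial loss (weight 1.0);
    # the exact GAN formulation used in training is an assumption here.
    adv = F.softplus(-discriminator(recon)).mean()
    # kl div weight is 0.0 during Adv FT, so no KL term appears.
    return 1.0 * rec + 1.0 * adv
```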
 
 ## Evaluation Results
 
-We use the validation split of imagenet in 256x256 resolution and use MSE loss, PSNR, LPIPS and ConvNeXt perceptual loss as our metric.
+We use the validation and test splits of ImageNet (150k images in total) at 256x256 resolution, and use MSE loss, PSNR, LPIPS, and ConvNeXt perceptual loss as our metrics.
 
-| Metrics  | SDXL-VAE  | EQ-SDXL-VAE |
-| -------- | --------- | ----------- |
-| MSE Loss | 3.681e-3  | 3.720e-3    |
-| PSNR     | 24.6602   | 24.5649     |
-| LPIPS    | 0.1314    | 0.1407      |
-| ConvNeXt | 1.303e-03 | 1.546e-03   |
+| Metrics  | SDXL-VAE  | EQ-SDXL-VAE | EQ-SDXL-VAE Adv FT |
+| -------- | --------- | ----------- | ------------------ |
+| MSE Loss | 3.683e-3  | 3.723e-3    | 3.532e-3           |
+| PSNR     | 24.4698   | 24.4030     | 24.6364            |
+| LPIPS    | 0.1316    | 0.1409      | 0.1299             |
+| ConvNeXt | 1.305e-3  | 1.548e-3    | 1.322e-3           |
 
-
+We can see that after EQ-VAE training without the adversarial loss, EQ-SDXL-VAE is slightly worse than the original VAE.
 
-
+After fine-tuning with the adversarial loss enabled and the encoder frozen, PSNR and LPIPS even improve beyond the original VAE!
 
+**Note**: This repo contains the weights of EQ-SDXL-VAE Adv FT.
 
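For reproducing the table's standard metrics, a sketch follows; the LPIPS backbone and the PSNR convention are assumptions (the card does not state them), and the ConvNeXt perceptual loss would follow the same pattern using features from a pretrained ConvNeXt.

```python
import torch
import torch.nn.functional as F
import lpips  # pip install lpips

lpips_net = lpips.LPIPS(net="vgg").cuda().eval()  # backbone choice is an assumption

@torch.no_grad()
def eval_metrics(x: torch.Tensor, recon: torch.Tensor) -> dict:
    """x, recon: (B, 3, H, W) tensors in [-1, 1]."""
    mse = F.mse_loss(recon, x)
    # One common PSNR convention (peak = 1); the card's exact setup is not stated.
    psnr = 10 * torch.log10(1.0 / mse)
    # lpips.LPIPS expects inputs scaled to [-1, 1], which matches our tensors.
    lp = lpips_net(recon, x).mean()
    return {"mse": mse.item(), "psnr": psnr.item(), "lpips": lp.item()}
```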
 ## Next step
 
@@ -120,4 +129,7 @@ Also, I will try to train a simple approximation decoder which have only 2x upsc
 ```
 
 ## Acknowledgement
-
+
+* [xiaoqianWX](https://huggingface.co/xiaoqianWX): Provided the compute resources.
+
+* [AmericanPresidentJimmyCarter](https://huggingface.co/AmericanPresidentJimmyCarter): Provided the implementation of the random affine transformation.
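On the random affine transformation credited above: in the EQ-VAE recipe (arXiv:2502.09509, as I read it), the regularizer asks the decoder to map a transformed latent to the correspondingly transformed image. A conceptual sketch using torchvision's affine op, with illustrative parameter ranges rather than the ones used in training:

```python
import random
import torch.nn.functional as F
import torchvision.transforms.functional as TF

def equivariance_loss(vae, x):
    """EQ-VAE-style regularizer: decode(T(encode(x))) should match T(x)."""
    # Sample one random affine transform; these ranges are illustrative only.
    angle = random.uniform(-30.0, 30.0)
    scale = random.uniform(0.75, 1.25)

    z = vae.encode(x).latent_dist.sample()
    # Apply the same affine in latent space and in pixel space.
    z_t = TF.affine(z, angle=angle, translate=[0, 0], scale=scale, shear=[0.0])
    x_t = TF.affine(x, angle=angle, translate=[0, 0], scale=scale, shear=[0.0])

    recon_t = vae.decode(z_t).sample
    return F.mse_loss(recon_t, x_t)
```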
|