Difference between Hugging Face and GitHub weights
Even though both sets of weights appear to have been updated in June 2024, there seems to be a big difference between them. I noticed this while working on an inpainting algorithm. For that to work, you need to merge the decoded result with the original so you don't degrade the image outside the region you are inpainting. I noticed that the inpainting wasn't matching the brightness of the original, so I tried both versions of the weights. They are obviously different, because the inpainting algorithm is not compatible between the two, but that is just a fine-tuning issue once I pick which set of weights to use. In any case, both show a change in brightness, but the Hugging Face weights are much worse.

In the images below, the top row is the original with a masked area to inpaint. The middle image has the masked area replaced by a round trip through `vae.decoder(vae.encoder(image))`, and the bottom image is just the round trip. Ideally, you shouldn't be able to detect the inpainted areas in the middle image. Note that the inputs are correctly scaled to [0, 1].
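For reference, here is a minimal sketch of the round-trip check described above, assuming the GitHub version (the `TAESD` module from the repo's `taesd.py`, which exposes `.encoder`/`.decoder` and works on [0, 1] images, loading the `.pth` weights from the working directory by default); the `image` and `mask` tensors are stand-ins:

```python
import torch
from taesd import TAESD  # taesd.py from the GitHub repo

vae = TAESD()  # loads taesd_encoder.pth / taesd_decoder.pth by default

# Stand-in inputs: a [0, 1]-scaled image and a binary inpainting mask
# (1 inside the region to inpaint, 0 elsewhere).
image = torch.rand(1, 3, 512, 512)
mask = torch.zeros(1, 1, 512, 512)
mask[..., 128:384, 128:384] = 1.0

with torch.no_grad():
    # Full encode/decode round trip (the bottom row above).
    roundtrip = vae.decoder(vae.encoder(image)).clamp(0, 1)

# The middle row: original outside the mask, round trip inside it.
# If the round trip preserved brightness, the inpainted area should
# be undetectable in the merged image.
merged = mask * roundtrip + (1 - mask) * image
```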
I was able to replicate your results locally and found my mistake in the code. I was also using OpenCLIP, which requires a [0, 1] input range, and I accidentally used that range for the Hugging Face version of TAESD. When I changed the input to [-1, 1], the results matched.
Sorry about the inconvenience. It can sometimes be difficult to keep the input ranges straight when different libraries expect different ranges.
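For anyone else who hits this, a quick sketch of the convention that made the results match above, assuming the diffusers `AutoencoderTiny` wrapper for the Hugging Face weights:

```python
import torch
from diffusers import AutoencoderTiny

# The Hugging Face / diffusers version of TAESD follows the usual
# diffusers VAE convention of [-1, 1] images, whereas the GitHub
# taesd.py works directly on [0, 1] images.
vae = AutoencoderTiny.from_pretrained("madebyollin/taesd")

def roundtrip_hf(image_01):
    """Round-trip a [0, 1]-scaled (1, 3, H, W) tensor through the HF TAESD."""
    image = image_01 * 2 - 1                 # [0, 1] -> [-1, 1]
    with torch.no_grad():
        latents = vae.encode(image).latents
        decoded = vae.decode(latents).sample
    return (decoded / 2 + 0.5).clamp(0, 1)   # [-1, 1] -> [0, 1]
```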
Ah, that makes sense! The different scaling factors (input ranges as well as the VAE latent scales) are definitely annoying to deal with (see e.g. https://github.com/NVlabs/edm2/issues/9 😅). Glad it's working now.