Difference between Hugging Face and GitHub weights
Even though both sets of weights appear to have been updated in June 2024, there seems to be a big difference between them. I noticed this while working on an inpainting algorithm. For that to work, you need to merge the decoded result with the original so you don't degrade the image outside the region you are inpainting. I noticed that the inpainting wasn't matching the brightness of the original, so I tried both versions of the weights. They are obviously different, because the inpainting algorithm is not compatible between the two, but that is just a fine-tuning issue once I pick which set of weights to use. In any case, both show a change in brightness, but the Hugging Face weights are much worse.

In the images below, the top row is the original with a masked area to inpaint. The middle image has the masked area replaced by a round trip through `vae.decoder(vae.encoder(image))`, and the bottom image is just the round trip. Ideally, you shouldn't be able to detect the inpainted areas in the middle image. Note that the inputs are correctly scaled to [0, 1].
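For reference, here is a minimal sketch of the round-trip check described above, assuming the GitHub version (the `TAESD` module from the repo's `taesd.py`, which exposes `.encoder`/`.decoder` and works on [0, 1] images, loading the `.pth` weights from the working directory by default); the `image` and `mask` tensors are stand-ins:

```python
import torch
from taesd import TAESD  # taesd.py from the GitHub repo

vae = TAESD()  # loads taesd_encoder.pth / taesd_decoder.pth by default

# Stand-in inputs: a [0, 1]-scaled image and a binary inpainting mask
# (1 inside the region to inpaint, 0 elsewhere).
image = torch.rand(1, 3, 512, 512)
mask = torch.zeros(1, 1, 512, 512)
mask[..., 128:384, 128:384] = 1.0

with torch.no_grad():
    # Full encode/decode round trip (the bottom row above).
    roundtrip = vae.decoder(vae.encoder(image)).clamp(0, 1)

# The middle row: original outside the mask, round trip inside it.
# If the round trip preserved brightness, the inpainted area should
# be undetectable in the merged image.
merged = mask * roundtrip + (1 - mask) * image
```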
I was able to replicate your results locally and found my mistake in the code. I was also using OpenCLIP, which requires a [0, 1] input range, and I accidentally used that range for the Hugging Face version of TAESD. When I changed the input to [-1, 1], the results matched.
Sorry about the inconvenience. It can sometimes be difficult to keep the input ranges straight when different libraries expect different ranges.
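For anyone else who hits this, a quick sketch of the convention that made the results match above, assuming the diffusers `AutoencoderTiny` wrapper for the Hugging Face weights:

```python
import torch
from diffusers import AutoencoderTiny

# The Hugging Face / diffusers version of TAESD follows the usual
# diffusers VAE convention of [-1, 1] images, whereas the GitHub
# taesd.py works directly on [0, 1] images.
vae = AutoencoderTiny.from_pretrained("madebyollin/taesd")

def roundtrip_hf(image_01):
    """Round-trip a [0, 1]-scaled (1, 3, H, W) tensor through the HF TAESD."""
    image = image_01 * 2 - 1                 # [0, 1] -> [-1, 1]
    with torch.no_grad():
        latents = vae.encode(image).latents
        decoded = vae.decode(latents).sample
    return (decoded / 2 + 0.5).clamp(0, 1)   # [-1, 1] -> [0, 1]
```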
Ah, that makes sense! The different scaling factors (input ranges as well as the VAE latent scales) are definitely annoying to deal with (see e.g. https://github.com/NVlabs/edm2/issues/9 😅). Glad it's working now.