Diffusers loads the transformer twice, causing excessive memory usage

#35
by Fredtt3 - opened

I was running tests with this model. In theory it should run without major issues on an NVIDIA H200 with the full model loaded (text encoder and transformer), since the expected memory usage is in the 110–115 GB VRAM range. However, when I attempted to run it on an H200, I hit an out-of-memory (OOM) error.
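
This is roughly how I was loading the model; a minimal sketch using the generic DiffusionPipeline entry point, with the repo id and dtype as assumptions and the memory readout added only for illustration:

```python
import torch
from diffusers import DiffusionPipeline

# Load the full pipeline (text encoder, transformer, VAE) onto a single GPU.
pipe = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.2-dev",  # assumed repo id
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

# Check how much VRAM loading alone consumes, before any generation.
print(f"allocated: {torch.cuda.memory_allocated() / 1024**3:.1f} GiB")
print(f"reserved:  {torch.cuda.memory_reserved() / 1024**3:.1f} GiB")
```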

I then tested it on an NVIDIA B200 and observed memory usage reaching approximately 178 GB. After inspecting the file structure, I found that diffusers was loading the flux2-dev.safetensors file in the root directory together with the files in the transformer/ folder, even though they contain the same weights. As a result, the transformer was loaded twice, roughly doubling its memory footprint.
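
The duplication is visible from the repo file listing alone; a quick check with huggingface_hub (repo id again an assumption):

```python
from huggingface_hub import list_repo_files

# The root-level flux2-dev.safetensors and ae.safetensors show up
# alongside the files in the transformer/ folder.
for name in list_repo_files("black-forest-labs/FLUX.2-dev"):
    print(name)
```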

In addition, I noticed that inference times were unstable, ranging from around 17 seconds per generation to more than 2 minutes.

After removing the flux2-dev.safetensors and ae.safetensors files from the root directory, the issue disappeared and memory usage returned to the expected 110–115 GB range.
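
The same effect can be had without editing the downloaded files by hand, by excluding the root-level checkpoints at download time; a sketch assuming huggingface_hub's snapshot_download and its ignore_patterns argument:

```python
import torch
from diffusers import DiffusionPipeline
from huggingface_hub import snapshot_download

# Fetch the repo while skipping the duplicate root-level checkpoints.
local_dir = snapshot_download(
    "black-forest-labs/FLUX.2-dev",  # assumed repo id
    ignore_patterns=["flux2-dev.safetensors", "ae.safetensors"],
)

# Load the pipeline from the filtered local snapshot.
pipe = DiffusionPipeline.from_pretrained(local_dir, torch_dtype=torch.bfloat16)
pipe.to("cuda")
```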

I hope this issue can be fixed. As a temporary workaround, I repackaged the model in this Hugging Face repository: Aquiles-ai/FLUX.2-dev, for users who prefer not to remove files manually or deal with extra configuration.

Can you open this on the diffusers repository? Please also provide a reproducible code snippet so that we can confirm this is an issue.

Please also point to the code in diffusers that made you think the transformer is being loaded twice.