city96/FLUX.1-dev-gguf · Create an equivalent to GGUF for Diffusers models?

Yeah, good call. I don't think we need a separate file format, considering GGUF is just a storage format at its core, but standardizing the metadata/state dict format would be good.

For example, these ones only really have the bare minimum info added (see image below). We could include the relevant diffusers model class and maybe the pipeline tag in the metadata. Some of the metadata llama.cpp already uses for LLMs can be reused, e.g. license/author/model name. I guess there's no chat template or anything like that, but including metadata that identifies things like LCM/Turbo/CFG distillation would be useful for inference.
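A rough sketch of what that metadata could look like, as a plain key/value dict. The `general.*` keys follow the naming used in the GGUF spec as far as I know; the `diffusers.*` keys and the `build_metadata` helper are made up for illustration and are not part of any existing spec or library. The author and license values are placeholders.

```python
def build_metadata(arch, name, author, license_id,
                   model_class, pipeline_tag, distillation=None):
    """Return a flat key/value dict that could be written as GGUF metadata."""
    meta = {
        # General keys, named after the ones llama.cpp already uses.
        "general.architecture": arch,
        "general.name": name,
        "general.author": author,
        "general.license": license_id,
        # Hypothetical keys: not part of any existing spec.
        "diffusers.model_class": model_class,
        "diffusers.pipeline_tag": pipeline_tag,
    }
    # Only record distillation info (e.g. LCM/Turbo/CFG) when it applies.
    if distillation is not None:
        meta["diffusers.distillation"] = distillation
    return meta


example = build_metadata(
    arch="flux",
    name="FLUX.1-dev",
    author="example-author",       # placeholder value
    license_id="example-license",  # placeholder value
    model_class="FluxTransformer2DModel",
    pipeline_tag="text-to-image",
)
```

An inference backend could then pick the model class and the right sampler defaults from these keys instead of guessing from the tensor names.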

One thing that would be nice is standardizing the state dict format and forcing a unified format the way llama.cpp does for LLMs (Flux has its original reference format as well as the diffusers format; same with SD1/SDXL checkpoints and the SAI/HF formats). The problem I see with that is edge cases like the q/k/v keys being fused vs. separate, which is not something you can realistically adjust on the fly with quantized models, so backends would need logic for handling either. Also, any key that needs to be reshaped between the two formats would cause similar issues (like the pos_embed weight in SD3, which is extra hard to deal with because the first dim is 1, which the cpp code isn't really meant to deal with lol).
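To make the "logic for using either" part concrete, here is a minimal sketch of how a backend might detect which q/k/v layout a checkpoint uses before deciding which attention path to run. The helper name `detect_qkv_layout` is hypothetical, and the key suffixes (`qkv.weight` for the fused case, `to_q`/`to_k`/`to_v` for the separate case) are assumptions based on how the reference and diffusers Flux checkpoints are commonly named; other models may differ.

```python
def detect_qkv_layout(keys, prefix):
    """Return "fused" or "separate" for the attention module under `prefix`.

    `keys` is any iterable of state dict key names. Only the names are
    inspected, so this works the same for quantized and unquantized tensors.
    """
    keys = set(keys)
    if f"{prefix}.qkv.weight" in keys:
        return "fused"
    if all(f"{prefix}.to_{n}.weight" in keys for n in ("q", "k", "v")):
        return "separate"
    raise KeyError(f"no recognizable q/k/v weights under {prefix!r}")


reference_keys = ["double_blocks.0.img_attn.qkv.weight"]
diffusers_keys = [
    "transformer_blocks.0.attn.to_q.weight",
    "transformer_blocks.0.attn.to_k.weight",
    "transformer_blocks.0.attn.to_v.weight",
]

layout_a = detect_qkv_layout(reference_keys, "double_blocks.0.img_attn")
layout_b = detect_qkv_layout(diffusers_keys, "transformer_blocks.0.attn")
```

The point is that the check happens on key names only; the tensors themselves are left exactly as stored, so nothing has to be requantized.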

I guess there's also the issue of DiT-based vs. UNet models. You can't really force the fixed blk naming used for decoder-only LLMs onto the state dict, since there are two (three for UNet) sets of blocks here. A lot of parts don't overlap much either, since there are different block types (do we force the resnets and transformer blocks to share the same naming, etc.).
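As a small illustration of the multiple-block-sets problem, the sketch below groups state dict keys by the block set they belong to. The `block_groups` helper and its regex are hypothetical, and the example key names assume the Flux reference layout (`double_blocks`, `single_blocks`) and the original SD UNet layout (`input_blocks`, `middle_block`, `output_blocks`).

```python
import re
from collections import defaultdict

# Matches "<group>.<index>." inside a key, where <group> ends in
# "block" or "blocks", e.g. "double_blocks.3." or "middle_block.1.".
_BLOCK_RE = re.compile(r"(?:^|\.)([a-z_]*blocks?)\.(\d+)\.")


def block_groups(keys):
    """Map each block-set name to the sorted block indices found in `keys`."""
    groups = defaultdict(set)
    for key in keys:
        match = _BLOCK_RE.search(key)
        if match:
            groups[match.group(1)].add(int(match.group(2)))
    return {name: sorted(indices) for name, indices in groups.items()}


flux_groups = block_groups([
    "img_in.weight",  # not part of any block set, so it is skipped
    "double_blocks.0.img_attn.qkv.weight",
    "single_blocks.5.linear1.weight",
])

unet_groups = block_groups([
    "model.diffusion_model.input_blocks.1.0.in_layers.0.weight",
    "model.diffusion_model.middle_block.1.proj_in.weight",
    "model.diffusion_model.output_blocks.2.0.in_layers.0.weight",
])
```

With these inputs the Flux keys fall into two groups and the UNet keys into three, so a single `blk.N` counter would need either a group prefix or some agreed-on flattening order.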