This model has the smallest loss among the many fp8 models out there, yet no one uses it?
model + VAE + CLIP = :)
Yes, this is the best all-in-one flux model.
The gguf stuff is trash.
wtf, I wish somebody had told me this lmfao
90% of people are using GPUs with less than 24GB of VRAM; they just can't use this all-in-one model.
my gpu is a 3080 Ti
SDXL is your best friend.
Are u ok?
To clear some things up: flux dev, schnell, fill, depth, canny, or any finetunes all use the exact same CLIP and VAE. That's why having them separate is better, since you only store them once instead of 5 times. fp8 imo is better than gguf, but if you can't run fp8, gguf q3 can cut flux fp8 in half. So this all-in-one has its place for beginners, separate files are better for most, and gguf is better for those who would otherwise be unable to use flux at all.
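If you want to see what an all-in-one checkpoint actually bundles, here is a minimal sketch using the safetensors library; the filename is a placeholder for whatever file you downloaded, and the exact key prefixes vary between checkpoints. An all-in-one flux file will show the diffusion model, text encoder, and VAE tensor groups together, which is exactly the duplication the separate-files approach avoids.

```python
# Rough sketch: group the tensor names in a .safetensors file by their
# top-level prefix to see what the checkpoint bundles.
# The filename is a placeholder; point it at your own download.
from collections import Counter
from safetensors import safe_open

path = "flux1-dev-fp8-all-in-one.safetensors"  # placeholder filename

with safe_open(path, framework="pt", device="cpu") as f:
    prefixes = Counter(key.split(".")[0] for key in f.keys())

# An all-in-one checkpoint typically shows diffusion model, text encoder,
# and VAE groups; a separate unet/clip/vae file shows only one of them.
for prefix, count in prefixes.most_common():
    print(f"{prefix}: {count} tensors")
```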
Yeah, but don't most things offload in comfy anyway? Like, I have 12GB VRAM and I can run all of the flux models in comfy without getting OOM, but I've found that GGUF files are so much slower to run. Plus, with any caching speedups, this fp8, or even the fp16 for that matter, runs much faster than any GGUF model does.
What gpu do you have? I wouldn't be surprised.
ComfyUI offloads, and it will even load fp16 flux in fp8 to prioritize speed, since running half of fp16 flux (24 GB) in RAM would take forever. GGUF is for those who have even less VRAM or system RAM than you, where flux fp8 absolutely cannot be loaded. I'm curious what GPU you have, because if it's older, that might explain why gguf is slower. For me on my 4070 it's the same as fp8; even tiny ggufs are still the same speed. What ggufs have you tried? q8 shouldn't be faster than fp8, because they are roughly the same file size, with q8 being slightly bigger.
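For anyone weighing fp8 against a gguf quant, a quick sanity check is to compare the model file's size with the free VRAM on your card. This is only a sketch under simple assumptions: the path is a placeholder, it assumes an nvidia GPU with CUDA available, and it ignores the extra memory needed for the text encoders, VAE, and activations.

```python
# Minimal sketch: compare a model file's size against free VRAM to guess
# whether it can load without offloading. Treat the result as a rough
# lower bound only; CLIP/T5, the VAE, and activations need memory too.
import os
import torch

path = "flux1-dev-fp8.safetensors"  # placeholder filename

model_bytes = os.path.getsize(path)
free_bytes, _total_bytes = torch.cuda.mem_get_info()  # (free, total) in bytes

gib = 1024 ** 3
print(f"model: {model_bytes / gib:.1f} GiB, free VRAM: {free_bytes / gib:.1f} GiB")
print("fits without offloading" if model_bytes < free_bytes else "expect offloading to RAM")
```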
Ahh, so you're saying when I run flux fp16 in comfy it runs in fp8 automatically since I run out of VRAM? I have a 12GB 3060, and no matter whether I run a q4 or a q8 gguf, they always seem to be the same speed and much slower than the fp16 (running in fp8, I guess).
Yeah, it's why you might as well just download fp8 flux, because comfy is gonna run it like that anyway, so save some file space. If it didn't, it would be bottlenecked by your RAM speed from offloading and take ages for a single image. fp8 is roughly 12 GB instead of 24, so half the file size. Wait, I just realized: you're running fp8 flux and q4 doesn't give you any boost. Bro, I remember when I started, aitreprenaur (a youtuber) said fp8 flux is for 24 GB of VRAM. I use it with 16 GB, you use it with 12. idk anymore lol. How long does it take for you to generate a 1024 by 1024 image?
Hey, just wanted to clarify that native core ComfyUI nodes will NOT cast to fp8 unless they are explicitly told to do so. Offloading only involves moving between the offload device ('cpu') and the load device ('cuda:0' on nvidia cards); if a tensor is in fp16, it will be kept as fp16. Offloading works by putting tensors that would not fit in VRAM onto the offload device until the very moment they are needed. This results in slower performance but avoids OOM. There are some bugs to be ironed out with this system, but that is how it works in a nutshell, and it will not cast to fp8 as part of that process.
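To put that distinction in code: here is a tiny PyTorch sketch (not ComfyUI's actual implementation, and it needs a recent PyTorch build that has the float8 dtypes) showing that offloading is a device move that keeps the dtype, while fp8 is a separate, explicit cast.

```python
# Tiny PyTorch sketch (not ComfyUI's actual code) of offloading vs. casting.
import torch

# A weight sitting on the offload device ('cpu'), stored in fp16.
weight = torch.randn(4096, 4096, dtype=torch.float16, device="cpu")

# Offloading: move to the load device only when needed; dtype is untouched.
if torch.cuda.is_available():
    w_gpu = weight.to("cuda:0")
    print(w_gpu.dtype, w_gpu.device)      # torch.float16 cuda:0

# Casting: an explicit dtype change that halves memory but loses precision.
# It is a separate, opt-in step, not something offloading does for you.
w_fp8 = weight.to(torch.float8_e4m3fn)
print(w_fp8.dtype, w_fp8.element_size())  # torch.float8_e4m3fn 1
```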
Thanks for clearing that up, Kosinkadink. I ran flux straight from black forest labs (so not fp8) in comfyui with 16 GB VRAM, and yep, it did not load it in fp8. Not sure where I got that from, and sorry rosgrocar for misleading you. Learned my lesson: fact check myself before I start preaching, or just leave it to the professionals lol