This model has the smallest loss among the many fp8 models out there, yet no one uses it?
model + VAE + CLIP = :)
Yes, this is the best all-in-one flux model.
The gguf stuff is trash.
wtf, I wish somebody had told me this lmfao
90% of people are using GPUs with less than 24GB of VRAM; they just can't use this all-in-one model.
my gpu is a 3080 Ti
SDXL is your best friend.
Are u ok?
To clear some things up: flux dev, schnell, fill, depth, canny, or any finetunes all use the exact same CLIP and VAE. That's why having them separate is better, since you only store them once instead of 5 times. fp8 imo is better than gguf, but if you can't run fp8, gguf q3 can cut flux fp8 in half. So this all-in-one has its place for beginners, separate files are better for most, and gguf is better for those who would otherwise be unable to use flux at all.
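If you want to see what an all-in-one checkpoint actually bundles, here is a minimal sketch using the safetensors library; the filename is a placeholder for whatever file you downloaded, and the exact key prefixes vary between checkpoints. An all-in-one flux file will show the diffusion model, text encoder, and VAE tensor groups together, which is exactly the duplication the separate-files approach avoids.

```python
# Rough sketch: group the tensor names in a .safetensors file by their
# top-level prefix to see what the checkpoint bundles.
# The filename is a placeholder; point it at your own download.
from collections import Counter
from safetensors import safe_open

path = "flux1-dev-fp8-all-in-one.safetensors"  # placeholder filename

with safe_open(path, framework="pt", device="cpu") as f:
    prefixes = Counter(key.split(".")[0] for key in f.keys())

# An all-in-one checkpoint typically shows diffusion model, text encoder,
# and VAE groups; a separate unet/clip/vae file shows only one of them.
for prefix, count in prefixes.most_common():
    print(f"{prefix}: {count} tensors")
```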
Yeah, but don't most things offload in comfy anyway? Like, I have 12GB VRAM and I can run all of the flux models in comfy without getting OOM, but I've found that GGUF files are so much slower to run. Plus, with any caching speedups, this fp8, or even the fp16 for that matter, runs much faster than any GGUF model does.
What gpu do you have? I wouldn't be surprised.
ComfyUI offloads, and it will even load fp16 flux in fp8 to prioritize speed, since running half of fp16 flux (24 GB) in RAM would take forever. GGUF is for those who have even less VRAM or system RAM than you, where flux fp8 absolutely cannot be loaded. I'm curious what GPU you have, because if it's older, that might explain why gguf is slower. For me on my 4070 it's the same as fp8; even tiny ggufs are still the same speed. What ggufs have you tried? q8 shouldn't be faster than fp8, because they are roughly the same file size, with q8 being slightly bigger.
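For anyone weighing fp8 against a gguf quant, a quick sanity check is to compare the model file's size with the free VRAM on your card. This is only a sketch under simple assumptions: the path is a placeholder, it assumes an nvidia GPU with CUDA available, and it ignores the extra memory needed for the text encoders, VAE, and activations.

```python
# Minimal sketch: compare a model file's size against free VRAM to guess
# whether it can load without offloading. Treat the result as a rough
# lower bound only; CLIP/T5, the VAE, and activations need memory too.
import os
import torch

path = "flux1-dev-fp8.safetensors"  # placeholder filename

model_bytes = os.path.getsize(path)
free_bytes, _total_bytes = torch.cuda.mem_get_info()  # (free, total) in bytes

gib = 1024 ** 3
print(f"model: {model_bytes / gib:.1f} GiB, free VRAM: {free_bytes / gib:.1f} GiB")
print("fits without offloading" if model_bytes < free_bytes else "expect offloading to RAM")
```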
Ahh, so you're saying when I run flux fp16 in comfy it runs in fp8 automatically since I run out of VRAM? I have a 12GB 3060, and no matter whether I run a q4 or a q8 gguf, they always seem to be the same speed and much slower than the fp16 (running in fp8, I guess).
Yeah, it's why you might as well just download fp8 flux, because comfy is gonna run it like that anyway, so save some file space. If it didn't, it would be bottlenecked by your RAM speed from offloading and take ages for a single image. fp8 is roughly 12 GB instead of 24, so half the file size. Wait, I just realized: you're running fp8 flux and q4 doesn't give you any boost. Bro, I remember when I started, aitreprenaur (a youtuber) said fp8 flux is for 24 GB of VRAM. I use it with 16 GB, you use it with 12. idk anymore lol. How long does it take for you to generate a 1024 by 1024 image?
Hey, just wanted to clarify that native core ComfyUI nodes will NOT cast to fp8 unless they are explicitly told to do so. Offloading only involves moving between the offload device ('cpu') and the load device ('cuda:0' on nvidia cards); if a tensor is in fp16, it will be kept as fp16. Offloading works by putting tensors that would not fit in VRAM onto the offload device until the very moment they are needed. This results in slower performance but avoids OOM. There are some bugs to be ironed out with this system, but that is how it works in a nutshell, and it will not cast to fp8 as part of that process.
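To put that distinction in code: here is a tiny PyTorch sketch (not ComfyUI's actual implementation, and it needs a recent PyTorch build that has the float8 dtypes) showing that offloading is a device move that keeps the dtype, while fp8 is a separate, explicit cast.

```python
# Tiny PyTorch sketch (not ComfyUI's actual code) of offloading vs. casting.
import torch

# A weight sitting on the offload device ('cpu'), stored in fp16.
weight = torch.randn(4096, 4096, dtype=torch.float16, device="cpu")

# Offloading: move to the load device only when needed; dtype is untouched.
if torch.cuda.is_available():
    w_gpu = weight.to("cuda:0")
    print(w_gpu.dtype, w_gpu.device)      # torch.float16 cuda:0

# Casting: an explicit dtype change that halves memory but loses precision.
# It is a separate, opt-in step, not something offloading does for you.
w_fp8 = weight.to(torch.float8_e4m3fn)
print(w_fp8.dtype, w_fp8.element_size())  # torch.float8_e4m3fn 1
```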
Thanks for clearing that up, Kosinkadink. I ran flux straight from black forest labs (so not fp8) in comfyui with 16 GB VRAM, and yep, it did not load it in fp8. Not sure where I got that from, and sorry rosgrocar for misleading you. Learned my lesson: fact check myself before I start preaching, or just leave it to the professionals lol