Can you please make GGUF versions of the fp8 models? Not sure if that makes any sense, but it would save some storage and processing power.
#1 — opened by ryg81
Fp8, even when scaled, actually uses less processing than GGUF, which needs more complicated dequantization operations at inference time.
This is an alternative to GGUF with quality close to Q8_0, without the speed loss that GGUF generally incurs. For lower quants, models already exist all over Hugging Face anyway.
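To make the cost difference concrete, here is a hedged NumPy sketch of Q8_0-style blockwise quantization (based on llama.cpp's layout: blocks of 32 int8 weights plus one scale per block; the function names are my own, not from any library). The point is that dequantizing Q8_0 requires reading a scale per block and doing a per-block multiply, whereas a scaled fp8 tensor is just read and cast with at most one per-tensor scale.

```python
import numpy as np

BLOCK = 32  # Q8_0 groups weights into blocks of 32

def quantize_q8_0(x):
    """Blockwise symmetric int8 quantization: one scale per 32 values.
    Illustrative sketch of the Q8_0 idea, not llama.cpp's actual code."""
    x = x.reshape(-1, BLOCK)
    scales = np.abs(x).max(axis=1, keepdims=True) / 127.0
    scales[scales == 0] = 1.0  # avoid division by zero for all-zero blocks
    q = np.round(x / scales).astype(np.int8)
    return q, scales.astype(np.float16)

def dequantize_q8_0(q, scales):
    """Dequant needs a per-block scale lookup and multiply for every
    32 weights -- the extra work GGUF pays that a plain fp8 tensor
    (a direct hardware cast, one scale per tensor at most) avoids."""
    return (q.astype(np.float32) * scales.astype(np.float32)).reshape(-1)

w = np.random.randn(1024).astype(np.float32)
q, s = quantize_q8_0(w)
w_hat = dequantize_q8_0(q, s)
print(np.abs(w - w_hat).max())  # small round-trip error, near Q8_0 quality
```

This is why Q8_0-quality output can come with a speed penalty under GGUF: every matmul must first run this blockwise dequantization, while fp8 weights are consumed more directly.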