#1
by ryg81 - opened

Can you please make GGUF versions of fp8? Not sure if that makes any sense, but it could save some storage and processing power.

Owner

Fp8, even when scaled, actually uses less processing than GGUF, which needs a more complicated dequantization operation at inference time.

This is an alternative to GGUF with quality close to Q8_0, without the speed loss that GGUF generally incurs. Lower-quant models already exist all around Hugging Face anyway.
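To illustrate the dequantization difference being described: a toy sketch of the two paths. This is not the actual GGUF or fp8 kernel code; the block size, the e4m3-style max of 448, and the use of `np.float16` as a stand-in for fp8 (NumPy has no native fp8 dtype) are all illustrative assumptions. The point it shows is that a Q8_0-style format must multiply every int8 block by its own scale when weights are read, while scaled fp8 keeps weights in a float dtype with a single tensor-wide scale.

```python
import numpy as np

def q8_0_quantize(w, block=32):
    """Q8_0-style (assumed layout): int8 values plus one scale per block."""
    w = w.reshape(-1, block)
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    q = np.round(w / np.where(scale == 0, 1, scale)).astype(np.int8)
    return q, scale

def q8_0_dequantize(q, scale):
    # At inference every block of int8 weights must be rescaled by
    # its own per-block scale before use -- the extra work GGUF does.
    return (q.astype(np.float32) * scale).reshape(-1)

def fp8_like_quantize(w):
    """Scaled fp8-style: one tensor-wide scale; weights stay a float dtype.
    np.float16 stands in for fp8 here (illustrative assumption)."""
    scale = np.abs(w).max() / 448.0  # 448 = e4m3 max, illustrative
    return (w / scale).astype(np.float16), np.float32(scale)

def fp8_like_dequantize(q, scale):
    # Dequant is a single multiply by one scalar for the whole tensor.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)

q8, s8 = q8_0_quantize(w)
wq8 = q8_0_dequantize(q8, s8)

f8, sf = fp8_like_quantize(w)
wf8 = fp8_like_dequantize(f8, sf)

print("Q8_0-style max abs error:", np.abs(w - wq8).max())
print("fp8-style  max abs error:", np.abs(w - wf8).max())
```

Both paths reconstruct the weights closely; the difference the owner points at is the per-block rescaling work, not the accuracy.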
