Can you please make GGUF versions of the fp8 models? Not sure if that makes any sense, but it would save some storage and processing power.
#1 — opened by ryg81
Fp8, even when scaled, actually uses less processing than GGUF, which needs more complicated dequantization operations at inference time.
This is an alternative to GGUF with quality close to Q8_0, without the speed loss that GGUF generally incurs. For lower quants, models already exist all over Hugging Face anyway.
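To make the cost difference concrete, here is a hedged NumPy sketch of Q8_0-style blockwise quantization (based on llama.cpp's layout: blocks of 32 int8 weights plus one scale per block; the function names are my own, not from any library). The point is that dequantizing Q8_0 requires reading a scale per block and doing a per-block multiply, whereas a scaled fp8 tensor is just read and cast with at most one per-tensor scale.

```python
import numpy as np

BLOCK = 32  # Q8_0 groups weights into blocks of 32

def quantize_q8_0(x):
    """Blockwise symmetric int8 quantization: one scale per 32 values.
    Illustrative sketch of the Q8_0 idea, not llama.cpp's actual code."""
    x = x.reshape(-1, BLOCK)
    scales = np.abs(x).max(axis=1, keepdims=True) / 127.0
    scales[scales == 0] = 1.0  # avoid division by zero for all-zero blocks
    q = np.round(x / scales).astype(np.int8)
    return q, scales.astype(np.float16)

def dequantize_q8_0(q, scales):
    """Dequant needs a per-block scale lookup and multiply for every
    32 weights -- the extra work GGUF pays that a plain fp8 tensor
    (a direct hardware cast, one scale per tensor at most) avoids."""
    return (q.astype(np.float32) * scales.astype(np.float32)).reshape(-1)

w = np.random.randn(1024).astype(np.float32)
q, s = quantize_q8_0(w)
w_hat = dequantize_q8_0(q, s)
print(np.abs(w - w_hat).max())  # small round-trip error, near Q8_0 quality
```

This is why Q8_0-quality output can come with a speed penalty under GGUF: every matmul must first run this blockwise dequantization, while fp8 weights are consumed more directly.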