
Gpt-OSS-20B-MXFP4-GGUF

GGUF MXFP4_MOE quant of openai/gpt-oss-20b. This GGUF was quantized from the dequantized/upcast F32 weights of the model, excluding the MoE layers, which are converted directly from Hugging Face to GGUF as-is, per ggerganov's assertion: "we don't mess with the bits and their placement. We just trust that OpenAI did a good job." This was done to help preserve the model's accuracy and precision post-quantization.
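For reference, a minimal sketch of that workflow using llama.cpp's stock tools is below. The paths, output names, and the exact quant type string (`MXFP4_MOE` here) are assumptions; check `convert_hf_to_gguf.py --help` and `llama-quantize --help` in your build, and note that how the converter handles the packed MXFP4 expert tensors can vary between llama.cpp versions.

```python
# Minimal sketch, assuming a local llama.cpp checkout and a local snapshot of
# the HF repo. All file names below are placeholders.
import subprocess

HF_MODEL_DIR = "gpt-oss-20b"                 # local copy of openai/gpt-oss-20b (assumed path)
F32_GGUF     = "gpt-oss-20b-f32.gguf"        # intermediate upcast GGUF
OUT_GGUF     = "gpt-oss-20b-mxfp4_moe.gguf"  # final quant

# 1) Convert the HF checkpoint to GGUF, upcasting tensors to F32 on the way out.
subprocess.run(
    ["python", "convert_hf_to_gguf.py", HF_MODEL_DIR,
     "--outfile", F32_GGUF, "--outtype", "f32"],
    check=True,
)

# 2) Quantize the F32 GGUF. "MXFP4_MOE" is the type name used by this card;
#    confirm the string against `llama-quantize --help` for your build.
subprocess.run(
    ["./llama-quantize", F32_GGUF, OUT_GGUF, "MXFP4_MOE"],
    check=True,
)
```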

Note: After further experimentation, it turns out it is best to keep the MXFP4 MoE layers in their original state rather than fully dequantizing/upcasting them to F32; for the reason quoted above from ggerganov, upcasting them leads to a regression in performance. This is only the case because llama.cpp converts the MoE layers directly from Hugging Face to GGUF. If it did not, dequantizing/upcasting all the weights to F32 before quantizing would remain the better method. Once llama.cpp supports imatrix calibration/training for the MXFP4 MoE layers, it should be possible to fully dequantize/upcast the weights, calibrate an imatrix on them, and then quantize with that imatrix to further improve the quant's accuracy and preserve more of the model. Until that PR lands, this is the next best option.
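Once that support lands, the flow should look roughly like the standard llama.cpp imatrix pipeline sketched below. The file names and calibration corpus are placeholders, and whether `llama-imatrix` will handle the MXFP4 MoE tensors is exactly the open question mentioned above.

```python
# Sketch of the future imatrix-based flow, assuming imatrix support for the
# MXFP4 MoE layers. Uses the existing llama-imatrix / llama-quantize tools.
import subprocess

F32_GGUF   = "gpt-oss-20b-f32.gguf"            # fully dequantized/upcast model (assumed)
CALIB_TEXT = "calibration.txt"                 # representative text corpus (placeholder)
IMATRIX    = "gpt-oss-20b-imatrix.dat"
OUT_GGUF   = "gpt-oss-20b-mxfp4_moe-imat.gguf"

# 1) Run the F32 model over the calibration text to collect importance statistics.
subprocess.run(
    ["./llama-imatrix", "-m", F32_GGUF, "-f", CALIB_TEXT, "-o", IMATRIX],
    check=True,
)

# 2) Quantize, weighting each tensor by the collected importance matrix.
subprocess.run(
    ["./llama-quantize", "--imatrix", IMATRIX, F32_GGUF, OUT_GGUF, "MXFP4_MOE"],
    check=True,
)
```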

File size: ~12.11 GB

Model size: 20.9B params
Architecture: gpt-oss