MXFP4 only runs on H100, B100, or later GPUs

#61 · opened by kishan51

ValueError: MXFP4 quantized models is only supported on GPUs with compute capability >= 9.0 (e.g H100, or B100)

I spent a few hours installing and trying to run the model in MXFP4 following these blogs:
https://cookbook.openai.com/articles/gpt-oss/fine-tune-transfomers
https://huggingface.co/blog/welcome-openai-gpt-oss

Yes, I can confirm. I'm trying to set this up on my RTX 4090 and I'm getting this error:

ValueError: MXFP4 quantized models is only supported on GPUs with compute capability >= 9.0 (e.g H100, or B100)

(Screenshot of the error attached: Screenshot from 2025-08-07 00-01-21.png)

[UPDATE]

THIS FIXES IT

https://github.com/huggingface/transformers/pull/39940

With transformers main, it should even work on a T4! Please try the following Google Colab: https://colab.research.google.com/drive/15DJv6QWgc49MuC7dlNS9ifveXBDjCWO5?usp=sharing

Thanks for sharing the Google Colab notebook, @marcsun13. I was able to get it working with one small change: adding !pip install kernels to get around the "MXFP4 quantization requires triton >= 3.4.0 and kernels installed, we will default to dequantizing the model to bf16" message I was otherwise getting. (I also didn't need to restart the session.)
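For anyone following along, here is a minimal sketch of the setup that worked for me, assuming the gpt-oss-20b checkpoint discussed in this thread (swap in whichever model id you are actually loading):

# Prerequisites (run in Colab cells first), per the message quoted above:
#   pip install "triton>=3.4.0" kernels
#   pip install git+https://github.com/huggingface/transformers.git
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openai/gpt-oss-20b"  # assumed checkpoint; adjust to your model

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)

inputs = tokenizer("Hello!", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))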

For those of you struggling with the MXFP4 incompatibility, here is how to fix it:
pip uninstall transformers
git clone https://github.com/huggingface/transformers.git
cd transformers
pip install .[torch]

In your code example, just to be sure, change torch_dtype="auto" to torch_dtype=torch.bfloat16 (don't forget to import torch); see the sketch below.
FYI, this is the PR that fixes the problem, so it's actually not OpenAI's fault. Don't be mean; use your brain instead:
https://github.com/huggingface/transformers/pull/39940
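A minimal sketch of the change described above, assuming the gpt-oss-20b checkpoint from this thread:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openai/gpt-oss-20b"  # assumed; use the checkpoint you are working with

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # instead of torch_dtype="auto"
    device_map="auto",
)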

You're welcome.

@vitriol1

Ah yes, thank you for pointing out what we already knew a day before you posted.

I tried your Colab, @marcsun13, but it's not working with a T4; I'm running out of memory.


OutOfMemoryError Traceback (most recent call last)
/tmp/ipython-input-1126326042.py in <cell line: 0>()
5 tokenizer = AutoTokenizer.from_pretrained(model_id)
6 quantization_config = Mxfp4Config()
----> 7 model = AutoModelForCausalLM.from_pretrained(
8 model_id,
9 torch_dtype="auto",

[... 9 frames omitted ...]
/usr/local/lib/python3.11/dist-packages/transformers/integrations/mxfp4.py in convert_moe_packed_tensors(blocks, scales, dtype, rows_per_chunk)
122 # nibble indices -> int64
123 idx_lo = (blk & 0x0F).to(torch.long)
--> 124 idx_hi = (blk >> 4).to(torch.long)
125
126 sub = out[r0:r1]

OutOfMemoryError: CUDA out of memory. Tried to allocate 1.98 GiB. GPU 0 has a total capacity of 14.74 GiB of which 1.96 GiB is free. Process 14311 has 12.77 GiB memory in use. Of the allocated memory 10.15 GiB is allocated by PyTorch, and 2.51 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
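(The message itself suggests the allocator setting below; here is a minimal sketch for anyone who wants to try it. It only helps with fragmentation, not with a model that simply does not fit in 16 GB.)

# Set the allocator config before importing torch / before any CUDA allocation,
# as suggested by the OOM message above.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import torch  # import only after the environment variable is set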


ValueError Traceback (most recent call last)
/tmp/ipython-input-2717120482.py in <cell line: 0>()
4
5 tokenizer = AutoTokenizer.from_pretrained(model_id)
----> 6 model = AutoModelForCausalLM.from_pretrained(
7 model_id,
8 torch_dtype="auto",

[... 3 frames omitted ...]
/usr/local/lib/python3.11/dist-packages/transformers/quantizers/auto.py in merge_quantization_configs(cls, quantization_config, quantization_config_from_args)
215
216 if quantization_config.__class__.__name__ != quantization_config_from_args.__class__.__name__:
--> 217 raise ValueError(
218 f"The model is quantized with {quantization_config.__class__.__name__} but you are passing a {quantization_config_from_args.__class__.__name__} config. "
219 "Please make sure to pass the same quantization config class to from_pretrained with different loading attributes."

ValueError: The model is quantized with Mxfp4Config but you are passing a NoneType config. Please make sure to pass the same quantization config class to from_pretrained with different loading attributes.
I ran the notebook and am still getting the error.

@punctualprocrastinator, I was getting this same error earlier, but it's working for me now. I think it was just recently fixed: https://github.com/huggingface/transformers/pull/40026
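Before that fix, the error's own suggestion was to pass the matching quantization config class explicitly. An untested workaround along those lines (names taken from the traceback above; upgrading transformers is the real fix):

from transformers import AutoModelForCausalLM, Mxfp4Config

model_id = "openai/gpt-oss-20b"  # assumed checkpoint from this thread

# Pass the same config class the checkpoint was quantized with, so the
# comparison in merge_quantization_configs sees matching classes.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
    quantization_config=Mxfp4Config(),
)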

Still OOM in Colab with a T4.

Even if you can fit the model, it will give an error. The weights are in MXFP4 and we need FP4. I have loaded it successfully in Colab, but I get an error on generation.
https://colab.research.google.com/drive/162vo7DtV7UvlNInVjj-s4wkSjSGK6Dkj?usp=sharing
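For context: on GPUs without native MXFP4 support, transformers is supposed to dequantize the weights to bf16 at load time, and recent builds expose a dequantize flag on Mxfp4Config for exactly this. A rough sketch, assuming your transformers version has that flag (note the fully dequantized 20B model will not fit on a 16 GB T4):

import torch
from transformers import AutoModelForCausalLM, Mxfp4Config

model_id = "openai/gpt-oss-20b"  # assumed checkpoint

# Force dequantization of the MXFP4 weights to bf16 at load time.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    quantization_config=Mxfp4Config(dequantize=True),
)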
