MXFP4 only runs on H100, B100, or newer GPUs:
ValueError: MXFP4 quantized models is only supported on GPUs with compute capability >= 9.0 (e.g H100, or B100)
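For anyone unsure where their card falls, PyTorch reports the compute capability directly, and anything below (9, 0) trips this check (a minimal sketch):

import torch

# H100 (Hopper) reports (9, 0) and B100/B200 (Blackwell) report (10, 0);
# an RTX 4090 (Ada) reports (8, 9), which is below the MXFP4 cutoff.
major, minor = torch.cuda.get_device_capability(0)
print(f"compute capability: {major}.{minor}")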
I spent a few hours installing and trying to run MXFP4, following these blogs:
https://cookbook.openai.com/articles/gpt-oss/fine-tune-transfomers
https://huggingface.co/blog/welcome-openai-gpt-oss
Yes, I can confirm. I'm trying to set this up on my RTX 4090 and I'm getting this error:
ValueError: MXFP4 quantized models is only supported on GPUs with compute capability >= 9.0 (e.g H100, or B100)
[UPDATE]
THIS FIXES IT
With transformers main, it should even work on a T4! Please try the following Google Colab: https://colab.research.google.com/drive/15DJv6QWgc49MuC7dlNS9ifveXBDjCWO5?usp=sharing
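If you just want to see what the notebook boils down to, here is a rough sketch of the loading code (assumptions on my part: the openai/gpt-oss-20b checkpoint, transformers installed from main, and device_map="auto"; the actual Colab cells may differ):

from transformers import AutoModelForCausalLM, AutoTokenizer, Mxfp4Config

model_id = "openai/gpt-oss-20b"  # assumption: the 20B checkpoint from the blog posts

tokenizer = AutoTokenizer.from_pretrained(model_id)

# On GPUs without native MXFP4 support, transformers falls back to
# dequantizing the weights to bf16 (that is the warning quoted below);
# Mxfp4Config(dequantize=True) requests that behaviour explicitly.
quantization_config = Mxfp4Config()

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
    quantization_config=quantization_config,
)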
Thanks for sharing the Google Colab notebook, @marcsun13. I was able to get it working with one small change: adding !pip install kernels to get around the "MXFP4 quantization requires triton >= 3.4.0 and kernels installed, we will default to dequantizing the model to bf16" error I was otherwise getting. (I also didn't need to restart the session.)
For everyone struggling with the MXFP4 incompatibility, here is how to fix it:
pip uninstall transformers
git clone https://github.com/huggingface/transformers.git
cd transformers
pip install .[torch]
In your code example, just to be sure, change torch_dtype="auto" to torch_dtype=torch.bfloat16 (don't forget to import torch).
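A sketch of what the loading call looks like after that change (the model id is whichever GPT-OSS checkpoint you were already using; gpt-oss-20b here is just an example):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openai/gpt-oss-20b"  # example; use the checkpoint from your own script

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # was: torch_dtype="auto"
    device_map="auto",
)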
FYI, this is the PR that fixes the problem, so it's actually not OpenAI's fault. Don't be mean; use your brain instead:
https://github.com/huggingface/transformers/pull/39940
You're welcome.
I tried your Colab, @marcsun13, but it's not working on a T4; I'm running into memory issues.
OutOfMemoryError Traceback (most recent call last)
/tmp/ipython-input-1126326042.py in <cell line: 0>()
5 tokenizer = AutoTokenizer.from_pretrained(model_id)
6 quantization_config = Mxfp4Config()
----> 7 model = AutoModelForCausalLM.from_pretrained(
8 model_id,
9 torch_dtype="auto",
9 frames
/usr/local/lib/python3.11/dist-packages/transformers/integrations/mxfp4.py in convert_moe_packed_tensors(blocks, scales, dtype, rows_per_chunk)
122 # nibble indices -> int64
123 idx_lo = (blk & 0x0F).to(torch.long)
--> 124 idx_hi = (blk >> 4).to(torch.long)
125
126 sub = out[r0:r1]
OutOfMemoryError: CUDA out of memory. Tried to allocate 1.98 GiB. GPU 0 has a total capacity of 14.74 GiB of which 1.96 GiB is free. Process 14311 has 12.77 GiB memory in use. Of the allocated memory 10.15 GiB is allocated by PyTorch, and 2.51 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
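As an aside, the allocator hint at the end of that traceback has to be applied before PyTorch initialises CUDA, e.g. in the very first cell (a sketch; it helps with fragmentation but won't make a model fit that is simply too large for ~15 GB):

import os

# Must run before torch allocates anything on the GPU.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"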
ValueError Traceback (most recent call last)
/tmp/ipython-input-2717120482.py in <cell line: 0>()
4
5 tokenizer = AutoTokenizer.from_pretrained(model_id)
----> 6 model = AutoModelForCausalLM.from_pretrained(
7 model_id,
8 torch_dtype="auto",
3 frames
/usr/local/lib/python3.11/dist-packages/transformers/quantizers/auto.py in merge_quantization_configs(cls, quantization_config, quantization_config_from_args)
215
216 if quantization_config.__class__.__name__ != quantization_config_from_args.__class__.__name__:
--> 217 raise ValueError(
218 f"The model is quantized with {quantization_config.__class__.__name__} but you are passing a {quantization_config_from_args.__class__.__name__} config. "
219 "Please make sure to pass the same quantization config class to `from_pretrained` with different loading attributes."

ValueError: The model is quantized with Mxfp4Config but you are passing a NoneType config. Please make sure to pass the same quantization config class to `from_pretrained` with different loading attributes.
I ran the notebook and I'm still getting the error.
@punctualprocrastinator, I was getting this same error earlier, but it's working for me now. I think it was just recently fixed: https://github.com/huggingface/transformers/pull/40026
Still getting OOM in Colab with a T4.
Even if you can fit the model, it will still raise an error. The weights are in MXFP4, but we need FP4. I loaded it successfully in Colab but got an error on generation.
https://colab.research.google.com/drive/162vo7DtV7UvlNInVjj-s4wkSjSGK6Dkj?usp=sharing