MXFP4 only runs on H100, B100, or later GPUs

#61 · opened by kishan51

ValueError: MXFP4 quantized models is only supported on GPUs with compute capability >= 9.0 (e.g H100, or B100)

I spent a few hours installing and trying to run the model in MXFP4 following these blogs:
https://cookbook.openai.com/articles/gpt-oss/fine-tune-transfomers
https://huggingface.co/blog/welcome-openai-gpt-oss

Yes, I can confirm. I'm trying to set this up on my RTX 4090 and I'm getting this error:

ValueError: MXFP4 quantized models is only supported on GPUs with compute capability >= 9.0 (e.g H100, or B100)

(Screenshot of the error attached: Screenshot from 2025-08-07 00-01-21.png)

[UPDATE]

THIS FIXES IT

https://github.com/huggingface/transformers/pull/39940

With transformers main, it should even work on a T4! Please try the following Google Colab: https://colab.research.google.com/drive/15DJv6QWgc49MuC7dlNS9ifveXBDjCWO5?usp=sharing

Thanks for sharing the Google Colab notebook, @marcsun13. I was able to get it working with one small change: adding !pip install kernels to get around the "MXFP4 quantization requires triton >= 3.4.0 and kernels installed, we will default to dequantizing the model to bf16" message I was otherwise getting. (I also didn't need to restart the session.)
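For anyone following along, here is a minimal sketch of the setup that worked for me, assuming the gpt-oss-20b checkpoint discussed in this thread (swap in whichever model id you are actually loading):

# Prerequisites (run in Colab cells first), per the message quoted above:
#   pip install "triton>=3.4.0" kernels
#   pip install git+https://github.com/huggingface/transformers.git
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openai/gpt-oss-20b"  # assumed checkpoint; adjust to your model

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)

inputs = tokenizer("Hello!", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))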

For those of you struggling with the MXFP4 incompatibility, here is how to fix it:
pip uninstall transformers
git clone https://github.com/huggingface/transformers.git
cd transformers
pip install .[torch]

In your code example, just to be sure, change torch_dtype="auto" to torch_dtype=torch.bfloat16 (don't forget to import torch); see the sketch below.
FYI, this is the PR that fixes the problem, so it's actually not OpenAI's fault. Don't be mean; use your brain instead:
https://github.com/huggingface/transformers/pull/39940
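A minimal sketch of the change described above, assuming the gpt-oss-20b checkpoint from this thread:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openai/gpt-oss-20b"  # assumed; use the checkpoint you are working with

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # instead of torch_dtype="auto"
    device_map="auto",
)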

You're welcome.

@vitriol1

Ah yes, thank you for pointing out what we already knew a day before you posted.

I tried your Colab, @marcsun13, but it's not working with a T4; I'm running out of memory.


OutOfMemoryError Traceback (most recent call last)
/tmp/ipython-input-1126326042.py in <cell line: 0>()
5 tokenizer = AutoTokenizer.from_pretrained(model_id)
6 quantization_config = Mxfp4Config()
----> 7 model = AutoModelForCausalLM.from_pretrained(
8 model_id,
9 torch_dtype="auto",

[... 9 frames omitted ...]
/usr/local/lib/python3.11/dist-packages/transformers/integrations/mxfp4.py in convert_moe_packed_tensors(blocks, scales, dtype, rows_per_chunk)
122 # nibble indices -> int64
123 idx_lo = (blk & 0x0F).to(torch.long)
--> 124 idx_hi = (blk >> 4).to(torch.long)
125
126 sub = out[r0:r1]

OutOfMemoryError: CUDA out of memory. Tried to allocate 1.98 GiB. GPU 0 has a total capacity of 14.74 GiB of which 1.96 GiB is free. Process 14311 has 12.77 GiB memory in use. Of the allocated memory 10.15 GiB is allocated by PyTorch, and 2.51 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
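(The message itself suggests the allocator setting below; here is a minimal sketch for anyone who wants to try it. It only helps with fragmentation, not with a model that simply does not fit in 16 GB.)

# Set the allocator config before importing torch / before any CUDA allocation,
# as suggested by the OOM message above.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import torch  # import only after the environment variable is set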


ValueError Traceback (most recent call last)
/tmp/ipython-input-2717120482.py in <cell line: 0>()
4
5 tokenizer = AutoTokenizer.from_pretrained(model_id)
----> 6 model = AutoModelForCausalLM.from_pretrained(
7 model_id,
8 torch_dtype="auto",

[... 3 frames omitted ...]
/usr/local/lib/python3.11/dist-packages/transformers/quantizers/auto.py in merge_quantization_configs(cls, quantization_config, quantization_config_from_args)
215
216 if quantization_config.__class__.__name__ != quantization_config_from_args.__class__.__name__:
--> 217 raise ValueError(
218 f"The model is quantized with {quantization_config.__class__.__name__} but you are passing a {quantization_config_from_args.__class__.__name__} config. "
219 "Please make sure to pass the same quantization config class to from_pretrained with different loading attributes."

ValueError: The model is quantized with Mxfp4Config but you are passing a NoneType config. Please make sure to pass the same quantization config class to from_pretrained with different loading attributes.
I ran the notebook and am still getting the error.

@punctualprocrastinator, I was getting this same error earlier, but it's working for me now. I think it was just recently fixed: https://github.com/huggingface/transformers/pull/40026
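Before that fix, the error's own suggestion was to pass the matching quantization config class explicitly. An untested workaround along those lines (names taken from the traceback above; upgrading transformers is the real fix):

from transformers import AutoModelForCausalLM, Mxfp4Config

model_id = "openai/gpt-oss-20b"  # assumed checkpoint from this thread

# Pass the same config class the checkpoint was quantized with, so the
# comparison in merge_quantization_configs sees matching classes.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
    quantization_config=Mxfp4Config(),
)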

Still OOM in Colab with a T4.

Even if you can fit the model, it will give an error. The weights are in MXFP4 and we need FP4. I have loaded it successfully in Colab, but I get an error on generation.
https://colab.research.google.com/drive/162vo7DtV7UvlNInVjj-s4wkSjSGK6Dkj?usp=sharing
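For context: on GPUs without native MXFP4 support, transformers is supposed to dequantize the weights to bf16 at load time, and recent builds expose a dequantize flag on Mxfp4Config for exactly this. A rough sketch, assuming your transformers version has that flag (note the fully dequantized 20B model will not fit on a 16 GB T4):

import torch
from transformers import AutoModelForCausalLM, Mxfp4Config

model_id = "openai/gpt-oss-20b"  # assumed checkpoint

# Force dequantization of the MXFP4 weights to bf16 at load time.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    quantization_config=Mxfp4Config(dequantize=True),
)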
