How to run it on macOS

#84
by kopyl - opened

Please provide detailed instructions.

There is a metal/model.bin file. How do you run it?

On an M1 Max with 32 GB RAM, I get this error:

ValueError: The model is quantized with Mxfp4Config but you are passing a NoneType config. Please make sure to pass the same quantization config class to `from_pretrained` with different loading attributes.

I'm not sure what else I need to do to run the 20B model on non-CUDA GPUs.
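
A minimal sketch of one way around this, assuming a recent transformers release (4.55 or later) with gpt-oss support: the MXFP4 kernels require CUDA, so on Apple Silicon the checkpoint has to be dequantized to a higher-precision dtype at load time by passing an explicit `Mxfp4Config(dequantize=True)`:

```python
# Sketch, not tested on this exact setup. Assumes transformers >= 4.55
# and a recent PyTorch build with MPS support.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, Mxfp4Config

model_id = "openai/gpt-oss-20b"

# Dequantize the MXFP4 weights at load time so no CUDA kernels are needed.
quantization_config = Mxfp4Config(dequantize=True)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # should pick the MPS device on Apple Silicon
    quantization_config=quantization_config,
)

messages = [{"role": "user", "content": "Hello from macOS!"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(output[0][inputs.shape[-1]:]))
```

One caveat: dequantized to bf16, the roughly 21B parameters need on the order of 40 GB of memory, so on a 32 GB machine this will likely swap heavily or fail to load at all.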

You can run it with Ollama. Tested it myself, it works.
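For reference, a minimal sketch using Ollama's Python client (`pip install ollama`), assuming the Ollama daemon is running and the model is published under the `gpt-oss:20b` tag:

```python
# Sketch: requires a running Ollama daemon and a prior `ollama pull gpt-oss:20b`.
import ollama

response = ollama.chat(
    model="gpt-oss:20b",
    messages=[{"role": "user", "content": "Say hello from Apple Silicon."}],
)
print(response["message"]["content"])
```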

For my purposes I want to run it with the Python `transformers` library. Any tips for that would be appreciated.
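
One quick sanity check before loading anything large with `transformers` is confirming that PyTorch can actually see the Metal backend:

```python
import torch

# Both should print True on Apple Silicon with a recent PyTorch build.
print(torch.backends.mps.is_available())  # Metal device is usable right now
print(torch.backends.mps.is_built())      # this PyTorch build includes MPS support
```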

Metal inference instructions are located in the gpt-oss GitHub repo: https://github.com/openai/gpt-oss?tab=readme-ov-file#reference-metal-implementation
Please note that you'd need an Apple M2 or later chip to use this implementation.
