How to run it on macOS
#84 opened by kopyl
Please provide detailed instructions.
There is a `metal/model.bin` file. How do you run it?
On an M1 Max with 32 GB RAM, I get this error:

```
ValueError: The model is quantized with Mxfp4Config but you are passing a NoneType config. Please make sure to pass the same quantization config class to `from_pretrained` with different loading attributes.
```
I'm not sure what else I need to do to run the 20B model on non-CUDA GPUs.
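The error says the checkpoint's MXFP4 quantization config has to be matched at load time. A minimal sketch of one way around it on Apple Silicon, assuming a recent `transformers` that ships `Mxfp4Config` and the `openai/gpt-oss-20b` hub id (both assumptions; adjust to your setup):

```python
# Hedged sketch: load the MXFP4-quantized checkpoint dequantized to bf16,
# since the MXFP4 kernels require CUDA. Assumes a recent transformers with
# Mxfp4Config and the "openai/gpt-oss-20b" hub id (swap in your local path).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, Mxfp4Config

model_id = "openai/gpt-oss-20b"
tokenizer = AutoTokenizer.from_pretrained(model_id)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=Mxfp4Config(dequantize=True),  # unpack MXFP4 weights instead of needing CUDA kernels
    torch_dtype=torch.bfloat16,
    device_map="auto",  # should land on the MPS device on Apple Silicon
)
```

One caveat: dequantized to bf16, 20B parameters work out to roughly 40 GB of weights, which likely will not fit in 32 GB of unified memory; Ollama or the Metal reference implementation mentioned below keep the weights quantized.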
You can run it with Ollama; I tested it myself and it works. If you want to call it from Python rather than the CLI, see the sketch below.
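A minimal sketch of querying the Ollama-served model from Python via Ollama's OpenAI-compatible endpoint, assuming the server is on its default port and the `gpt-oss:20b` tag has already been pulled:

```python
# Hedged sketch: query a local Ollama server through its OpenAI-compatible API.
# Assumes `ollama pull gpt-oss:20b` was run and the server listens on the default port.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",  # Ollama ignores the key, but the client requires one
)

response = client.chat.completions.create(
    model="gpt-oss:20b",
    messages=[{"role": "user", "content": "Say hello from macOS."}],
)
print(response.choices[0].message.content)
```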
For my purposes I want to run it with the Python `transformers` library. Any tips for that would be appreciated.
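A short generation sketch to pair with the loading snippet above; the chat-template calls are standard `transformers` API, and the prompt is just an example:

```python
# Hedged sketch: generate with the model/tokenizer loaded in the earlier snippet,
# using the standard chat-template API.
messages = [{"role": "user", "content": "Briefly explain MXFP4 quantization."}]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```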
Metal inference instructions are located in the gpt-oss GitHub repo: https://github.com/openai/gpt-oss?tab=readme-ov-file#reference-metal-implementation
Please note that you'd need an Apple M2 or later chip to use this implementation.