How to run it on macOS
#84 opened by kopyl
Please provide detailed instructions.
There is a `metal/model.bin` file. How do you run it?
On an M1 Max with 32 GB RAM, I get this error:

```
ValueError: The model is quantized with Mxfp4Config but you are passing a NoneType config. Please make sure to pass the same quantization config class to `from_pretrained` with different loading attributes.
```
I'm not sure what else I need to do to run the 20B model on non-CUDA GPUs.
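The error says the checkpoint's MXFP4 quantization config has to be matched at load time. A minimal sketch of one way around it on Apple Silicon, assuming a recent `transformers` that ships `Mxfp4Config` and the `openai/gpt-oss-20b` hub id (both assumptions; adjust to your setup):

```python
# Hedged sketch: load the MXFP4-quantized checkpoint dequantized to bf16,
# since the MXFP4 kernels require CUDA. Assumes a recent transformers with
# Mxfp4Config and the "openai/gpt-oss-20b" hub id (swap in your local path).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, Mxfp4Config

model_id = "openai/gpt-oss-20b"
tokenizer = AutoTokenizer.from_pretrained(model_id)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=Mxfp4Config(dequantize=True),  # unpack MXFP4 weights instead of needing CUDA kernels
    torch_dtype=torch.bfloat16,
    device_map="auto",  # should land on the MPS device on Apple Silicon
)
```

One caveat: dequantized to bf16, 20B parameters work out to roughly 40 GB of weights, which likely will not fit in 32 GB of unified memory; Ollama or the Metal reference implementation mentioned below keep the weights quantized.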
You can run it with Ollama; I tested it myself and it works. If you want to call it from Python rather than the CLI, see the sketch below.
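A minimal sketch of querying the Ollama-served model from Python via Ollama's OpenAI-compatible endpoint, assuming the server is on its default port and the `gpt-oss:20b` tag has already been pulled:

```python
# Hedged sketch: query a local Ollama server through its OpenAI-compatible API.
# Assumes `ollama pull gpt-oss:20b` was run and the server listens on the default port.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",  # Ollama ignores the key, but the client requires one
)

response = client.chat.completions.create(
    model="gpt-oss:20b",
    messages=[{"role": "user", "content": "Say hello from macOS."}],
)
print(response.choices[0].message.content)
```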
For my purposes I want to run it with the Python `transformers` library. Any tips for that would be appreciated.
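A short generation sketch to pair with the loading snippet above; the chat-template calls are standard `transformers` API, and the prompt is just an example:

```python
# Hedged sketch: generate with the model/tokenizer loaded in the earlier snippet,
# using the standard chat-template API.
messages = [{"role": "user", "content": "Briefly explain MXFP4 quantization."}]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```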
Metal inference instructions are located in the gpt-oss GitHub repo: https://github.com/openai/gpt-oss?tab=readme-ov-file#reference-metal-implementation
Please note that you'd need an Apple M2 or later chip to use this implementation.