Does a MacBook M1 Max 64GB run this model well?
#44, opened by mrk83
Just curious which version of the model performs best on a Mac M1 Max.
On q4 it starts at 14.5 tokens/s and drops to about 11 tokens/s on a 900-token response (a tiny one for this model). A 2056-token response came in at 9.847 tokens/s.
This model is small but very token hungry (I constantly reach 15k tokens). It is quite usable, but not fast. I still favor Mistral Small or Qwen 14b, and only use QwQ 32b locally if I absolutely have to. I do vouch for it anyway, as this model is very capable. It is BY FAR the best model that I can run offline; many times its coding results are closer to R1's than to those of 32b models...
Be sure to check the suggested params; this model is very sensitive to parameter changes.
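For a rough sense of what those speeds mean in wall-clock time, here is a small sketch that divides response length by throughput. The helper name `generation_seconds` is made up for illustration, and the 15k-token row assumes the speed stays at the 9.847 tokens/s measured at 2056 tokens, which is probably optimistic since throughput was falling as responses got longer.

```python
def generation_seconds(num_tokens: int, tokens_per_sec: float) -> float:
    """Estimate generation time as tokens divided by throughput."""
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return num_tokens / tokens_per_sec


# (response length in tokens, observed or assumed tokens/s)
samples = [
    (900, 11.0),       # reported in the post
    (2056, 9.847),     # reported in the post
    (15_000, 9.847),   # ASSUMPTION: speed does not degrade further
]

for tokens, speed in samples:
    secs = generation_seconds(tokens, speed)
    print(f"{tokens:>6} tokens at {speed} tok/s: {secs:7.1f} s ({secs / 60:.1f} min)")
```

Under that assumption a 15k-token answer takes roughly 25 minutes, which fits the "usable, but not fast" impression.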