The qx64-mlx quantization
This is a formula I use for Qwens, mostly MoEs, but it has turned out to work on dense models as well. I did not know whether it would work for Apertus, and it could well give mixed results (running integration tests now). The formula uses mixed-precision layers and is basically a 4-bit quant with 6-bit paths for attention and context (the Deckard formula).
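Roughly, in mlx_lm terms, this kind of mixed-precision recipe can be expressed with a quant_predicate at conversion time. The sketch below is just to illustrate the idea, not the exact recipe: the layer-name patterns, group sizes, and the source repo name are my illustrative assumptions.

```python
# Sketch of a qx64-style mixed-precision conversion with mlx_lm.
# Assumes mlx_lm.convert() accepts a quant_predicate that can return
# per-layer {"bits", "group_size"}; layer-name patterns are illustrative.
from mlx_lm import convert

def qx64_predicate(path, module, config):
    # 6-bit for attention and the "context" paths (embeddings / head),
    # 4-bit for everything else (the bulk of the MLP weights).
    if any(k in path for k in ("self_attn", "embed_tokens", "lm_head")):
        return {"bits": 6, "group_size": 64}
    return {"bits": 4, "group_size": 64}

convert(
    "swiss-ai/Apertus-70B-Instruct-2509",           # example source repo
    mlx_path="Apertus-70B-Instruct-2509-qx64-mlx",  # output directory
    quantize=True,
    quant_predicate=qx64_predicate,
)
```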
I shared it because I found it to be apt at coding in Janet, and once it gets going, it develops an appetite for coding.
Interesting model
thanks!
There are already a few other quantized MLX versions here; I tried the 4-bit and 8-bit and they work quite well: https://huggingface.co/models?search=apertus%20mlx
Looking forward to trying yours as well. Did you do yours for the 70B or the 8B, and is there a link?
To be honest, I did not know whether to upload the 8B after trying it the first time. Interacting with it is quite different from a similar model from Qwen. After playing with the settings on the 8B I found that Top-K around 20 makes it a bit more fluent. The default settings in LMStudio are not helping.
I am now uploading nightmedia/Apertus-8B-Instruct-2509-qx86-mlx.
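If you are running it with mlx_lm rather than LMStudio, the Top-K setting I mentioned above looks roughly like this; the top_k argument to make_sampler is from memory, so check it against your mlx_lm version:

```python
# Sketch: generating with Top-K around 20 via mlx_lm.
# Assumes make_sampler exposes a top_k argument (newer mlx_lm versions).
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

model, tokenizer = load("nightmedia/Apertus-8B-Instruct-2509-qx86-mlx")
sampler = make_sampler(temp=0.7, top_k=20)  # Top-K ~20 makes it more fluent

print(generate(model, tokenizer,
               prompt="Write a small Janet function that reverses a string.",
               sampler=sampler, max_tokens=256))
```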
Similar formula, more bits. Small models lose a lot more at quantization. I even tried mxfp4: it works eh-ok on the 70B, not so much on the 8B.
In some cases this approach sharpens the quality of the output. I am really curious how it will do on an 8B, as I have only done a couple of small tests. I ran into a MoE that, at qx86-hi, was outperforming its parent model at BF16, and even the qx64-hi was getting pretty close.