
LibraxisAI/Llama-3_1-Nemotron-Ultra-253B-v1-MLX-Q5
Medical LLM | MLX | MLX-LM | Self-hosting | Finetuning | Conversational Agents
This model runs on the `mlx-lm` runtime! Serve it with the `mlx_lm.server` runner using our modified `.jinja` template and custom `tokenizer_config.json`. We run it on `mlx==0.26.2` and it is so promising!
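For reference, here is a minimal Python sketch of how a running `mlx_lm.server` instance can be queried through its OpenAI-compatible `/v1/chat/completions` endpoint. The host, port, prompt, and sampling parameters are illustrative assumptions, not settings shipped with this repo:

```python
import requests

# Assumes an mlx_lm.server instance is already running locally on the default
# host/port; adjust the URL if you start the server differently.
resp = requests.post(
    "http://127.0.0.1:8080/v1/chat/completions",
    json={
        "model": "LibraxisAI/Llama-3_1-Nemotron-Ultra-253B-v1-MLX-Q5",
        "messages": [
            {"role": "system", "content": "You are a careful medical assistant."},
            {"role": "user", "content": "List common drug interactions of warfarin."},
        ],
        "max_tokens": 512,
        "temperature": 0.7,
    },
    timeout=600,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```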
Our beloved `CohereLabs/c4ai-command-a-03-2025`, the model we use for our medical NLP use case, shrinks from 200 GB+ in bfloat16 to ~70 GB as a Q5 MLX quant with perfect quality (`LibraxisAI/c4ai-command-a-03-2025-q5-mlx`). These Q5 quants require `mlx>=0.26.0`.
This quant is not yet supported by LM Studio, and we didn't check Ollama compatibility. For inference, at this moment we use the native `mlx_lm.chat` or `mlx_lm.server`, which work perfectly with the Python `mx.generate` pipeline.
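If you prefer the Python route, a minimal sketch along the lines of the standard `mlx_lm` API looks like this; the prompt and `max_tokens` value are placeholders, and the chat template bundled with the repo is picked up automatically by `load`:

```python
from mlx_lm import load, generate

# Load the quantized weights plus the bundled chat template / tokenizer_config.json.
model, tokenizer = load("LibraxisAI/Llama-3_1-Nemotron-Ultra-253B-v1-MLX-Q5")

# Build the prompt through the chat template shipped with the repo.
messages = [{"role": "user", "content": "Summarize the contraindications of metformin."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# generate() returns the decoded completion as a string.
text = generate(model, tokenizer, prompt=prompt, max_tokens=512, verbose=True)
print(text)
```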
The repository contains the complete safetensors and config files, **plus** our everyday command-generation scripts: `convert-to-mlx.sh`, a wrapper around `mlx_lm.convert` with customizable conversion parameters (a rough Python equivalent is sketched below), and `mlx-serve.sh` for easy server building. Enjoy!
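To give a rough idea of what `convert-to-mlx.sh` customizes, here is a hedged Python sketch of a 5-bit conversion with `mlx_lm.convert`. The source path, output path, and group size are assumptions for illustration, not a copy of our script:

```python
from mlx_lm import convert

# Hypothetical paths; convert-to-mlx.sh exposes parameters like these as variables.
convert(
    hf_path="nvidia/Llama-3_1-Nemotron-Ultra-253B-v1",   # source bfloat16 weights (assumed)
    mlx_path="Llama-3_1-Nemotron-Ultra-253B-v1-MLX-Q5",  # output directory
    quantize=True,
    q_bits=5,          # 5-bit quantization; needs a recent mlx (>=0.26.0 per the note above)
    q_group_size=64,   # mlx default group size; our script may use a different value
)
```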