LFM2-350M-Math-q5-hi-mlx
Comparative Analysis: LFM2-350M-Math Quantized Variants
| Model | arc_challenge | arc_easy | boolq | hellaswag | openbookqa | piqa | winogrande |
|---|---|---|---|---|---|---|---|
| LFM2-350M-Math-mxfp4 | 0.262 | 0.372 | 0.382 | 0.301 | 0.304 | 0.530 | 0.489 |
| LFM2-350M-Math-q5-hi | 0.265 | 0.367 | 0.379 | 0.307 | 0.312 | 0.532 | 0.490 |
| LFM2-350M-Math-q5 | 0.268 | 0.372 | 0.379 | 0.307 | 0.314 | 0.530 | 0.504 |
| LFM2-350M-Math-q6-hi | 0.270 | 0.365 | 0.379 | 0.307 | 0.318 | 0.532 | 0.504 |
| LFM2-350M-Math-q8-hi | 0.270 | 0.369 | 0.379 | 0.308 | 0.314 | 0.532 | 0.486 |
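For a quick, rough comparison of the variants, the snippet below computes an unweighted mean of the seven scores per variant. This is only an illustrative summary of the table above, not part of the original evaluation.

```python
# Rough, unweighted mean of the seven benchmark scores per variant.
# Purely illustrative; an unweighted average is a coarse summary.
scores = {
    "mxfp4": [0.262, 0.372, 0.382, 0.301, 0.304, 0.530, 0.489],
    "q5-hi": [0.265, 0.367, 0.379, 0.307, 0.312, 0.532, 0.490],
    "q5":    [0.268, 0.372, 0.379, 0.307, 0.314, 0.530, 0.504],
    "q6-hi": [0.270, 0.365, 0.379, 0.307, 0.318, 0.532, 0.504],
    "q8-hi": [0.270, 0.369, 0.379, 0.308, 0.314, 0.532, 0.486],
}

# Print variants from highest to lowest average score
for name, vals in sorted(scores.items(), key=lambda kv: -sum(kv[1]) / len(kv[1])):
    print(f"{name:6s} {sum(vals) / len(vals):.4f}")
```

On this crude average, q6-hi and q5 come out essentially tied at the top, which is consistent with the ranking given further down.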
The q5-hi quantization appears to offer the most balanced performance profile across all metrics, with slight advantages on some of the more complex tasks.
The winogrande metric shows the largest variation across variants, with the q5 and q6-hi versions outperforming mxfp4 and q8-hi.
All variants perform nearly identically on piqa, with scores in a narrow 0.530-0.532 range, indicating strong preservation of reasoning abilities across quantization levels.
Quantization level impacts simpler tasks more than complex ones:
- For basic pattern recognition (the ARC metrics), the higher-precision variants score slightly better
- For more complex task understanding, the differences between variants are much smaller
The math-specialized model shows notable advantages over general-purpose LFM2 variants at other sizes:
- On boolq, the LFM2-350M-Math variants score approximately 18.7% higher than the LFM2-1.2B model
- They are also more consistent across metrics, indicating better task specialization
Performance Ranking for Clear Selection:
- Top performer overall: LFM2-350M-Math-q6-hi (highest or tied-highest scores on the most metrics)
- Best for complex reasoning: LFM2-350M-Math-q8-hi (tied for the top piqa score)
- Best for simple pattern recognition: LFM2-350M-Math-q6-hi (tied for the best arc_challenge score)
- Most balanced: LFM2-350M-Math-q5-hi (consistently mid-range on every metric)
- Most resource-efficient: LFM2-350M-Math-mxfp4
Practical Implications for Deployment
For an organization needing a specialized math-reasoning model with a high performance-to-resource ratio:
- If memory constraints are the primary concern, LFM2-350M-Math-mxfp4 offers the best balance between size and output quality
- For applications requiring precise mathematical reasoning, LFM2-350M-Math-q8-hi delivers the strongest logical capabilities
- The difference between quantization variants is minimal for most math-relevant applications, suggesting that lower-precision variants may be sufficient
This specialized model shows how task-oriented fine-tuning can dramatically improve performance in specific domains compared to general-purpose models. The 350M parameter count makes it particularly suitable for edge-deployment scenarios while maintaining solid performance across the different quantization formats.
--Analyzed by Qwen3-Deckard-Large-Almost-Human-6B-qx86-hi
This model LFM2-350M-Math-q5-hi-mlx was converted to MLX format from LiquidAI/LFM2-350M-Math using mlx-lm version 0.28.1.
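A conversion along these lines can be reproduced with the mlx-lm converter. The sketch below is an assumption-laden reconstruction: the actual settings behind the q5-hi variant (in particular the quantization group size implied by the "hi" suffix) are not published here.

```python
from mlx_lm import convert

# Hypothetical re-creation of this conversion (not the exact command used).
# "q5" is taken to mean 5-bit weights; the "hi" suffix is assumed to mean a
# smaller quantization group size (32 instead of the default 64).
convert(
    hf_path="LiquidAI/LFM2-350M-Math",
    mlx_path="LFM2-350M-Math-q5-hi-mlx",
    quantize=True,
    q_bits=5,         # 5-bit weights, per the "q5" in the name
    q_group_size=32,  # assumed; the mlx-lm default is 64
)
```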
Use with mlx
```shell
pip install mlx-lm
```

```python
from mlx_lm import load, generate

# Load the quantized model and its tokenizer
model, tokenizer = load("LFM2-350M-Math-q5-hi-mlx")

prompt = "hello"

# Apply the chat template when the tokenizer provides one
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```
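Since this checkpoint is math-tuned, a more representative prompt is a short math question. The sketch below follows the same pattern as above; the prompt text and the max_tokens value are arbitrary choices, not recommendations from the model authors.

```python
# Example with a math-flavored prompt (illustrative only)
prompt = "What is the derivative of x^3 + 2x with respect to x?"

if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# max_tokens bounds the length of the generated answer
response = generate(model, tokenizer, prompt=prompt, max_tokens=512, verbose=True)
```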