Evaluation

!lm_eval --model hf \
    --model_args pretrained=jaeyong2/Qwen3-0.6B-DPO-Ja-Peft \
    --tasks kmmlu,mmlu,japanese_leaderboard,gsm8k \
    --device cuda:0 \
    --batch_size 1 

| Task | Qwen3-1.7B-DPO-peft | Qwen3-1.7B |
| --- | --- | --- |
| MMLU | 0.55 | 0.55 |
| ja_leaderboard_jaqket_v2 | 0.35 | 0.34 |
| ja_leaderboard_jcommonsenseqa | 0.48 | 0.46 |
| ja_leaderboard_jnli | 0.28 | 0.23 |
| ja_leaderboard_jsquad | 0.25 | 0.22 |
| ja_leaderboard_marc_ja | 0.65 | 0.74 |
| ja_leaderboard_mgsm | 0.42 | 0.40 |
| ja_leaderboard_xlsum | 0.10 | 0.11 |
| ja_leaderboard_xwinograd | 0.58 | 0.58 |
| GSM8K | 0.70 | 0.69 |
| KMMLU | 0.36 | 0.36 |
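
The per-task differences between the two models can be tallied with a short script. This is a minimal sketch: the scores are copied from the results above, and the variable names (`scores`, `deltas`, and so on) are chosen here for illustration and are not part of the evaluation harness.

```python
# Scores copied from the results above: (Qwen3-1.7B-DPO-peft, Qwen3-1.7B).
scores = {
    "MMLU": (0.55, 0.55),
    "ja_leaderboard_jaqket_v2": (0.35, 0.34),
    "ja_leaderboard_jcommonsenseqa": (0.48, 0.46),
    "ja_leaderboard_jnli": (0.28, 0.23),
    "ja_leaderboard_jsquad": (0.25, 0.22),
    "ja_leaderboard_marc_ja": (0.65, 0.74),
    "ja_leaderboard_mgsm": (0.42, 0.40),
    "ja_leaderboard_xlsum": (0.10, 0.11),
    "ja_leaderboard_xwinograd": (0.58, 0.58),
    "GSM8K": (0.70, 0.69),
    "KMMLU": (0.36, 0.36),
}

# Difference per task (DPO minus base), rounded to two decimals
# so that floating-point noise does not affect the comparison.
deltas = {task: round(dpo - base, 2) for task, (dpo, base) in scores.items()}

improved = [task for task, d in deltas.items() if d > 0]
regressed = [task for task, d in deltas.items() if d < 0]
unchanged = [task for task, d in deltas.items() if d == 0]

# Print tasks from largest gain to largest loss.
for task, d in sorted(deltas.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{task:32s} {d:+.2f}")
print(f"improved={len(improved)} regressed={len(regressed)} unchanged={len(unchanged)}")
```

On the reported numbers, six tasks improve, three are unchanged, and two regress; the largest single change is ja_leaderboard_marc_ja at -0.09.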

License

Acknowledgement

This research is supported by the TPU Research Cloud program.
