Evaluation

!lm_eval --model hf \
    --model_args pretrained=jaeyong2/Qwen3-0.6B-DPO-Ja-Peft \
    --tasks mmlu,japanese_leaderboard,gsm8k \
    --device cuda:0 \
    --batch_size 1 \
    --num_fewshot 5
Results (5-shot, lm-evaluation-harness):

Task                            Qwen3-0.6B-DPO-ja   Qwen3-0.6B
MMLU                            0.41                0.40
ja_leaderboard_jaqket_v2        0.30                0.28
ja_leaderboard_jcommonsenseqa   0.45                0.44
ja_leaderboard_jnli             0.24                0.26
ja_leaderboard_jsquad           0.49                0.48
ja_leaderboard_marc_ja          0.87                0.86
ja_leaderboard_mgsm             0.12                0.11
ja_leaderboard_xlsum            0.09                0.08
ja_leaderboard_xwinograd        0.55                0.55
GSM8K                           0.44                0.42
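To read the table above as a head-to-head comparison, the sketch below computes the per-task difference (Qwen3-0.6B-DPO-ja minus Qwen3-0.6B) and its mean. It uses only the two-decimal scores from the table. The names `SCORES` and `score_deltas` are hypothetical and are not part of lm-evaluation-harness or this model's tooling. Because the scores are reported to two decimals, a difference of 0.01 sits at the table's reporting resolution.

```python
# Scores copied from the evaluation table (5-shot).
# Hypothetical helper data: task -> (Qwen3-0.6B-DPO-ja, Qwen3-0.6B)
SCORES = {
    "MMLU": (0.41, 0.40),
    "ja_leaderboard_jaqket_v2": (0.30, 0.28),
    "ja_leaderboard_jcommonsenseqa": (0.45, 0.44),
    "ja_leaderboard_jnli": (0.24, 0.26),
    "ja_leaderboard_jsquad": (0.49, 0.48),
    "ja_leaderboard_marc_ja": (0.87, 0.86),
    "ja_leaderboard_mgsm": (0.12, 0.11),
    "ja_leaderboard_xlsum": (0.09, 0.08),
    "ja_leaderboard_xwinograd": (0.55, 0.55),
    "GSM8K": (0.44, 0.42),
}


def score_deltas(scores):
    """Return DPO-ja minus base per task, rounded to the table's 2 decimals."""
    return {task: round(dpo - base, 2) for task, (dpo, base) in scores.items()}


deltas = score_deltas(SCORES)
improved = sorted(t for t, d in deltas.items() if d > 0)
regressed = sorted(t for t, d in deltas.items() if d < 0)
mean_delta = round(sum(deltas.values()) / len(deltas), 3)

for task, d in deltas.items():
    print(f"{task:32s} {d:+.2f}")
print("mean delta:", mean_delta)
```

The DPO model scores higher on 8 of the 10 tasks, ties on ja_leaderboard_xwinograd, and scores lower on ja_leaderboard_jnli.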

License

Acknowledgement

This research is supported by the TPU Research Cloud program.

Model size: 596M params (Safetensors, tensor type BF16)