## Evaluation
```bash
!lm_eval --model hf \
    --model_args pretrained=jaeyong2/Qwen3-0.6B-DPO-Ja-Peft \
    --tasks kmmlu,mmlu,japanese_leaderboard,gsm8k \
    --device cuda:0 \
    --batch_size 1
```
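For scripted runs, the harness also exposes a Python entry point. A minimal sketch, assuming a recent `lm-eval` release whose `simple_evaluate` mirrors the CLI flags above:

```python
# Hedged sketch: same evaluation via lm-eval's Python API (pip install lm-eval).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=jaeyong2/Qwen3-0.6B-DPO-Ja-Peft",
    tasks=["kmmlu", "mmlu", "japanese_leaderboard", "gsm8k"],
    device="cuda:0",
    batch_size=1,
)
print(results["results"])  # per-task metric dictionary
```

The table below compares the adapter against the base model on the same benchmarks.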
| Benchmark | Qwen3-0.6B-DPO-Ja-Peft | Qwen3-0.6B |
|---|---|---|
| MMLU | 0.55 | 0.55 |
| ja_leaderboard_jaqket_v2 | 0.35 | 0.34 |
| ja_leaderboard_jcommonsenseqa | 0.48 | 0.46 |
| ja_leaderboard_jnli | 0.28 | 0.23 |
| ja_leaderboard_jsquad | 0.25 | 0.22 |
| ja_leaderboard_marc_ja | 0.65 | 0.74 |
| ja_leaderboard_mgsm | 0.42 | 0.40 |
| ja_leaderboard_xlsum | 0.10 | 0.11 |
| ja_leaderboard_xwinograd | 0.58 | 0.58 |
| GSM8K | 0.70 | 0.69 |
| KMMLU | 0.36 | 0.36 |
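Because the checkpoint is published as a PEFT adapter, it must be attached to the base model at load time. A minimal inference sketch, assuming the repo hosts a PEFT adapter for Qwen/Qwen3-0.6B (the prompt and generation settings are illustrative):

```python
# Hedged sketch: local inference with the PEFT adapter on top of the base model.
# Repo IDs are taken from this card; prompt and generation settings are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "Qwen/Qwen3-0.6B"
adapter_id = "jaeyong2/Qwen3-0.6B-DPO-Ja-Peft"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)  # attach the DPO adapter

messages = [{"role": "user", "content": "日本の首都はどこですか？"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```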
## License
- Qwen/Qwen3-0.6B: [Apache License 2.0](https://choosealicense.com/licenses/apache-2.0/)
## Acknowledgement
This research was supported by the TPU Research Cloud program.