Evaluation
```shell
lm_eval --model hf \
    --model_args pretrained=jaeyong2/Qwen3-0.6B-DPO-Ja-Peft \
    --tasks mmlu,japanese_leaderboard,gsm8k \
    --device cuda:0 \
    --batch_size 1 \
    --num_fewshot 5
```
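The card does not include an inference snippet, so here is a minimal sketch of one (an assumption, not part of the original card): it loads the Qwen/Qwen3-0.6B base weights and attaches this repository's PEFT (LoRA) adapter. The helper name `load_dpo_ja` is hypothetical; `transformers`, `peft`, and `accelerate` must be installed.

```python
# Minimal inference sketch (assumed usage, not from the card): apply the
# jaeyong2/Qwen3-0.6B-DPO-Ja-Peft adapter on top of the Qwen/Qwen3-0.6B base.

def load_dpo_ja(base_id: str = "Qwen/Qwen3-0.6B",
                adapter_id: str = "jaeyong2/Qwen3-0.6B-DPO-Ja-Peft"):
    """Return (tokenizer, model) with the DPO adapter applied to the base model."""
    # Imports are local so the helper can be defined without the heavy deps installed.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(adapter_id)
    model = AutoModelForCausalLM.from_pretrained(
        base_id, torch_dtype=torch.bfloat16, device_map="auto"
    )
    model = PeftModel.from_pretrained(model, adapter_id)  # attach the LoRA weights
    return tokenizer, model


if __name__ == "__main__":
    tokenizer, model = load_dpo_ja()
    messages = [{"role": "user", "content": "日本の首都はどこですか？"}]  # "What is the capital of Japan?"
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(inputs, max_new_tokens=64)
    # Decode only the newly generated tokens.
    print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```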
| Task | Qwen3-0.6B-DPO-ja | Qwen3-0.6B |
|---|---|---|
| MMLU | 0.41 | 0.40 |
| ja_leaderboard_jaqket_v2 | 0.30 | 0.28 |
| ja_leaderboard_jcommonsenseqa | 0.45 | 0.44 |
| ja_leaderboard_jnli | 0.24 | 0.26 |
| ja_leaderboard_jsquad | 0.49 | 0.48 |
| ja_leaderboard_marc_ja | 0.87 | 0.86 |
| ja_leaderboard_mgsm | 0.12 | 0.11 |
| ja_leaderboard_xlsum | 0.09 | 0.08 |
| ja_leaderboard_xwinograd | 0.55 | 0.55 |
| GSM8K | 0.44 | 0.42 |
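The per-task gains above are small, so a quick arithmetic check of the average delta between the DPO-tuned adapter and the base model is useful (a throwaway sketch using only the numbers from the table):

```python
# Sanity check of the table above: average score delta (DPO-tuned - base)
# across all ten reported tasks.
scores = {
    # task: (Qwen3-0.6B-DPO-ja, Qwen3-0.6B)
    "mmlu": (0.41, 0.40),
    "ja_leaderboard_jaqket_v2": (0.30, 0.28),
    "ja_leaderboard_jcommonsenseqa": (0.45, 0.44),
    "ja_leaderboard_jnli": (0.24, 0.26),
    "ja_leaderboard_jsquad": (0.49, 0.48),
    "ja_leaderboard_marc_ja": (0.87, 0.86),
    "ja_leaderboard_mgsm": (0.12, 0.11),
    "ja_leaderboard_xlsum": (0.09, 0.08),
    "ja_leaderboard_xwinograd": (0.55, 0.55),
    "gsm8k": (0.44, 0.42),
}
deltas = {task: round(tuned - base, 2) for task, (tuned, base) in scores.items()}
avg_delta = sum(deltas.values()) / len(deltas)
print(f"average delta: {avg_delta:+.3f}")  # → average delta: +0.008
```

The tuned model improves on eight of the ten tasks, ties on ja_leaderboard_xwinograd, and regresses only on ja_leaderboard_jnli.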
License
- Qwen/Qwen3-0.6B: Apache License 2.0 (https://choosealicense.com/licenses/apache-2.0/)
Acknowledgement
This research was supported by the TPU Research Cloud (TRC) program.