This is a WIP version of Qwen3 8B post-trained on the full Shisa V2 recipe.

This is a non-reasoning model and thinking has been disabled in the default chat_template.

This model will shortly be replaced by a V2.1 release, but preliminary benchmarks suggest it is already quite strong.
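Since thinking is disabled in the default chat_template, the rendered prompt pre-fills an empty think block so the model skips reasoning tokens. The sketch below is illustrative only, assuming a simplified Qwen3-style Jinja template (this is *not* the model's actual chat_template); it shows how an `enable_thinking` flag typically toggles that pre-fill.

```python
from jinja2 import Template

# Simplified, hypothetical Qwen3-style chat template (illustration only).
# When enable_thinking is False, an empty <think></think> block is
# pre-filled in the assistant turn so the model produces no reasoning.
CHAT_TEMPLATE = (
    "{% for m in messages %}"
    "<|im_start|>{{ m['role'] }}\n{{ m['content'] }}<|im_end|>\n"
    "{% endfor %}"
    "{% if add_generation_prompt %}"
    "<|im_start|>assistant\n"
    "{% if not enable_thinking %}<think>\n\n</think>\n\n{% endif %}"
    "{% endif %}"
)

def render(messages, enable_thinking=True):
    """Render a prompt string from a list of chat messages."""
    return Template(CHAT_TEMPLATE).render(
        messages=messages,
        add_generation_prompt=True,
        enable_thinking=enable_thinking,
    )

msgs = [{"role": "user", "content": "こんにちは"}]
print(render(msgs, enable_thinking=False))
```

With `enable_thinking=False` the assistant turn starts with an empty `<think>` block; with `enable_thinking=True` the template leaves it out and the model would generate its own reasoning span.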

Shaberi (judged by GPT-4.1):

| Model | Average | ELYZA 100 | JA-MT | Rakuda | Tengu |
|---|---|---|---|---|---|
| 017-qwen3-8b-v2-dpo405b-clr-nothink | 7.75 | 7.88 | 8.08 | 8.08 | 6.94 |
| shisa-ai/shisa-v2-llama3.1-8b | 7.14 | 7.54 | 6.83 | 7.85 | 6.34 |
| shisa-ai/shisa-v2-qwen2.5-7b | 7.10 | 7.48 | 7.40 | 7.18 | 6.33 |

And JA MT-Bench (judged by GPT-4.1):

| Model | coding | extraction | humanities | math | reasoning | roleplay | stem | writing | Overall |
|---|---|---|---|---|---|---|---|---|---|
| 017-qwen3-8b-v2-dpo405b-clr-nothink | 7.3 | 7.55 | 8.85 | 9.3 | 6.05 | 7.9 | 8.6 | 8.9 | 8.06 |
| shisa-ai/shisa-v2-qwen2.5-7b | 6.7 | 7.15 | 7.55 | 8.5 | 5.4 | 7.9 | 7.5 | 7.7 | 7.3 |
| shisa-ai/shisa-v2-llama3.1-8b | 5.3 | 6.95 | 8.4 | 6.55 | 5.95 | 7.65 | 7.25 | 7.9 | 6.99 |
Model size: 8.19B params (BF16, safetensors)
Model lineage: Qwen/Qwen3-8B-Base → Qwen/Qwen3-8B → this model (shisa-ai/017-qwen3-8b-v2-dpo405b-clr)