You guys are the pioneers!
#6 opened by owao
I was sad to ditch QwQ-preview in favor of other, newer thinking models...
And now I'm soooo glad to see this comeback as a final version! And what a comeback!
Are the only differences between the preview snapshot and this final version:
- scaling up the math/coding RL even more
- and adding those final few steps of reward model/rule-based RL on general capabilities
Or did you also apply some other training on top of that, or in between?
Thanks in advance for any answer.
And thanks again for sharing your final artifact, that's really cool! Having a model that is so much cheaper to distill from opens doors!