You guys are the pioneers!

#6
by owao - opened

I was sad to ditch QwQ-preview in favor of other, newer thinking models...
And now I'm soooo glad to see this comeback as a final version! And what a comeback!

Are the differences between the preview snapshot and this final version only:

  • scaling up the math/coding RL even more
  • and adding those final few steps of reward model/rule-based RL on general capabilities

Or did you also apply some other training in addition or in between?

Thanks in advance for any answer.

And thanks again for sharing your final artifact, that's really cool! Having a model that is so much cheaper to distill from opens doors!
