Qwen/QwQ-32B · You guys are the pioneers!

I was sad to ditch QwQ-preview in favor of other, newer thinking models...
And now I'm soooo glad to see it come back as a final version! And what a comeback!

Are the differences between the preview snapshot and this final version only:

- scaling up the math/coding RL even more
- and adding those final few steps of reward model/rule-based RL on general capabilities
Or did you also apply some other training in addition/in between?

Thanks in advance for any answer.

And thanks again for sharing your final artifact, that's really cool! Having a model that is so much cheaper to distill from opens doors!