Can you release the online RL prompts?

by sparsh35 - opened Mar 13

From the tech report, I understand that you used a sampling-accuracy criterion to select the hard or orthogonal dataset for online RL (GRPO) with Qwen 32B R1. Could you please release that dataset as well?
Thanks, and congrats on the release.
All the best.
