@chansung on Hugging Face: "simple guide on the recipe for GRPO on Open-R1 which is built on top of TRL…"


chansung

posted an update Mar 25


A simple guide to the GRPO recipe in Open-R1, which is built on top of TRL.

I think the FastAPI wrapper of vLLM with WeightSyncWorker is a pretty cool feature. Also, we have many predefined reward functions out of the box!
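
For anyone wondering what a reward function looks like in practice, here is a minimal sketch. As far as I understand TRL's GRPO trainer, a custom reward function receives the completions (plus extra keyword arguments) and returns one float per completion. The name `toy_format_reward`, the tag layout, and the assumption that completions are plain strings are all mine for illustration; this is not one of the predefined Open-R1 rewards.

```python
import re

# Hypothetical format check: reasoning in <think> tags, then the result in <answer> tags.
FORMAT_PATTERN = re.compile(
    r"^<think>.*?</think>\s*<answer>.*?</answer>$", re.DOTALL
)


def toy_format_reward(completions, **kwargs):
    """Return 1.0 for each completion that follows the tag layout, else 0.0.

    Assumes every completion is a plain string (not a chat-style message list).
    """
    return [
        1.0 if FORMAT_PATTERN.match(text.strip()) else 0.0
        for text in completions
    ]


samples = [
    "<think>2 + 2 is 4</think>\n<answer>4</answer>",
    "The answer is 4",
]
rewards = toy_format_reward(samples)
print(rewards)
```

A function like this would be passed to the trainer alongside other rewards, so a well-formatted but wrong answer can still be penalized by an accuracy reward.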

takarajordan

Mar 25

Very cool

chansung

Mar 25

Thanks!

smirki

Mar 26

Question! Does VRAM usage increase if you raise the max number of generations per prompt? If so, why does that happen?

chansung

Mar 26

Because more tokens have to be stored in VRAM?
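
A rough back-of-the-envelope sketch of that idea: the KV cache for generation grows linearly with the total number of tokens held at once, so doubling the generations per prompt roughly doubles that part of the memory. The model dimensions below are hypothetical placeholders, and the estimate ignores weights, activations, and how the inference engine actually allocates memory.

```python
def kv_cache_bytes(num_sequences, tokens_per_sequence, num_layers,
                   num_kv_heads, head_dim, bytes_per_value=2):
    """Estimate KV cache size: keys and values for every layer, head, and token."""
    total_tokens = num_sequences * tokens_per_sequence
    # Factor of 2 covers one key tensor plus one value tensor per layer.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value * total_tokens


# Hypothetical model shape and completion length, chosen only for illustration.
config = dict(tokens_per_sequence=1024, num_layers=28, num_kv_heads=4, head_dim=128)

for num_generations in (4, 8, 16):
    size_gib = kv_cache_bytes(num_sequences=num_generations, **config) / 2**30
    print(f"{num_generations} generations per prompt: ~{size_gib:.3f} GiB of KV cache")
```

The same linear scaling applies on the training side, since the trainer has to process every generated completion in the group.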

xinnn63

Mar 27

Cool frens!

In this post

chansung chansung park
takarajordan Jordan Legg
smirki Manav Majumdar
xinnn63 Natalie H