chansung 
posted an update 2 days ago
A simple guide to the GRPO recipe in Open-R1, which is built on top of TRL.

I think the FastAPI wrapper around vLLM with WeightSyncWorker is a pretty cool feature. Also, we have many predefined reward functions out of the box!
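
For a rough idea of what a reward function looks like, here is a minimal sketch. It assumes the TRL convention that a reward function receives a batch of completions (plus extra keyword arguments) and returns one float per completion. The name `tagged_format_reward` and the `<think>`/`<answer>` tag layout are illustrative assumptions, not necessarily what Open-R1 ships.

```python
import re

# Hypothetical reward: 1.0 if the completion follows a
# <think>...</think><answer>...</answer> layout, else 0.0.
# The tag names are an assumption made for this example.
_PATTERN = re.compile(
    r"^<think>.*?</think>\s*<answer>.*?</answer>$", re.DOTALL
)

def tagged_format_reward(completions, **kwargs):
    """Return one float reward per completion in the batch."""
    rewards = []
    for completion in completions:
        # Conversational data arrives as a list of message dicts,
        # plain-text data as a string; handle both.
        if isinstance(completion, list):
            text = completion[0]["content"]
        else:
            text = completion
        rewards.append(1.0 if _PATTERN.match(text.strip()) else 0.0)
    return rewards

print(tagged_format_reward([
    "<think>2 + 2</think><answer>4</answer>",
    "just an answer, no tags",
]))
```

A function shaped like this could be passed alongside the predefined ones, since each reward is just a callable that scores a batch.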

Very cool

·

Thanks!

Question! Does VRAM usage increase when you increase the maximum number of generations per prompt? If so, why does that happen?

·

Because more tokens have to be stored in VRAM?
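
A back-of-the-envelope sketch of that idea, looking only at the KV cache on the generation side. The model shape below (28 layers, 4 KV heads, head dimension 128, fp16) is a made-up example config, and the estimate assumes the worst case where the prompt is not shared between generations. Activations and logits during the training step also grow with the number of completions, which this does not count.

```python
def kv_cache_bytes(num_generations, prompt_tokens, completion_tokens,
                   num_layers, num_kv_heads, head_dim, bytes_per_value=2):
    """Rough KV cache size for one prompt sampled num_generations times."""
    # One key and one value vector (factor 2) per layer, per KV head, per token.
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
    # Worst case: every generation keeps its own copy of the prompt tokens.
    total_tokens = num_generations * (prompt_tokens + completion_tokens)
    return per_token * total_tokens

# Hypothetical model config, chosen only for illustration.
config = dict(num_layers=28, num_kv_heads=4, head_dim=128)

for g in (8, 16):
    size = kv_cache_bytes(g, prompt_tokens=256, completion_tokens=768, **config)
    print(g, "generations:", size / 2**20, "MiB")
# 8 generations: 448.0 MiB
# 16 generations: 896.0 MiB
```

Under these assumptions the cache grows linearly with the number of generations per prompt, so doubling the generations doubles this part of the memory.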

Cool frens!