Checkpoints for "Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models" https://arxiv.org/abs/2410.18252
Shengyi Costa Huang
vwxyzjn
Recent Activity
liked a model about 1 month ago: deepseek-ai/DeepSeek-R1-0528
updated a dataset 2 months ago: vwxyzjn/the-algorithm-python
updated a dataset 2 months ago: vwxyzjn/rlvr_acecoder