RLHFlow/Qwen2.5-Math-7B-Zero-Reinforce-Rej Text Generation • 8B • Updated May 21, 2025 • 8 • 1
RLHFlow/Llama3.1-8B-PRM-Deepseek-Data Text Generation • 8B • Updated May 10, 2025 • 3.84k • 37
RLHFlow/Decision-Tree-Reward-Gemma-2-27B Text Classification • 27B • Updated Jan 24, 2025 • 19 • 8
RLHFlow/Decision-Tree-Reward-Llama-3.1-8B Text Classification • 8B • Updated Jan 24, 2025 • 21 • 7
RLHFlow/Llama3.1-8B-PRM-Mistral-Data Text Generation • 8B • Updated Nov 9, 2024 • 247 • 10