ReLIFT is a training method that interleaves reinforcement learning (RL) with online fine-tuning (FT). It achieves better performance and efficiency than using RL or supervised fine-tuning (SFT) alone, as described in "Learning What Reinforcement Learning Can't: Interleaved Online Fine-Tuning for Hardest Questions".

Code: https://github.com/TheRoadQaQ/ReLIFT

Project page: https://github.com/TheRoadQaQ/ReLIFT
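For readers who want to try the checkpoint, the following is a minimal loading sketch rather than an official usage recipe. It assumes the weights published as `RoadQAQ/ReLIFT-Qwen2.5-7B-Zero` load through the standard Hugging Face `transformers` causal-LM classes, and that a plain-text prompt is acceptable; the prompt template used during training is not stated here, so check the linked repository before relying on this format. The helper name `generate` is hypothetical.

```python
MODEL_ID = "RoadQAQ/ReLIFT-Qwen2.5-7B-Zero"  # repository id of this model card


def generate(prompt: str, max_new_tokens: int = 512) -> str:
    """Generate a completion for `prompt` (hypothetical helper, not from the repo)."""
    # Imported lazily so this module can be imported without the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # torch_dtype="auto" keeps the dtype stored in the checkpoint.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")

    # Assumption: a raw text prompt; the training-time template may differ.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens and decode only the newly generated part.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate("What is 17 * 24?")` downloads the weights on first use, so expect a large download and a correspondingly large memory requirement.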

Model size: 7.62B params · Tensor type: F32 · Format: Safetensors
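As a rough guide to hardware requirements, the weight footprint follows directly from the 7.62B parameter count and the F32 tensor type (4 bytes per parameter). The sketch below only does that arithmetic; it ignores activations, the KV cache, and framework overhead. The BF16 figure is a hypothetical comparison that assumes you cast the weights yourself, which this card does not describe.

```python
def weight_footprint_bytes(num_params: float, bytes_per_param: int) -> float:
    """Bytes needed to hold the weights alone (no activations or KV cache)."""
    return num_params * bytes_per_param


NUM_PARAMS = 7.62e9                      # from the model card
BYTES_PER_PARAM = {"F32": 4, "BF16": 2}  # BF16 is a hypothetical cast, not the published dtype

f32_bytes = weight_footprint_bytes(NUM_PARAMS, BYTES_PER_PARAM["F32"])
bf16_bytes = weight_footprint_bytes(NUM_PARAMS, BYTES_PER_PARAM["BF16"])

f32_gb = f32_bytes / 1e9        # decimal gigabytes
f32_gib = f32_bytes / 2**30     # binary gibibytes
bf16_gb = bf16_bytes / 1e9

print(f"F32 weights:  {f32_gb:.2f} GB ({f32_gib:.2f} GiB)")
print(f"BF16 weights: {bf16_gb:.2f} GB (if cast)")
```

In other words, the published F32 weights occupy about 30.5 GB, so loading them as-is will not fit on a single 24 GB GPU.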