ReLIFT
Collection
ReLIFT is a training method that interleaves RL with online fine-tuning (FT), achieving better performance and efficiency than RL or SFT alone.
The base Qwen2.5-7B model used by ReLIFT. We modify the chat_template for the system prompt and add .
Github: https://github.com/TheRoadQaQ/ReLIFT
If you find our model, data, or evaluation code useful, please cite our paper: