R-Search: Empowering LLM Reasoning with Search via Multi-Reward Reinforcement Learning

🤗 [R-Search Datasets] • 💻 [Github Repo]

R-Search is a novel reinforcement learning framework for reasoning–search integration. It enables LLMs to autonomously perform multi-step reasoning with deep search interaction, and to learn optimal reasoning–search trajectories via multi-reward signals, substantially improving performance on complex logic- and knowledge-intensive tasks.
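The card does not list the individual reward signals. As a minimal sketch only, here is how a multi-reward signal could be folded into one scalar for RL training. It assumes two illustrative components: a token-level F1 answer reward and a binary format reward. The `<answer>` tags, the weights, and every function name are hypothetical and not taken from the R-Search code.

```python
import re
from collections import Counter

# Hypothetical answer delimiters; the actual R-Search output format may differ.
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def normalize(text: str) -> list[str]:
    """Lowercase, replace non-alphanumerics with spaces, and split into tokens.

    Simplified: unlike the SQuAD evaluation script, articles are not dropped.
    """
    text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    return text.split()


def f1_score(prediction: str, gold: str) -> float:
    """Token-level F1 between a predicted answer and a gold answer."""
    pred_tokens, gold_tokens = normalize(prediction), normalize(gold)
    common = Counter(pred_tokens) & Counter(gold_tokens)  # multiset overlap
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def multi_reward(output: str, gold: str,
                 w_answer: float = 1.0, w_format: float = 0.5) -> float:
    """Weighted sum of an answer reward and a format reward (illustrative weights)."""
    match = ANSWER_RE.search(output)
    format_reward = 1.0 if match else 0.0
    # Only a well-formed answer can earn answer reward.
    answer_reward = f1_score(match.group(1), gold) if match else 0.0
    return w_answer * answer_reward + w_format * format_reward
```

For example, `multi_reward("<answer>Paris</answer>", "Paris")` yields 1.5. An untagged `"Paris"` yields 0.0, because in this sketch the answer reward depends on the format check passing.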

Trained Models

We have open-sourced the following models, all trained only on the 2WikiMultiHopQA training set:

| Model | Hugging Face Repo | Description |
|---|---|---|
| R-Search-7b-grpo | 🤗 Huggingface Repo | Qwen2.5-7B-Instruct trained with the GRPO algorithm |
| R-Search-3b-grpo | 🤗 Huggingface Repo | Qwen2.5-3B-Instruct trained with the GRPO algorithm |
| R-Search-7b-ppo | 🤗 Huggingface Repo | Qwen2.5-7B-Instruct trained with the PPO algorithm |
| R-Search-3b-ppo | 🤗 Huggingface Repo | Qwen2.5-3B-Instruct trained with the PPO algorithm |
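The GRPO and PPO variants differ mainly in how advantages are estimated. PPO typically relies on a learned value model (critic). GRPO instead samples a group of rollouts for the same question and standardizes each rollout's reward against the group's mean and standard deviation. The following is a minimal, hedged sketch of that group-relative advantage, not the R-Search training code. It uses the population standard deviation and a hypothetical `eps` term. Some implementations use the sample standard deviation instead.

```python
import statistics


def grpo_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Group-relative advantages: standardize each reward within its sampling group.

    `rewards` holds one scalar reward per rollout sampled for the same prompt.
    `eps` guards against division by zero when all rewards are equal.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; a design choice here
    return [(r - mean) / (std + eps) for r in rewards]
```

Each rollout is scored relative to its siblings. As a result, a group in which every rollout earns the same reward contributes zero advantage, and no separate critic network has to be trained.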
Model: qingfei1/R-Search-3b-grpo · Size: 3.4B params · Tensor type: F32 (Safetensors)
