Distillation and Refinement of Reasoning in Small Language Models for Document Re-ranking
Abstract
We present a novel approach to training small language models for reasoning-intensive document ranking that combines knowledge distillation with reinforcement learning optimization. While existing methods often rely on expensive human annotations or large black-box language models, our methodology leverages web data and a teacher LLM to automatically generate high-quality training examples with relevance explanations. By framing document ranking as a reinforcement learning problem and incentivizing explicit reasoning capabilities, we train a compact 3B-parameter language model that achieves state-of-the-art performance on the BRIGHT benchmark. Our model ranks third on the leaderboard while using substantially fewer parameters than other approaches, outperforming models over 20 times its size. Through extensive experiments, we demonstrate that generating explanations during inference, rather than directly predicting relevance scores, enables more effective reasoning with smaller language models. The self-supervised nature of our method offers a scalable and interpretable solution for modern information retrieval systems.
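The "explain, then score" inference described in the abstract can be sketched as a pointwise re-ranker: the model is prompted to reason about query-document relevance before emitting a score, and documents are sorted by the parsed score. This is a minimal illustration, not the paper's implementation; the tag format, prompt wording, and the stub generator are assumptions.

```python
import re

def parse_judgment(completion: str) -> float:
    """Extract a relevance score from a completion like
    '<reasoning>...</reasoning><score>0.83</score>' (tag format is an assumption)."""
    m = re.search(r"<score>\s*([0-9]*\.?[0-9]+)\s*</score>", completion)
    return float(m.group(1)) if m else 0.0

def rerank(query, docs, generate):
    """Score each document pointwise by prompting the model to reason first,
    then sort descending by the parsed relevance score."""
    scored = []
    for doc in docs:
        prompt = (f"Query: {query}\nDocument: {doc}\n"
                  "Think step by step about whether the document answers the query, "
                  "then output a relevance score between 0 and 1 inside <score> tags.")
        scored.append((doc, parse_judgment(generate(prompt))))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Stub standing in for the small LM, for illustration only.
def fake_generate(prompt):
    relevant = "Mohs" in prompt  # toy relevance signal on the document text
    return f"<reasoning>...</reasoning><score>{0.9 if relevant else 0.1}</score>"

ranking = rerank("hardness of quartz",
                 ["quartz scores 7 on the Mohs scale", "a recipe for bread"],
                 fake_generate)
```

Swapping `fake_generate` for a real decoding call against the trained 3B model yields the full pipeline: the reasoning trace doubles as an interpretable relevance explanation.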
Community
In this paper we describe a distillation-plus-reinforcement-learning training recipe for a compact 2-3B re-ranker that matches the performance of 70B+ LLMs and achieves SOTA results on reasoning-intensive re-ranking on the BRIGHT benchmark.
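The RL half of the recipe needs a verifiable reward for each sampled completion. A toy version, assuming teacher-LLM relevance labels from the distillation stage and a small bonus for well-formed reasoning, could look like this (tag names and weights are illustrative assumptions, not the paper's exact reward):

```python
import re

def reward(completion: str, teacher_label: int) -> float:
    """Toy RL reward: +1.0 if the predicted binary relevance label matches
    the teacher LLM's label, plus a +0.1 format bonus for emitting a
    well-formed <reasoning>...</reasoning> block. Unparsable answers get
    only the format bonus."""
    fmt = 0.1 if re.search(r"<reasoning>.*</reasoning>", completion, re.S) else 0.0
    m = re.search(r"<label>\s*([01])\s*</label>", completion)
    if m is None:
        return fmt
    return fmt + (1.0 if int(m.group(1)) == teacher_label else 0.0)
```

In a GRPO-style setup, several completions are sampled per query-document pair and their rewards are normalized within the group to form the policy-gradient advantage; the format bonus keeps the model emitting parsable reasoning early in training.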
The following papers were recommended by the Semantic Scholar API
- Rank-R1: Enhancing Reasoning in LLM-based Document Rerankers via Reinforcement Learning (2025)
- Teaching Dense Retrieval Models to Specialize with Listwise Distillation and LLM Data Augmentation (2025)
- Distill Not Only Data but Also Rewards: Can Smaller Language Models Surpass Larger Ones? (2025)
- DeepRetrieval: Hacking Real Search Engines and Retrievers with Large Language Models via Reinforcement Learning (2025)
- Self-Enhanced Reasoning Training: Activating Latent Reasoning in Small Models for Enhanced Reasoning Distillation (2025)
- Small Models Struggle to Learn from Strong Reasoners (2025)
- LLM-QE: Improving Query Expansion by Aligning Large Language Models with Ranking Preferences (2025)