Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward • Paper • 2510.03222 • Published 10 days ago
Adaptive Query Rewriting: Aligning Rewriters through Marginal Probability of Conversational Answers • Paper • 2406.10991 • Published Jun 16, 2024
FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation • Paper • 2310.03214 • Published Oct 5, 2023