Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward • Paper • 2510.03222 • Published 20 days ago • 43
Adaptive Query Rewriting: Aligning Rewriters through Marginal Probability of Conversational Answers • Paper • 2406.10991 • Published Jun 16, 2024 • 1
Long-range Language Modeling with Self-retrieval • Paper • 2306.13421 • Published Jun 23, 2023 • 16
FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation • Paper • 2310.03214 • Published Oct 5, 2023 • 20