Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning (arXiv:2505.24726, published May 30)
BPE Gets Picky: Efficient Vocabulary Refinement During Tokenizer Training (arXiv:2409.04599, published Sep 6, 2024)
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking (arXiv:2501.04519, published Jan 8)
Pychop: Emulating Low-Precision Arithmetic in Numerical Methods and Neural Networks (arXiv:2504.07835, published Apr 10)
Low-Precision Training of Large Language Models: Methods, Challenges, and Opportunities (arXiv:2505.01043, published May 2)