When AI Co-Scientists Fail: SPOT-a Benchmark for Automated Verification of Scientific Research Paper • 2505.11855 • Published May 17 • 9
view article Article Manus AI: The Best Autonomous AI Agent Redefining Automation and Productivity By LLMhacker • Mar 6 • 172
Linguistic Generalizability of Test-Time Scaling in Mathematical Reasoning Paper • 2502.17407 • Published Feb 24 • 26
Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning Paper • 2502.14768 • Published Feb 20 • 48
The Lessons of Developing Process Reward Models in Mathematical Reasoning Paper • 2501.07301 • Published Jan 13 • 99
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking Paper • 2501.04519 • Published Jan 8 • 280
view article Article Navigating Korean LLM Research #2: Evaluation Tools By amphora • Oct 23, 2024 • 8
Training Language Models to Self-Correct via Reinforcement Learning Paper • 2409.12917 • Published Sep 19, 2024 • 139
view article Article Llama-3.1-Storm-8B: Improved SLM with Self-Curation + Model Merging By akjindal53244 • Aug 19, 2024 • 77
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery Paper • 2408.06292 • Published Aug 12, 2024 • 126
DDK: Distilling Domain Knowledge for Efficient Large Language Models Paper • 2407.16154 • Published Jul 23, 2024 • 22