Article Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM 2 days ago • 232
Linguistic Generalizability of Test-Time Scaling in Mathematical Reasoning Paper • 2502.17407 • Published 18 days ago • 24
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines Paper • 2502.14739 • Published 22 days ago • 97
Multilingual Machine Translation with Open Large Language Models at Practical Scale: An Empirical Study Paper • 2502.02481 • Published Feb 4 • 10
Article Improving Hugging Face Training Efficiency Through Packing with Flash Attention Aug 21, 2024 • 32
The Lessons of Developing Process Reward Models in Mathematical Reasoning Paper • 2501.07301 • Published Jan 13 • 92
HALoGEN: Fantastic LLM Hallucinations and Where to Find Them Paper • 2501.08292 • Published Jan 14 • 17
HelpSteer2-Preference: Complementing Ratings with Preferences Paper • 2410.01257 • Published Oct 2, 2024 • 23
CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings Paper • 2501.01257 • Published Jan 2 • 50
Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey Paper • 2412.18619 • Published Dec 16, 2024 • 55
ProcessBench: Identifying Process Errors in Mathematical Reasoning Paper • 2412.06559 • Published Dec 9, 2024 • 80