SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines • Paper • 2502.14739 • Published 1 day ago • 78
Sailor2: Sailing in South-East Asia with Inclusive Multilingual LLMs • Paper • 2502.12982 • Published 3 days ago • 9
InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU • Paper • 2502.08910 • Published 9 days ago • 139
Gold-medalist Performance in Solving Olympiad Geometry with AlphaGeometry2 • Paper • 2502.03544 • Published 16 days ago • 42
🧠 Reasoning datasets • Collection • Datasets with reasoning traces for math and code released by the community • 12 items • Updated 1 day ago • 77
Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs • Paper • 2501.18585 • Published 22 days ago • 55
Qwen2.5-1M • Collection • The long-context version of Qwen2.5, supporting 1M-token context lengths • 2 items • Updated 26 days ago • 100
Kimi k1.5: Scaling Reinforcement Learning with LLMs • Paper • 2501.12599 • Published about 1 month ago • 96
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning • Paper • 2501.12948 • Published about 1 month ago • 327