Generative Evaluation of Complex Reasoning in Large Language Models Paper • 2504.02810 • Published Apr 3
The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding Paper • 2502.08946 • Published Feb 13