ChartMuseum: Testing Visual Reasoning Capabilities of Large Vision-Language Models Paper • 2505.13444 • Published May 19 • 16
CRUST-Bench: A Comprehensive Benchmark for C-to-safe-Rust Transpilation Paper • 2504.15254 • Published Apr 21 • 6
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning Paper • 2409.12183 • Published Sep 18, 2024 • 40