R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model • Paper • 2503.05132 • Published Mar 7
R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts • Paper • 2502.20395 • Published Feb 27
Is your benchmark truly adversarial? AdvScore: Evaluating Human-Grounded Adversarialness • Paper • 2406.16342 • Published Jun 24, 2024
BenTo: Benchmark Task Reduction with In-Context Transferability • Paper • 2410.13804 • Published Oct 17, 2024