What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models Paper • 2503.24235 • Published Mar 31 • 54
THOUGHTTERMINATOR: Benchmarking, Calibrating, and Mitigating Overthinking in Reasoning Models Paper • 2504.13367 • Published Apr 17 • 24
RuleArena: A Benchmark for Rule-Guided Reasoning with LLMs in Real-World Scenarios Paper • 2412.08972 • Published Dec 12, 2024 • 10
Euclid: Supercharging Multimodal LLMs with Synthetic High-Fidelity Visual Descriptions Paper • 2412.08737 • Published Dec 11, 2024 • 54
Game-theoretic LLM: Agent Workflow for Negotiation Games Paper • 2411.05990 • Published Nov 8, 2024 • 8
How to Index Item IDs for Recommendation Foundation Models Paper • 2305.06569 • Published May 11, 2023 • 1
The Impact of Reasoning Step Length on Large Language Models Paper • 2401.04925 • Published Jan 10, 2024 • 18