Submitted by Xueqing · 57 · MultiFinBen: A Multilingual, Multimodal, and Difficulty-Aware Benchmark for Financial LLM Evaluation · 44 authors · 2
Submitted by nicolaus625 · 33 · CMI-Bench: A Comprehensive Benchmark for Evaluating Music Instruction Following · 5 authors · 1
Submitted by shun-zheng · 24 · Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs · 12 authors · 3
Submitted by mparvez · 24 · Xolver: Multi-Agent Reasoning with Holistic Experience Learning Just Like an Olympiad Team · 4 authors · 1
Submitted by zhangshaolei · 21 · Stream-Omni: Simultaneous Multimodal Interactions with Large Language-Vision-Speech Model · 5 authors · 1
Submitted by amsabour · 13 · Align Your Flow: Scaling Continuous-Time Flow Map Distillation · 3 authors · 3
Submitted by yilunzhao · 13 · Can LLMs Generate High-Quality Test Cases for Algorithm Problems? TestCase-Eval: A Systematic Evaluation of Fault Coverage and Exposure · 4 authors · 1
Submitted by ahmedheakl · 10 · Guaranteed Guess: A Language Modeling Approach for CISC-to-RISC Transpilation with Testing Guarantees · 5 authors · 1
Submitted by CostaliyA · 7 · CRITICTOOL: Evaluating Self-Critique Capabilities of Large Language Models in Tool-Calling Error Scenarios · 9 authors · 1
Submitted by Liuff23 · 6 · xbench: Tracking Agents Productivity Scaling with Profession-Aligned Real-World Evaluations · 33 authors · 1
Submitted by zichenwen · 6 · EfficientVLA: Training-Free Acceleration and Compression for Vision-Language-Action Models · 8 authors · 1
Submitted by koustuvs · 6 · V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning · 30 authors · 1
Submitted by Siyuc · 5 · Taming Polysemanticity in LLMs: Provable Feature Recovery via Sparse Autoencoders · 5 authors · 1
Submitted by XaiverZ · 3 · Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning · 3 authors · 1
Submitted by cetosignis · 2 · From Bytes to Ideas: Language Modeling with Autoregressive U-Nets · 6 authors · 1
Submitted by akhaliq · 2 · Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs · 46 authors · 1
Submitted by dsouzadaniel · 2 · Treasure Hunt: Real-time Targeting of the Long Tail using Training-Time Markers · 5 authors · 1
Submitted by JJ-TMT · 2 · CAMS: A CityGPT-Powered Agentic Framework for Urban Human Mobility Simulation · 4 authors · 1
Submitted by amanchadha · 1 · Alignment Quality Index (AQI) : Beyond Refusals: AQI as an Intrinsic Alignment Diagnostic via Latent Geometry, Cluster Divergence, and Layer wise Pooled Representations · 15 authors · 1
Submitted by BeileiCui · 1 · TR2M: Transferring Monocular Relative Depth to Metric Depth with Language Descriptions and Scale-Oriented Contrast · 4 authors · 1
Submitted by FaiyazAbdullah114708 · - · VisText-Mosquito: A Multimodal Dataset and Benchmark for AI-Based Mosquito Breeding Site Detection and Reasoning · 7 authors · 1
Submitted by MaxDu · - · DynaGuide: Steering Diffusion Polices with Active Dynamic Guidance · 2 authors · 1
Submitted by hsichelin · - · EMLoC: Emulator-based Memory-efficient Fine-tuning with LoRA Correction · 4 authors · 1
Submitted by ChetKao · - · Graph Counselor: Adaptive Graph Exploration via Multi-Agent Synergy to Enhance LLM Reasoning · 7 authors · 1