MentraSuite: Post-Training Large Language Models for Mental Health Reasoning and Assessment Paper • 2512.09636 • Published Dec 10, 2025 • 26
DITING: A Multi-Agent Evaluation Framework for Benchmarking Web Novel Translation Paper • 2510.09116 • Published Oct 10, 2025 • 97
Spatial Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model Paper • 2510.12276 • Published Oct 14, 2025 • 149
From Scores to Skills: A Cognitive Diagnosis Framework for Evaluating Financial Large Language Models Paper • 2508.13491 • Published Aug 19, 2025 • 59
MultiFinBen: A Multilingual, Multimodal, and Difficulty-Aware Benchmark for Financial LLM Evaluation Paper • 2506.14028 • Published Jun 16, 2025 • 94
FinAudio: A Benchmark for Audio Large Language Models in Financial Applications Paper • 2503.20990 • Published Mar 26, 2025 • 19