Submitted by taesiri · 70 · DuPO: Enabling Reliable LLM Self-Verification via Dual Preference Optimization · 10 authors · 2
Submitted by ElsaShaw · 55 · From Scores to Skills: A Cognitive Diagnosis Framework for Evaluating Financial Large Language Models · 11 authors · 2
Submitted by liujiashuo77 · 53 · FutureX: An Advanced Live Benchmark for LLM Agents in Future Prediction · 30 authors · 3
Submitted by ZhaoyangLyu · 37 · MeshCoder: LLM-Powered Structured Mesh Code Generation from Point Clouds · 12 authors · 133 · 2
Submitted by Canyu · 32 · Tinker: Diffusion's Gift to 3D--Multi-View Consistent Editing From Sparse Inputs without Per-Scene Optimization · 6 authors · 70 · 2
Submitted by Wyattz23 · 25 · From AI for Science to Agentic Science: A Survey on Autonomous Scientific Discovery · 22 authors · 2
Submitted by Ziyang · 24 · MCP-Universe: Benchmarking Large Language Models with Real-World Model Context Protocol Servers · 10 authors · 59 · 2
Submitted by taesiri · 24 · NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model · 211 authors · 3
Submitted by Felix1023 · 19 · Quantization Meets dLLMs: A Systematic Study of Post-training Quantization for Diffusion LLMs · 9 authors · 2
Submitted by xiaoniqiu · 6 · On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting · 8 authors · 5
Submitted by anvo25 · 5 · ViExam: Are Vision Language Models Better than Humans on Vietnamese Multimodal Exam Questions? · 5 authors · 2 · 2
Submitted by jnanliu · 2 · Dissecting Tool-Integrated Reasoning: An Empirical Study and Analysis · 7 authors · 1
Submitted by ashiq24 · 2 · Local Scale Equivariance with Latent Deep Equilibrium Canonicalizer · 7 authors · 1 · 2
Submitted by woutLegiest · 1 · Leuvenshtein: Efficient FHE-based Edit Distance Computation with Single Bootstrap per Cell · 5 authors · 2 · 2
Submitted by Franck-Dernoncourt · 1 · mSCoRe: a Multilingual and Scalable Benchmark for Skill-based Commonsense Reasoning · 3 authors · 2
Submitted by MrShouxingMa · - · Refining Contrastive Learning and Homography Relations for Multi-Modal Recommendation · 4 authors · 2