Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models Paper • 2501.11873 • Published 10 days ago • 61
The Lessons of Developing Process Reward Models in Mathematical Reasoning Paper • 2501.07301 • Published 17 days ago • 89
CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models Paper • 2412.10117 • Published Dec 13, 2024 • 3
ProcessBench: Identifying Process Errors in Mathematical Reasoning Paper • 2412.06559 • Published Dec 9, 2024 • 79
A Simple and Provable Scaling Law for the Test-Time Compute of Large Language Models Paper • 2411.19477 • Published Nov 29, 2024 • 6
Aligning Large Language Models via Self-Steering Optimization Paper • 2410.17131 • Published Oct 22, 2024 • 22
ACE: All-round Creator and Editor Following Instructions via Diffusion Transformer Paper • 2410.00086 • Published Sep 30, 2024 • 11
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution Paper • 2409.12191 • Published Sep 18, 2024 • 76
mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding Paper • 2409.03420 • Published Sep 5, 2024 • 26
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models Paper • 2408.04840 • Published Aug 9, 2024 • 34
Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities Paper • 2308.12966 • Published Aug 24, 2023 • 8
Very Large-Scale Multi-Agent Simulation in AgentScope Paper • 2407.17789 • Published Jul 25, 2024 • 32
Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models Paper • 2406.13542 • Published Jun 19, 2024 • 16
mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding Paper • 2403.12895 • Published Mar 19, 2024 • 32