Paper: Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models (arXiv:2501.11873, published 9 days ago)
Paper: The Lessons of Developing Process Reward Models in Mathematical Reasoning (arXiv:2501.07301, published 17 days ago)
Paper: CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings (arXiv:2501.01257, published 28 days ago)
Paper: ProcessBench: Identifying Process Errors in Mathematical Reasoning (arXiv:2412.06559, published Dec 9, 2024)
Article: Accelerating LLM Inference: Fast Sampling with Gumbel-Max Trick, by cxdu (Oct 24, 2024)
Paper: A Unified View of Delta Parameter Editing in Post-Trained Large-Scale Models (arXiv:2410.13841, published Oct 17, 2024)
Paper: I-SHEEP: Self-Alignment of LLM from Scratch through an Iterative Self-Enhancement Paradigm (arXiv:2408.08072, published Aug 15, 2024)
Collection: Nemotron 4 340B. Nemotron-4: open models for Synthetic Data Generation (SDG), including Base, Instruct, and Reward models (4 items, updated 13 days ago)
Collection: Eurus. Advancing LLM Reasoning Generalists with Preference Trees (11 items, updated Oct 22, 2024)
Collection: Weak-to-Strong Extrapolation Expedites Alignment. Better-aligned models obtained by weak-to-strong model extrapolation (ExPO) (25 items, updated 8 days ago)
Collection: Qwen1.5. Qwen1.5 is the improved version of Qwen, the large language model series developed by Alibaba Cloud (55 items, updated Nov 28, 2024)