Reward Models (Collection): Nemotron reward models for use in RLHF pipelines and LLM-as-a-Judge setups • 8 items • Updated 1 day ago
ERNIE 4.5 (Collection): Collection of ERNIE 4.5 models. "-Paddle" models use PaddlePaddle weights, while "-PT" models use Transformer-style PyTorch weights • 23 items • Updated 1 day ago
FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language (Paper) • arXiv:2506.20920 • Published 8 days ago
LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning (Paper) • arXiv:2506.18841 • Published 11 days ago
AceReason (Collection): Math and code reasoning models trained with reinforcement learning (RL) • 7 items • Updated 1 day ago
MiniCPM4 (Collection): MiniCPM4: Ultra-Efficient LLMs on End Devices • 22 items • Updated 12 days ago
CodeI/O (Collection): Models and datasets for CodeI/O; project page: https://codei-o.github.io/ • 16 items • Updated May 6
SYNTHETIC-1 (Collection): A collection of tasks and verifiers for reasoning datasets • 9 items • Updated 9 days ago
Qwen2.5-VL (Collection): Vision-language model series based on Qwen2.5 • 11 items • Updated Apr 28