- Training Language Models for Social Deduction with Multi-Agent Reinforcement Learning (arXiv:2502.06060, Feb 2025)
- UI-TARS: Pioneering Automated GUI Interaction with Native Agents (arXiv:2501.12326, Jan 21, 2025)
- RL + Transformer = A General-Purpose Problem Solver (arXiv:2501.14176, Jan 2025)
- ARWKV: Pretrain is not what we need, an RNN-Attention-Based Language Model Born from Transformer (arXiv:2501.15570, Jan 2025)
- Towards General-Purpose Model-Free Reinforcement Learning (arXiv:2501.16142, Jan 2025)
- Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling (arXiv:2501.16975, Jan 2025)
- Optimizing Large Language Model Training Using FP4 Quantization (arXiv:2501.17116, Jan 2025)
- SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training (arXiv:2501.17161, Jan 2025)
- Large Language Models Think Too Fast To Explore Effectively (arXiv:2501.18009, Jan 2025)
- Streaming DiLoCo with overlapping communication: Towards a Distributed Free Lunch (arXiv:2501.18512, Jan 2025)
- Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs (arXiv:2501.18585, Jan 2025)
- Reward-Guided Speculative Decoding for Efficient LLM Reasoning (arXiv:2501.19324, Jan 2025)
- HIGGS (collection): models prequantized with [HIGGS](https://arxiv.org/abs/2411.17525) zero-shot quantization; requires the latest `transformers` to run (17 items, updated Dec 24, 2024)
- SWE-Fixer: Training Open-Source LLMs for Effective and Efficient GitHub Issue Resolution (arXiv:2501.05040, Jan 9, 2025)
- The GAN is dead; long live the GAN! A Modern GAN Baseline (arXiv:2501.05441, Jan 9, 2025)