TDRM: Smooth Reward Models with Temporal Difference for LLM RL and Inference • Paper • 2509.15110 • Published Sep 18
TDRM • Collection • Learning Smooth Reward Models with Temporal Difference for LLM RL and Inference • 15 items • Updated Nov 12
GLM-4.5 • Collection • GLM-4.5: An open-source large language model designed for intelligent agents by Z.ai • 11 items • Updated Aug 11
AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents • Paper • 2410.24024 • Published Oct 31, 2024