Submitted by ytz 43 Black-Box On-Policy Distillation of Large Language Models Microsoft Research 4.2k 3
Submitted by Akshay Nambi 9 Magentic Marketplace: An Open-Source Environment for Studying Agentic Markets Microsoft Research 97 2
Submitted by Li Lyna Zhang 60 LoongRL: Reinforcement Learning for Advanced Reasoning over Long Contexts Microsoft Research 5
Submitted by Junpeng Liu 27 DocReward: A Document Reward Model for Structuring and Stylizing Microsoft Research 3
Submitted by Martina Vilas 1 Tracing the Traces: Latent Temporal Signals for Efficient and Accurate Reasoning Microsoft Research 2
Submitted by Zijian Li 4 PixelCraft: A Multi-Agent System for High-Fidelity Visual Reasoning on Structured Images Microsoft Research 16 2
Submitted by Yifei Shen 9 Benefits and Pitfalls of Reinforcement Learning for Language Model Planning: A Theoretical Perspective Microsoft Research 2