Rethinking Reward Models for Multi-Domain Test-Time Scaling • Paper • 2510.00492 • Published 2 days ago • 23 • 2
BroRL: Scaling Reinforcement Learning via Broadened Exploration • Paper • 2510.01180 • Published 1 day ago • 10 • 2
EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing • Paper • 2509.26346 • Published 3 days ago • 12 • 3
Flash-Searcher: Fast and Effective Web Agents via DAG-Based Parallel Execution • Paper • 2509.25301 • Published 4 days ago • 12 • 2
In-Place Feedback: A New Paradigm for Guiding LLMs in Multi-Turn Reasoning • Paper • 2510.00777 • Published 2 days ago • 2 • 1
Beyond Log Likelihood: Probability-Based Objectives for Supervised Fine-Tuning across the Model Capability Continuum • Paper • 2510.00526 • Published 2 days ago • 7 • 2
Aligning Visual Foundation Encoders to Tokenizers for Diffusion Models • Paper • 2509.25162 • Published 4 days ago • 2
SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLM Weights • Paper • 2509.22944 • Published 6 days ago • 29 • 3
TGPO: Temporal Grounded Policy Optimization for Signal Temporal Logic Tasks • Paper • 2510.00225 • Published 3 days ago • 1 • 2
Training Vision-Language Process Reward Models for Test-Time Scaling in Multimodal Reasoning: Key Insights and Lessons Learned • Paper • 2509.23250 • Published 6 days ago • 5 • 2
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search • Paper • 2509.25454 • Published 4 days ago • 106 • 2
MixtureVitae: Open Web-Scale Pretraining Dataset With High Quality Instruction and Reasoning Data Built from Permissive-First Text Sources • Paper • 2509.25531 • Published 3 days ago • 4 • 3
Why Can't Transformers Learn Multiplication? Reverse-Engineering Reveals Long-Range Dependency Pitfalls • Paper • 2510.00184 • Published 3 days ago • 13 • 3
Hyperdimensional Probe: Decoding LLM Representations via Vector Symbolic Architectures • Paper • 2509.25045 • Published 4 days ago • 2 • 2
PIPer: On-Device Environment Setup via Online Reinforcement Learning • Paper • 2509.25455 • Published 4 days ago • 26 • 2