Paipile's Collections
RFT

updated 1 day ago

  • Group Sequence Policy Optimization

    Paper • 2507.18071 • Published 9 days ago • 257

  • LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization

    Paper • 2507.15758 • Published 11 days ago • 34

  • Hierarchical Budget Policy Optimization for Adaptive Reasoning

    Paper • 2507.15844 • Published 11 days ago • 16

  • Semi-off-Policy Reinforcement Learning for Vision-Language Slow-thinking Reasoning

    Paper • 2507.16814 • Published 10 days ago • 22

  • RePO: Replay-Enhanced Policy Optimization

    Paper • 2506.09340 • Published Jun 11

  • Perception-Aware Policy Optimization for Multimodal Reasoning

    Paper • 2507.06448 • Published 24 days ago • 44

  • On-Policy RL with Optimal Reward Baseline

    Paper • 2505.23585 • Published May 29 • 15

  • EXPO: Stable Reinforcement Learning with Expressive Policies

    Paper • 2507.07986 • Published 22 days ago