TGDPO: Harnessing Token-Level Reward Guidance for Enhancing Direct Preference Optimization Paper • 2506.14574 • Published Jun 17
MiCo: Multi-image Contrast for Reinforcement Visual Reasoning Paper • 2506.22434 • Published Jun 27