Dataset and Models for "IFDECORATOR: Wrapping Instruction Following Reinforcement Learning with Verifiable Rewards"
guox18
AI & ML interests
Alignment
Recent Activity
upvoted a paper
about 1 month ago
Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning
authored a paper
about 2 months ago
Consensus Entropy: Harnessing Multi-VLM Agreement for Self-Verifying and Self-Improving OCR
authored a paper
about 2 months ago
IFDECORATOR: Wrapping Instruction Following Reinforcement Learning with Verifiable Rewards
Organizations
None yet