3

mk

mk1111

AI & ML interests

None yet

Recent Activity

upvoted a paper 8 days ago

TGDPO: Harnessing Token-Level Reward Guidance for Enhancing Direct Preference Optimization

published a dataset 20 days ago

mk1111/llama3-8b-instruct-ultrafeedback

published a dataset 20 days ago

mk1111/llama3-8b-instruct-ultrafeedback-armorm

Organizations

None yet

upvoted a paper 8 days ago

TGDPO: Harnessing Token-Level Reward Guidance for Enhancing Direct Preference Optimization

Paper • 2506.14574 • Published Jun 17 • 1

upvoted a paper 25 days ago

Scaling RL to Long Videos

Paper • 2507.07966 • Published 25 days ago • 151

upvoted a paper about 1 month ago

MiCo: Multi-image Contrast for Reinforcement Visual Reasoning

Paper • 2506.22434 • Published Jun 27 • 10