- Token-Efficient Long Video Understanding for Multimodal LLMs
  Paper • 2503.04130 • Published • 94
- Temporal Preference Optimization for Long-Form Video Understanding
  Paper • 2501.13919 • Published • 22
- MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning
  Paper • 2503.07365 • Published • 62
Zhang Yuanhan
ZhangYuanhan
Recent Activity
upvoted a paper about 1 month ago
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
upvoted a paper 2 months ago
VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness
updated a collection 3 months ago
LMM RL