Language Model Self-improvement by Reinforcement Learning Contemplation Paper • 2305.14483 • Published May 23, 2023
Sim2Rec: A Simulator-based Decision-making Approach to Optimize Real-World Long-term User Engagement in Sequential Recommender Systems Paper • 2305.04832 • Published May 3, 2023
Offline Reinforcement Learning with Causal Structured World Models Paper • 2206.01474 • Published Jun 3, 2022
Ovis: Structural Embedding Alignment for Multimodal Large Language Model Paper • 2405.20797 • Published May 31, 2024