ganqu (Ganqu Cui)

authored 3 papers 6 months ago

V-GameGym: Visual Game Generation for Code Large Language Models

Paper • 2509.20136 • Published Sep 24, 2025 • 9

MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe

Paper • 2509.18154 • Published Sep 16, 2025 • 56

FlowRL: Matching Reward Distributions for LLM Reasoning

Paper • 2509.15207 • Published Sep 18, 2025 • 117

authored a paper 10 months ago

The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models

Paper • 2505.22617 • Published May 28, 2025 • 132

authored 2 papers 11 months ago

TTRL: Test-Time Reinforcement Learning

Paper • 2504.16084 • Published Apr 22, 2025 • 120

Learning to Reason under Off-Policy Guidance

Paper • 2504.14945 • Published Apr 21, 2025 • 88

authored a paper 12 months ago

A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond

Paper • 2503.21614 • Published Mar 27, 2025 • 43

authored 2 papers about 1 year ago

UltraIF: Advancing Instruction Following from the Wild

Paper • 2502.04153 • Published Feb 6, 2025 • 24

Process Reinforcement through Implicit Rewards

Paper • 2502.01456 • Published Feb 3, 2025 • 62

authored 3 papers over 1 year ago

RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness

Paper • 2405.17220 • Published May 27, 2024 • 1

UltraMedical: Building Specialized Generalists in Biomedicine

Paper • 2406.03949 • Published Jun 6, 2024

Free Process Rewards without Process Labels

Paper • 2412.01981 • Published Dec 2, 2024 • 34

authored 6 papers almost 2 years ago

Mastering Text, Code and Math Simultaneously via Fusing Highly Specialized Language Models

Paper • 2403.08281 • Published Mar 13, 2024

Advancing LLM Reasoning Generalists with Preference Trees

Paper • 2404.02078 • Published Apr 2, 2024 • 46

MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies

Paper • 2404.06395 • Published Apr 9, 2024 • 24

authored a paper over 2 years ago

RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback

Paper • 2312.00849 • Published Dec 1, 2023 • 12
