ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement Paper • 2504.01934 • Published Apr 2, 2025 • 20
Can Atomic Step Decomposition Enhance the Self-structured Reasoning of Multimodal Large Models? Paper • 2503.06252 • Published Mar 8
Efficient Multi-modal Large Language Models via Visual Token Grouping Paper • 2411.17773 • Published Nov 26, 2024
AtomThink: A Slow Thinking Framework for Multimodal Mathematical Reasoning Paper • 2411.11930 • Published Nov 18, 2024
HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models Paper • 2407.08706 • Published Jul 11, 2024
GrowCLIP: Data-aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-training Paper • 2308.11331 • Published Aug 22, 2023
DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability Paper • 2308.09306 • Published Aug 18, 2023 • 1
FILIP: Fine-grained Interactive Language-Image Pre-Training Paper • 2111.07783 • Published Nov 9, 2021
ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance Paper • 2412.06673 • Published Dec 9, 2024 • 11
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions Paper • 2409.18042 • Published Sep 26, 2024 • 41