ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement Paper • 2504.01934 • Published Apr 2, 2025 • 20
Can Atomic Step Decomposition Enhance the Self-structured Reasoning of Multimodal Large Models? Paper • 2503.06252 • Published Mar 8
Efficient Multi-modal Large Language Models via Visual Token Grouping Paper • 2411.17773 • Published Nov 26, 2024
AtomThink: A Slow Thinking Framework for Multimodal Mathematical Reasoning Paper • 2411.11930 • Published Nov 18, 2024
HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models Paper • 2407.08706 • Published Jul 11, 2024
GrowCLIP: Data-aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-training Paper • 2308.11331 • Published Aug 22, 2023
DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability Paper • 2308.09306 • Published Aug 18, 2023 • 1
FILIP: Fine-grained Interactive Language-Image Pre-Training Paper • 2111.07783 • Published Nov 9, 2021
ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance Paper • 2412.06673 • Published Dec 9, 2024 • 11
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions Paper • 2409.18042 • Published Sep 26, 2024 • 41