Submitted by Juanxi 69 MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization · 11 authors 7
Submitted by akhaliq 51 DreamActor-M1: Holistic, Expressive and Robust Human Image Animation with Hybrid Guidance · 6 authors 5
Submitted by Howe666 50 AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction · 5 authors 2
Submitted by wenhu 33 ScholarCopilot: Training Large Language Models for Academic Writing with Accurate Citations · 10 authors 2
Submitted by 8ruceLi 31 Towards Physically Plausible Video Generation via VLM Planning · 11 authors 3
Submitted by hanyang-21 29 VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step · 4 authors 2
Submitted by huangrh9 18 ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement · 11 authors 4
Submitted by akhaliq 17 Articulated Kinematics Distillation from Video Diffusion Models · 7 authors 3
Submitted by AdinaY 15 Boost Your Own Human Image Generation Model via Direct Preference Optimization with AI Feedback · 3 authors 3
Submitted by Jarvis1111 12 Safeguarding Vision-Language Models: Mitigating Vulnerabilities to Gaussian Noise in Perturbation-based Attacks · 7 authors 2
Submitted by YanNeu 11 DASH: Detection and Assessment of Systematic Hallucinations of VLMs · 3 authors 2
Submitted by hychiang 6 Quamba2: A Robust and Scalable Post-training Quantization Framework for Selective State Space Models · 6 authors 2
Submitted by Jiuzhouh 4 VerifiAgent: a Unified Verification Agent in Language Model Reasoning · 3 authors 2
Submitted by nielsr 4 MegaTTS 3: Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis · 14 authors 2
Submitted by mawjdgus 2 Enhanced OoD Detection through Cross-Modal Alignment of Multi-Modal Representations · 2 authors 1