ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation Paper • 2506.18095 • Published Jun 22 • 65
Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play Paper • 2505.02707 • Published May 5 • 86
LLM-Powered GUI Agents in Phone Automation: Surveying Progress and Prospects Paper • 2504.19838 • Published Apr 28 • 22
MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception Paper • 2312.07472 • Published Dec 12, 2023 • 2
SupFusion: Supervised LiDAR-Camera Fusion for 3D Object Detection Paper • 2309.07084 • Published Sep 13, 2023 • 1
MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control Paper • 2403.12037 • Published Mar 18, 2024 • 1
WorldSimBench: Towards Video Generation Models as World Simulators Paper • 2410.18072 • Published Oct 23, 2024 • 20
GameFactory: Creating New Games with Generative Interactive Videos Paper • 2501.08325 • Published Jan 14 • 68
T2ISafety: Benchmark for Assessing Fairness, Toxicity, and Privacy in Image Generation Paper • 2501.12612 • Published Jan 22
RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints Paper • 2503.16408 • Published Mar 20 • 41
Elucidating The Design Space of Classifier-Guided Diffusion Generation Paper • 2310.11311 • Published Oct 17, 2023
Explore and Exploit the Diverse Knowledge in Model Zoo for Domain Generalization Paper • 2306.02595 • Published Jun 5, 2023
On the Expressive Power of a Variant of the Looped Transformer Paper • 2402.13572 • Published Feb 21, 2024
Towards Understanding How Transformer Perform Multi-step Reasoning with Matching Operation Paper • 2405.15302 • Published May 24, 2024
Elucidating the design space of language models for image generation Paper • 2410.16257 • Published Oct 21, 2024
Rewards Are Enough for Fast Photo-Realistic Text-to-image Generation Paper • 2503.13070 • Published Mar 17 • 10
Learning Few-Step Diffusion Models by Trajectory Distribution Matching Paper • 2503.06674 • Published Mar 9 • 8
Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Dataset for Speech Generation Paper • 2501.15907 • Published Jan 27 • 17
Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation Paper • 2407.05361 • Published Jul 7, 2024 • 2