PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models Paper • 2401.05252 • Published Jan 10 • 47
Pushing Auto-regressive Models for 3D Shape Generation at Capacity and Scalability Paper • 2402.12225 • Published Feb 19 • 6
SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages Paper • 2406.10118 • Published Jun 14 • 30
MLLM as Retriever: Interactively Learning Multimodal Retrieval for Embodied Agents Paper • 2410.03450 • Published Oct 4 • 35
Lemur: Harmonizing Natural Language and Code for Language Agents Paper • 2310.06830 • Published Oct 10, 2023 • 31
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments Paper • 2404.07972 • Published Apr 11 • 46