OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models Paper • 2502.01061 • Published 19 days ago • 180
Multimodal Autoregressive Pre-training of Large Vision Encoders Paper • 2411.14402 • Published Nov 21, 2024 • 43
LLaVA-o1: Let Vision Language Models Reason Step-by-Step Paper • 2411.10440 • Published Nov 15, 2024 • 114
AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant Paper • 2410.18603 • Published Oct 24, 2024 • 32
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching Paper • 2410.06885 • Published Oct 9, 2024 • 43
MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models Paper • 2409.17481 • Published Sep 26, 2024 • 47
Emu3 Collection Emu3: Next-Token Prediction is All You Need • 7 items • Updated 9 days ago • 69
LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness Paper • 2409.18125 • Published Sep 26, 2024 • 34
Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency Paper • 2409.02634 • Published Sep 4, 2024 • 93
LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference Paper • 2407.14057 • Published Jul 19, 2024 • 45