admarcosai's Collections: LMMM
OneLLM: One Framework to Align All Modalities with Language
Paper • 2312.03700 • Published • 24
Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion
Paper • 2402.03162 • Published • 19
Paper • 2402.09470 • Published • 13
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
Paper • 2402.12226 • Published • 45
Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces
Paper • 2412.14171 • Published • 24
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
Paper • 2412.09596 • Published • 97
Euclid: Supercharging Multimodal LLMs with Synthetic High-Fidelity Visual Descriptions
Paper • 2412.08737 • Published • 54
Multimodal Latent Language Modeling with Next-Token Diffusion
Paper • 2412.08635 • Published • 49