LongVILA: Scaling Long-Context Visual Language Models for Long Videos Paper • 2408.10188 • Published Aug 19, 2024 • 51
FancyVideo: Towards Dynamic and Consistent Video Generation via Cross-frame Textual Guidance Paper • 2408.08189 • Published Aug 15, 2024 • 17
MusiConGen: Rhythm and Chord Control for Transformer-Based Text-to-Music Generation Paper • 2407.15060 • Published Jul 21, 2024 • 9
YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information Paper • 2402.13616 • Published Feb 21, 2024 • 47
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens Paper • 2402.13753 • Published Feb 21, 2024 • 115
MusicRL: Aligning Music Generation to Human Preferences Paper • 2402.04229 • Published Feb 6, 2024 • 17
EmerNeRF: Emergent Spatial-Temporal Scene Decomposition via Self-Supervision Paper • 2311.02077 • Published Nov 3, 2023 • 14
Music Mixing Style Transfer: A Contrastive Learning Approach to Disentangle Audio Effects Paper • 2211.02247 • Published Nov 4, 2022 • 3
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models Paper • 2310.08491 • Published Oct 12, 2023 • 53
How FaR Are Large Language Models From Agents with Theory-of-Mind? Paper • 2310.03051 • Published Oct 4, 2023 • 34
Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation Paper • 2309.16429 • Published Sep 28, 2023 • 11
Low-rank Adaptation of Large Language Model Rescoring for Parameter-Efficient Speech Recognition Paper • 2309.15223 • Published Sep 26, 2023 • 19
LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models Paper • 2309.15103 • Published Sep 26, 2023 • 42
SpeechX: Neural Codec Language Model as a Versatile Speech Transformer Paper • 2308.06873 • Published Aug 14, 2023 • 25
DragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory Paper • 2308.08089 • Published Aug 16, 2023 • 21