Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations Paper • 2506.18898 • Published 3 days ago • 23
Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations Paper • 2506.18898 • Published 3 days ago • 23
Chop & Learn: Recognizing and Generating Object-State Compositions Paper • 2309.14339 • Published Sep 25, 2023
NIRVANA: Neural Implicit Representations of Videos with Adaptive Networks and Autoregressive Patch-wise Modeling Paper • 2212.14593 • Published Dec 30, 2022
LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior Paper • 2410.21264 • Published Oct 28, 2024 • 9
LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior Paper • 2410.21264 • Published Oct 28, 2024 • 9
LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior Paper • 2410.21264 • Published Oct 28, 2024 • 9 • 2