JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization Paper • 2503.23377 • Published 8 days ago • 44
Story-Adapter: A Training-free Iterative Framework for Long Story Visualization Paper • 2410.06244 • Published Oct 8, 2024 • 19
dandelin/vilt-b32-finetuned-vqa Visual Question Answering • Updated Aug 2, 2022 • 1.07M • 407
openai/clip-vit-large-patch14 Zero-Shot Image Classification • Updated Sep 15, 2023 • 53M • 1.69k