Article SigLIP 2: A better multilingual vision language encoder By ariG23498 and 2 others • Feb 21 • 181
Searching for Better ViT Baselines Collection Exploring ViT hparams and model shapes for the GPU poor (between tiny and base). • 28 items • Updated Aug 1 • 18
Fashion-VDM: Video Diffusion Model for Virtual Try-On Paper • 2411.00225 • Published Oct 31, 2024 • 11