Article SigLIP 2: A better multilingual vision language encoder • about 18 hours ago • 47
Gated Linear Attention Transformers with Hardware-Efficient Training Paper • 2312.06635 • Published Dec 11, 2023 • 7
🇫🇷 Calme-3 Collection Here you can find all the new Calme-3 models • 27 items • Updated 12 days ago • 13
Article Open Preference Dataset for Text-to-Image Generation by the 🤗 Community • Dec 9, 2024 • 54
Let's Go Shopping (LGS) -- Web-Scale Image-Text Dataset for Visual Concept Understanding Paper • 2401.04575 • Published Jan 9, 2024 • 17
VidCRAFT3: Camera, Object, and Lighting Control for Image-to-Video Generation Paper • 2502.07531 • Published 10 days ago • 13
Generating Multi-Image Synthetic Data for Text-to-Image Customization Paper • 2502.01720 • Published 18 days ago • 6
Terminus XL Collection v-prediction SDXL clone with zero-terminal SNR noise schedule • 8 items • Updated Apr 24, 2024 • 7
AIMv2 Collection A collection of AIMv2 vision encoders that supports a number of resolutions, native resolution, and a distilled checkpoint. • 19 items • Updated Nov 22, 2024 • 73
Ultravox v0.5 Collection Ultravox is a multimodal Speech LLM built around different pretrained LLMs (frozen) and the whisper-large-v3-turbo (fine-tuned) backbone. • 3 items • Updated 11 days ago • 5
R3GAN Collection R3GAN: A Modern BaselineGAN https://github.com/brownvc/R3GAN/ https://arxiv.org/abs/2501.05441 • 7 items • Updated Jan 10 • 10
The GAN is dead; long live the GAN! A Modern GAN Baseline Paper • 2501.05441 • Published Jan 9 • 89
ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features Paper • 2502.04320 • Published 15 days ago • 33
Material Anything: Generating Materials for Any 3D Object via Diffusion Paper • 2411.15138 • Published Nov 22, 2024 • 44
Goku: Flow Based Video Generative Foundation Models Paper • 2502.04896 • Published 14 days ago • 86
Article π0 and π0-FAST: Vision-Language-Action Models for General Robot Control • 18 days ago • 106
CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up Paper • 2412.16112 • Published Dec 20, 2024 • 22
Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion Paper • 2402.03162 • Published Feb 5, 2024 • 19
Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling Paper • 2401.15977 • Published Jan 29, 2024 • 38