Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis Paper • 2412.15322 • Published Dec 19, 2024 • 19
Holo1 Collection Vision-Language Action Model for use in Surfer-H web navigation agent • 6 items • Updated 8 days ago • 45
Style Customization of Text-to-Vector Generation with Image Diffusion Priors Paper • 2505.10558 • Published May 15 • 15
HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation Paper • 2505.04512 • Published May 7 • 35
FaceID-6M: A Large-Scale, Open-Source FaceID Customization Dataset Paper • 2503.07091 • Published Mar 10 • 3
h-Edit: Effective and Flexible Diffusion-Based Editing via Doob's h-Transform Paper • 2503.02187 • Published Mar 4 • 5
Wan2.1 14B 480p I2V LoRAs Collection A collection of Remade's Wan2.1 14B 480p I2V LoRAs • 49 items • Updated 25 days ago • 175
Remote VAE Inference Endpoints Collection Models and handler code used in https://huggingface.co/blog/remote_vae • 5 items • Updated Mar 10 • 6
DiffRhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion Paper • 2503.01183 • Published Mar 3 • 28
view article Article FastRTC: The Real-Time Communication Library for Python By freddyaboulton and 1 other • Feb 25 • 166
PhotoDoodle: Learning Artistic Image Editing from Few-Shot Pairwise Data Paper • 2502.14397 • Published Feb 20 • 42
SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation Paper • 2502.13128 • Published Feb 18 • 42