AudioX: Diffusion Transformer for Anything-to-Audio Generation Paper • 2503.10522 • Published Mar 13
Open-R1: a fully open reproduction of DeepSeek-R1 Article • By eliebak and 2 others • Jan 28
Executable Code Actions Elicit Better LLM Agents Paper • 2402.01030 • Published Feb 1, 2024
Search-o1: Agentic Search-Enhanced Large Reasoning Models Paper • 2501.05366 • Published Jan 9
UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs Paper • 2311.09257 • Published Nov 14, 2023
Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens Paper • 2410.13863 • Published Oct 17, 2024
Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency Paper • 2409.02634 • Published Sep 4, 2024
VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time Paper • 2404.10667 • Published Apr 16, 2024
DFN Models + Data Collection • CLIP models trained using the DFN-2B/DFN-5B datasets • 7 items • Updated Oct 4, 2024
MobileCLIP Models + DataCompDR Data Collection • MobileCLIP: mobile-friendly image-text models with SOTA zero-shot capabilities. DataCompDR: improved datasets for training SOTA image-text models. • 22 items • Updated Oct 4, 2024
Rethinking FID: Towards a Better Evaluation Metric for Image Generation Paper • 2401.09603 • Published Nov 30, 2023