Article π0 and π0-FAST: Vision-Language-Action Models for General Robot Control Feb 4 • 142
An Empirical Study of GPT-4o Image Generation Capabilities Paper • 2504.05979 • Published 7 days ago • 60
OmniSVG: A Unified Scalable Vector Graphics Generation Model Paper • 2504.06263 • Published 7 days ago • 141
SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published 8 days ago • 158
Black Swan (Abductive and Defeasible Reasoning) Collection Data for CVPR 2025 paper, "Black Swan: Abductive and Defeasible Video Reasoning in Unpredictable Events" • 3 items • Updated 24 days ago • 2
MedSAM2: Segment Anything in 3D Medical Images and Videos Paper • 2504.03600 • Published 11 days ago • 8
MedSAM2 Collection MedSAM2: Segment Anything in 3D Medical Images and Videos • 4 items • Updated 3 days ago • 3
Whisper-LM: Improving ASR Models with Language Models for Low-Resource Languages Paper • 2503.23542 • Published 16 days ago • 10
Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme Paper • 2504.02587 • Published 12 days ago • 30
GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation Paper • 2504.02782 • Published 12 days ago • 54
MegaTTS 3: Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis Paper • 2502.18924 • Published Feb 26 • 12
TaoAvatar: Real-Time Lifelike Full-Body Talking Avatars for Augmented Reality via 3D Gaussian Splatting Paper • 2503.17032 • Published 25 days ago • 24
Infinite Mobility: Scalable High-Fidelity Synthesis of Articulated Objects via Procedural Generation Paper • 2503.13424 • Published 29 days ago • 28