SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics Paper • 2506.01844 • Published Jun 2 • 108 • 23
Article SmolVLA: Efficient Vision-Language-Action Model trained on Lerobot Community Data By danaaubakirova and 8 others • Jun 3 • 175
Running • 150 • MedGemma - Radiology Explainer Demo 🩺 Radiology Image & Report Explainer Demo. Built with MedGemma
Running • 2.75k • The Ultra-Scale Playbook 🌌 The ultimate guide to training LLM on large GPU Clusters
Article Scaling robotics datasets with video encoding By aliberts and 2 others • Aug 27, 2024 • 40
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model Paper • 2408.11039 • Published Aug 20, 2024 • 62