Article TimeScope: How Long Can Your Video Large Multimodal Model Go? By orrzohar and 3 others • 3 days ago • 26
Article Fast LoRA inference for Flux with Diffusers and PEFT By sayakpaul and 1 other • 3 days ago • 17
Article Arc Virtual Cell Challenge: A Primer By FL33TW00D-HF and 1 other • 8 days ago • 39
Article SmolLM3: smol, multilingual, long-context reasoner By loubnabnl and 22 others • 18 days ago • 578
Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens Paper • 2506.17218 • Published Jun 20 • 27
SmolVLA Collection Small, efficient and light-weight VLAs pretrained on community datasets • 1 item • Updated Jun 1 • 27
Article Weekly Robotics June #1 - SmolVLA discovery and thoughts By Beegbrain • Jun 3 • 9
Article Holo1: New family of GUI automation VLMs powering GUI agent Surfer-H By Hcompany and 1 other • Jun 3 • 70
Article SmolVLA: Efficient Vision-Language-Action Model trained on Lerobot Community Data By danaaubakirova and 8 others • Jun 3 • 208
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics Paper • 2506.01844 • Published Jun 2 • 117
Article CodeAgents + Structure: A Better Way to Execute Actions By akseljoonas and 1 other • May 28 • 70
Article Exploring Quantization Backends in Diffusers By derekl35 and 2 others • May 21 • 39
Article nanoVLM: The simplest repository to train your VLM in pure PyTorch By ariG23498 and 6 others • May 21 • 193
Article Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance By tiiuae and 5 others • May 21 • 28
Article Highlights from the First ICLR 2025 Watermarking Workshop By hadyelsahar and 4 others • May 14 • 12