moonshotai/Kimi-VL-A3B-Thinking-2506 Image-Text-to-Text • 16B • Updated 7 days ago • 26.7k • 167
Yesterday was the day of vision language action models (VLAs)!
> SmolVLA: open-source small VLA for robotics by Hugging Face LeRobot team
Blog: https://huggingface.co/blog/smolvla
Model: lerobot/smolvla_base
> Holo-1: 3B & 7B web/computer use agentic VLAs by H Company
Model family: Hcompany/holo1-683dd1eece7eb077b96d0cbd
Demo: https://huggingface.co/spaces/multimodalart/Holo1
Blog: https://huggingface.co/blog/Hcompany/holo1
Super exciting times!!
ARIA: Training Language Agents with Intention-Driven Reward Aggregation Paper • 2506.00539 • Published May 31 • 30
VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning? Paper • 2505.23359 • Published May 29 • 40
Table-R1: Inference-Time Scaling for Table Reasoning Paper • 2505.23621 • Published May 29 • 92