SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics Paper • 2506.01844 • Published 7 days ago • 78
InfantAgent-Next: A Multimodal Generalist Agent for Automated Computer Interaction Paper • 2505.10887 • Published 24 days ago • 10
Vibe Coding vs. Agentic Coding: Fundamentals and Practical Implications of Agentic AI Paper • 2505.19443 • Published 14 days ago • 15
Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning Paper • 2505.15966 • Published 18 days ago • 51
OpenCodeReasoning-II Collection Reasoning data for supervised finetuning of LLMs to advance code generation and critique • 5 items • Updated 3 days ago • 8
BLIP3-o: A Family of Fully Open Unified Multimodal Models - Architecture, Training and Dataset Paper • 2505.09568 • Published 25 days ago • 93
Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency Paper • 2504.18589 • Published Apr 24 • 11
HyperCLOVA X SEED Collection HyperCLOVA X SEED is NAVER's lightweight open-source lineup with a strong focus on Korean language performance • 3 items • Updated Apr 24 • 26
Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models Paper • 2504.15271 • Published Apr 21 • 65
MIG: Automatic Data Selection for Instruction Tuning by Maximizing Information Gain in Semantic Space Paper • 2504.13835 • Published Apr 18 • 38
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning Paper • 2504.08837 • Published Apr 10 • 43
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published Apr 14 • 269
Scaling Laws for Native Multimodal Models Paper • 2504.07951 • Published Apr 10 • 28
Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought Paper • 2504.05599 • Published Apr 8 • 83
Why Reasoning Matters? A Survey of Advancements in Multimodal Reasoning (v1) Paper • 2504.03151 • Published Apr 4 • 14