meta-llama/Llama-4-Scout-17B-16E-Instruct Image-Text-to-Text • Updated 7 days ago • 657k • 780
Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy Paper • 2503.19757 • Published 22 days ago • 50
Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks Paper • 2503.21696 • Published 19 days ago • 21