GROOT is a research series investigating how self-supervised and weakly supervised learning can be used to train agents that follow instructions.
Vision-Language-Action Models in Minecraft.
- CraftJarvis/JarvisVLA-Qwen2-VL-7B (Image-Text-to-Text • 8B)
- JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse (Paper • 2503.16365)
- Minecraft VLM Leaderboard: View and search Minecraft LLM leaderboard
- CraftJarvis/minecraft-vla-sft (Dataset • 3.78M rows)
ROCKET is a research series exploring vision-based goal specification methods.
- ROCKET-2: Steering Visuomotor Policy via Cross-View Goal Alignment (Paper • 2503.02505)
- ROCKET-1: Master Open-World Interaction with Visual-Temporal Context Prompting (Paper • 2410.17856)
- phython96/ROCKET-1 (Robotics • 0.1B)
- phython96/ROCKET-2-1x-22w (0.2B)