JARVIS-VLA-v1 collection: Vision-Language-Action Models in Minecraft.
JarvisVLA-Qwen2-VL-7B is a Vision-Language-Action (VLA) model tailored to the open-world game Minecraft. Given human language instructions, it can carry out thousands of in-game skills, enabling open-ended creativity and interaction across Minecraft's expansive world.
@article{li2025jarvisvla,
title = {JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse},
author = {Muyao Li and Zihao Wang and Kaichen He and Xiaojian Ma and Yitao Liang},
year = {2025}
}
Base model: Qwen/Qwen2-VL-7B