---
library_name: transformers
license: mit
datasets:
- CraftJarvis/minecraft-motion-coa-dataset
- CraftJarvis/minecraft-grounding-coa-dataset
- CraftJarvis/minecraft-motion-action-dataset
- CraftJarvis/minecraft-grounding-action-dataset
- CraftJarvis/minecraft-text-action-dataset
metrics:
- accuracy
base_model:
- Qwen/Qwen2-VL-7B-Instruct
pipeline_tag: image-text-to-text
arxiv: 2509.13347
---

# Minecraft-Openha-Qwen2vl-7b-2509
## ✨ Highlights
This model is built on Qwen2-VL-7B-Instruct and introduces two key innovations:
- Chain of Action (CoA): bridges reasoning and control by using abstracted actions as thoughts.
- All-in-One training: unifies motion, grounding, and text actions into a single framework, enabling broad generalization beyond specialist agents.
## 💻 Usage
You can download and run this model with:

```shell
python examples/rollout_openha.py \
    --output_mode text_action \
    --vlm_client_mode hf \
    --system_message_tag text_action \
    --model_ips localhost --model_ports 11000 \
    --model_id minecraft-openha-qwen2vl-7b-2509 \
    --record_path "~/evaluate" \
    --max_steps_num 200 \
    --num_rollouts 8
```
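If you prefer to drive the checkpoint directly through `transformers` rather than the rollout script, inputs follow the standard Qwen2-VL chat format (an image of the current game frame plus a text instruction, matching the `image-text-to-text` pipeline tag). Below is a minimal sketch; the repo id, system prompt, and file names are assumptions for illustration, so verify the exact values on the Hub page:

```python
# Minimal sketch of querying the model via transformers instead of the
# rollout script. The system prompt and frame path here are placeholders.

def build_messages(system_prompt, frame_path, instruction):
    """Assemble a single-turn Qwen2-VL chat request: one game frame plus text."""
    return [
        {"role": "system", "content": system_prompt},
        {
            "role": "user",
            "content": [
                {"type": "image", "image": frame_path},
                {"type": "text", "text": instruction},
            ],
        },
    ]


messages = build_messages(
    "You are a Minecraft agent that outputs text actions.",  # assumed prompt
    "frame_0001.png",                                        # assumed frame
    "Chop the nearest tree.",
)

# The heavy part, sketched as comments (assumed repo id; requires a recent
# transformers release with Qwen2-VL support):
# from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
# model_id = "CraftJarvis/minecraft-openha-qwen2vl-7b-2509"
# processor = AutoProcessor.from_pretrained(model_id)
# model = Qwen2VLForConditionalGeneration.from_pretrained(model_id, device_map="auto")
```

The message list would then be passed through `processor.apply_chat_template(...)` and `model.generate(...)` in the usual Qwen2-VL fashion.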