---
library_name: transformers
license: mit
datasets:
  - CraftJarvis/minecraft-motion-coa-dataset
  - CraftJarvis/minecraft-grounding-coa-dataset
  - CraftJarvis/minecraft-motion-action-dataset
  - CraftJarvis/minecraft-grounding-action-dataset
  - CraftJarvis/minecraft-text-action-dataset
metrics:
  - accuracy
base_model:
  - Qwen/Qwen2-VL-7B-Instruct
pipeline_tag: image-text-to-text
arxiv: 2509.13347
---

# Minecraft-Openha-Qwen2vl-7b-2509

## ✨ Highlights

This model is built on Qwen2-VL-7B-Instruct and introduces two key innovations:

- **Chain of Action (CoA)**: bridges reasoning and control by using abstracted actions as thoughts (see the sketch after this list).
- **All-in-One training**: unifies motion, grounding, and text actions into a single framework, enabling broad generalization beyond specialist agents.
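To make the CoA idea concrete, here is a purely hypothetical sketch of separating the abstracted-action "thought" from the final control output. The actual CoA format is defined by the paper (arXiv:2509.13347) and its accompanying code; the tag names and action strings below are invented for illustration only:

```python
import re

# Hypothetical CoA-style output: abstracted actions serve as the "thought",
# followed by the executable low-level action. The <think>/<action> tags are
# an assumption for illustration, not the model's documented format.
response = "<think>approach the oak tree; aim at the trunk</think><action>attack</action>"

thought = re.search(r"<think>(.*?)</think>", response).group(1)
action = re.search(r"<action>(.*?)</action>", response).group(1)
print("reasoning:", thought)  # approach the oak tree; aim at the trunk
print("control:  ", action)   # attack
```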

## 💻 Usage

You can download and run this model with the example rollout script:

```bash
python examples/rollout_openha.py \
    --output_mode text_action \
    --vlm_client_mode hf \
    --system_message_tag text_action \
    --model_ips localhost --model_ports 11000 \
    --model_id minecraft-openha-qwen2vl-7b-2509 \
    --record_path "~/evaluate" \
    --max_steps_num 200 \
    --num_rollouts 8
```
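Since the card declares `library_name: transformers` with a Qwen2-VL base, the checkpoint should also load through the standard Qwen2-VL classes. Below is a minimal sketch; the repo id, screenshot path, and instruction text are illustrative assumptions, not values taken from this card:

```python
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

# Assumed Hugging Face repo id; substitute the actual id for this card.
model_id = "CraftJarvis/minecraft-openha-qwen2vl-7b-2509"

model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# One Minecraft observation frame plus a hypothetical instruction.
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Chop down the tree in front of you."},
    ],
}]
prompt = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
frame = Image.open("observation.png")  # placeholder screenshot path

inputs = processor(text=[prompt], images=[frame], return_tensors="pt").to(model.device)
generated = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens (the predicted action / CoA).
print(processor.batch_decode(
    generated[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0])
```

The rollout script above remains the reference entry point; prompts and generation settings for each output mode (motion, grounding, text action) are defined there rather than in this sketch.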