---
pretty_name: OWMM-Agent-Model
arxiv: 2506.04217
language:
- en
license: mit
tags:
- robotics
- multimodal
- open world mobile manipulation
- agent
base_model:
- OpenGVLab/InternVL2_5-8B
- OpenGVLab/InternVL2_5-38B
size_categories:
- 10B
---

![OWMM-Agent Banner](docs/demo_banner.gif)

## 📖 Project Overview

The following repositories contain the implementation and reproduction of the method described in the paper “[OWMM-Agent: Open World Mobile Manipulation With Multi-modal Agentic Data Synthesis](https://arxiv.org/abs/2506.04217)”.

- **Paper**: [arXiv:2506.04217](https://arxiv.org/abs/2506.04217)
- **Model**: [`OWMM-Agent-Model`](https://huggingface.co/hhyrhy/OWMM-Agent-Model) — **current repo**, the models we trained and used in OWMM tasks (both simulator and real world).
- **Dataset**: [`OWMM-Agent-data`](https://huggingface.co/datasets/hhyrhy/OWMM-Agent-data) — the training dataset of our OWMM models.
- **GitHub**: [`OWMM-Agent-codebase`](https://github.com/HHYHRHY/OWMM-Agent) — the codebase of OWMM-Agent, including scripts for data collection and annotation in the simulator, as well as implementations for both step and episodic evaluations.
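
Since the checkpoints are fine-tuned from InternVL2_5-8B/38B, they should load through the standard InternVL chat interface in `transformers`. The snippet below is a minimal sketch under that assumption: it uses the stock InternVL `model.chat()` remote-code API, a single 448×448 image tile instead of InternVL's full dynamic-tiling preprocessing, and a hypothetical image path and prompt (the actual OWMM agent prompts and evaluation loop live in the GitHub codebase).

```python
import torch
import torchvision.transforms as T
from PIL import Image
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "hhyrhy/OWMM-Agent-Model"  # this repo

# Load the checkpoint with InternVL's remote code (trust_remote_code is required).
model = AutoModel.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True, use_fast=False)

# Single-tile preprocessing with ImageNet statistics; InternVL's released demos
# use dynamic tiling, but one 448x448 tile keeps this sketch short.
transform = T.Compose([
    T.Resize((448, 448), interpolation=T.InterpolationMode.BICUBIC),
    T.ToTensor(),
    T.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])
image = Image.open("observation.png").convert("RGB")  # hypothetical robot observation
pixel_values = transform(image).unsqueeze(0).to(torch.bfloat16).cuda()

# Hypothetical query; the exact agent prompt format is defined in the codebase.
question = "<image>\nDescribe the scene and propose the next navigation target."
response = model.chat(tokenizer, pixel_values, question,
                      generation_config=dict(max_new_tokens=256, do_sample=False))
print(response)
```

For full agentic rollouts (multi-step observation, grounding, and action prediction), use the data-collection and evaluation scripts in the GitHub repository rather than this standalone loading sketch.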