---
license: mit
tags:
- RLinf
language:
- en
metrics:
- accuracy
base_model:
- gen-robot/openvla-7b-rlvla-warmup
pipeline_tag: reinforcement-learning
model-index:
- name: RLinf-openvla-maniskill3-ppo
results:
- task:
type: VLA
dataset:
type: maniskill-train
name: maniskill-train
metrics:
- type: accuracy
value: 96.09
- task:
type: VLA
dataset:
type: maniskill-vision
name: maniskill-vision
metrics:
- type: accuracy
value: 82.03
- task:
type: VLA
dataset:
type: maniskill-semantic
name: maniskill-semantic
metrics:
- type: accuracy
value: 78.35
- task:
type: VLA
dataset:
type: maniskill-position
name: maniskill-position
metrics:
- type: accuracy
value: 85.42
---
<div align="center">
<img src="logo.svg" alt="RLinf-logo" width="500"/>
</div>
<div align="center">
<!-- <a href="TODO"><img src="https://img.shields.io/badge/arXiv-Paper-red?logo=arxiv"></a> -->
<!-- <a href="TODO"><img src="https://img.shields.io/badge/HuggingFace-yellow?logo=huggingface&logoColor=white" alt="Hugging Face"></a> -->
<a href="https://github.com/RLinf/RLinf"><img src="https://img.shields.io/badge/Github-blue"></a>
<a href="https://rlinf.readthedocs.io/en/latest/"><img src="https://img.shields.io/badge/Documentation-Purple?color=8A2BE2&logo=readthedocs"></a>
<!-- <a href="TODO"><img src="https://devin.ai/assets/deepwiki-badge.png" alt="Ask DeepWiki.com" style="height:20px;"></a>
<a href="TODO"><img src="https://img.shields.io/badge/微信-green?logo=wechat&"></a> -->
</div>
<h1 align="center">RLinf: Reinforcement Learning Infrastructure for Agentic AI</h1>
[RLinf](https://github.com/RLinf/RLinf) is a flexible and scalable open-source infrastructure designed for post-training foundation models (LLMs, VLMs, VLAs) via reinforcement learning. The 'inf' in RLinf stands for Infrastructure, highlighting its role as a robust backbone for next-generation training. It also stands for Infinite, symbolizing the system’s support for open-ended learning, continuous generalization, and limitless possibilities in intelligence development.
<div align="center">
<img src="overview.png" alt="RLinf-overview" width="600"/>
</div>
## Model Description
This model was initialized from ``gen-robot/openvla-7b-rlvla-warmup`` and fine-tuned with Proximal Policy Optimization (PPO) in the ManiSkill3 simulator.
## Full OOD Evaluation and Results
### Overall Eval Results
Note: *rl4vla* refers to the paper *What Can RL Bring to VLA Generalization? An Empirical Study*. In the tables below, the __PPO-openvla__ column corresponds to this model, and **bold** marks the best result in each row.
| Description | rl4vla | GRPO-openvlaoft | PPO-openvlaoft | __PPO-openvla__ | GRPO-openvla |
|---------------|-----------|-----------------|----------------|-------------|---------------|
| Avg results | 0.7915 | 0.6064 | 0.7705 | **0.8193** | 0.7515 |
### Training Setting Eval
| Description | rl4vla | GRPO-openvlaoft | PPO-openvlaoft | __PPO-openvla__ | GRPO-openva |
|---------------|-----------|-----------------|----------------|-------------|---------------|
| Avg results | 0.9375 | 0.9414 | **0.9766** | 0.9609 | 0.8438 |
### OOD Eval on Vision
| Description | rl4vla | GRPO-openvlaoft | PPO-openvlaoft | __PPO-openvla__ | GRPO-openvla |
|---------------|-----------|-----------------|----------------|-------------|---------------|
| vision avg | 0.8047 | 0.8469 | **0.9211** | 0.8203 | 0.7469 |
| unseen table | 0.9063 | 0.9141 | **0.9648** | 0.9570 | 0.8984 |
| dynamic texture (weak) | 0.8516 | 0.9102 | **0.9492** | 0.8555 | 0.7891 |
| dynamic texture (strong) | 0.7500 | 0.7734 | **0.8633** | 0.7227 | 0.6563 |
| dynamic noise (weak) | 0.8281 | 0.8945 | **0.9805** | 0.8711 | 0.7969 |
| dynamic noise (strong) | 0.6875 | 0.7422 | **0.8477** | 0.6953 | 0.5938 |
### OOD Eval on Semantic
| Description | rl4vla | GRPO-openvlaoft | PPO-openvlaoft | __PPO-openvla__ | GRPO-openvla |
|---------------|-----------|-----------------|----------------|-------------|---------------|
| semantic avg | 0.7500 | 0.4553 | 0.6484 | **0.7835** | 0.7299 |
| unseen objects | 0.8281 | 0.8047 | **0.8594** | 0.8164 | 0.7656 |
| unseen receptacles | 0.6875 | 0.7422 | **0.8750** | 0.8125 | 0.7344 |
| unseen instructions | 0.8203 | 0.6797 | 0.7109 | **0.9453** | 0.8906 |
| multi-object (both seen) | 0.7891 | 0.3516 | 0.6055 | **0.8438** | 0.7578 |
| multi-object (both unseen) | 0.5703 | 0.3047 | 0.5508 | **0.6289** | 0.5781 |
| distractive receptacle | 0.8047 | 0.1875 | 0.6133 | **0.8281** | 0.7813 |
| multi-receptacle (both unseen) | **0.7500** | 0.3242 | 0.2383 | 0.6094 | 0.6016 |
### OOD Eval on Position
| Description | rl4vla | GRPO-openvlaoft | PPO-openvlaoft | __PPO-openvla__ | GRPO-openvla |
|---------------|-----------|-----------------|----------------|-------------|---------------|
| position avg | 0.8177 | 0.4466 | 0.7357 | **0.8542** | 0.7786 |
| unseen position (object & receptacle) | 0.7344 | 0.4023 | 0.6992 | **0.8633** | 0.7500 |
| unseen robot init pose | **0.8359** | 0.4805 | 0.7188 | 0.7773 | 0.7031 |
| mid-episode object reposition | 0.8828 | 0.4570 | 0.7891 | **0.9212** | 0.8828 |
## How to Use
To use this model, integrate it with the [RLinf](https://github.com/RLinf/RLinf) codebase by modifying the following parameters in the configuration file ``examples/embodiment/config/maniskill_ppo_openvla.yaml``:
- Set ``actor.checkpoint_load_path``, ``actor.tokenizer.tokenizer_model``, and ``rollout.model_dir`` to the path of the model checkpoint.
Note: If you intend to evaluate the model directly, make sure to set ``actor.model.is_lora`` to ``false``. A sketch of the resulting configuration is shown below.
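For reference, the relevant entries of ``maniskill_ppo_openvla.yaml`` would look roughly like the following. This is a minimal sketch: the nesting is inferred from the dotted parameter names above and may differ slightly from the actual file in your RLinf version, and ``/path/to/RLinf-openvla-maniskill3-ppo`` is a placeholder for your local checkpoint directory.

```yaml
# Minimal sketch of the relevant keys in
# examples/embodiment/config/maniskill_ppo_openvla.yaml.
# Nesting is inferred from the dotted key paths; your RLinf version may differ.
actor:
  checkpoint_load_path: /path/to/RLinf-openvla-maniskill3-ppo   # model checkpoint
  tokenizer:
    tokenizer_model: /path/to/RLinf-openvla-maniskill3-ppo      # same checkpoint path
  model:
    is_lora: false  # set to false when evaluating this model directly
rollout:
  model_dir: /path/to/RLinf-openvla-maniskill3-ppo              # same checkpoint path
```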
## License
This code repository and the model weights are licensed under the MIT License.