---
license: mit
tags:
- RLinf
language:
- en
metrics:
- accuracy
base_model:
- gen-robot/openvla-7b-rlvla-warmup
pipeline_tag: reinforcement-learning
model-index:
- name: RLinf-openvla-maniskill3-ppo
  results:
  - task:
      type: VLA             
    dataset:
      type: maniskill-train
      name: maniskill-train
    metrics:
      - type: accuracy        
        value: 96.09
  - task:
      type: VLA             
    dataset:
      type: maniskill-vision
      name: maniskill-vision
    metrics:
      - type: accuracy        
        value: 82.03
  - task:
      type: VLA             
    dataset:
      type: maniskill-semantic
      name: maniskill-semantic
    metrics:
      - type: accuracy        
        value: 78.35
  - task:
      type: VLA             
    dataset:
      type: maniskill-position
      name: maniskill-position
    metrics:
      - type: accuracy        
        value: 85.42
---

<div align="center">
  <img src="logo.svg" alt="RLinf-logo" width="500"/>
</div>


<div align="center">
<!-- <a href="TODO"><img src="https://img.shields.io/badge/arXiv-Paper-red?logo=arxiv"></a> -->
<!-- <a href="TODO"><img src="https://img.shields.io/badge/HuggingFace-yellow?logo=huggingface&logoColor=white" alt="Hugging Face"></a> -->
<a href="https://github.com/RLinf/RLinf"><img src="https://img.shields.io/badge/Github-blue"></a>
<a href="https://rlinf.readthedocs.io/en/latest/"><img src="https://img.shields.io/badge/Documentation-Purple?color=8A2BE2&logo=readthedocs"></a>
<!-- <a href="TODO"><img src="https://devin.ai/assets/deepwiki-badge.png" alt="Ask DeepWiki.com" style="height:20px;"></a>
<a href="TODO"><img src="https://img.shields.io/badge/微信-green?logo=wechat&amp"></a> -->
</div>

<h1 align="center">RLinf: Reinforcement Learning Infrastructure for Agentic AI</h1>

[RLinf](https://github.com/RLinf/RLinf) is a flexible and scalable open-source infrastructure designed for post-training foundation models (LLMs, VLMs, VLAs) via reinforcement learning. The 'inf' in RLinf stands for Infrastructure, highlighting its role as a robust backbone for next-generation training. It also stands for Infinite, symbolizing the system’s support for open-ended learning, continuous generalization, and limitless possibilities in intelligence development.


<div align="center">
  <img src="overview.png" alt="RLinf-overview" width="600"/>
</div>

## Model Description
This model was trained from ``gen-robot/openvla-7b-rlvla-warmup`` with Proximal Policy Optimization (PPO) in the ManiSkill3 simulator.
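
If you just want to sanity-check the checkpoint outside of RLinf, the snippet below is a minimal sketch of OpenVLA's standard ``transformers`` inference path. It assumes the checkpoint ships OpenVLA's remote modeling code; the repo id, observation image, instruction, and ``unnorm_key`` are illustrative placeholders, and the supported workflow remains the RLinf integration described under "How to Use".

```python
# Minimal sketch (assumptions: repo id, unnorm_key, observation source).
from PIL import Image
import torch
from transformers import AutoModelForVision2Seq, AutoProcessor

model_id = "RLinf/RLinf-openvla-maniskill3-ppo"  # assumed repo id

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
vla = AutoModelForVision2Seq.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).to("cuda:0")

# A third-person camera frame from the simulator (placeholder file).
image = Image.open("observation.png")
prompt = "In: What action should the robot take to pick up the cube?\nOut:"

inputs = processor(prompt, image).to("cuda:0", dtype=torch.bfloat16)
# predict_action de-normalizes the predicted 7-DoF action; the unnorm_key
# must match the statistics the model was trained with (placeholder below).
action = vla.predict_action(**inputs, unnorm_key="bridge_orig", do_sample=False)
```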

## Full OOD Evaluation and Results
### Overall Eval Results
Note: *rl4vla* refers to the paper *VLA-RL-Study: What Can RL Bring to VLA Generalization? An Empirical Study*.
| Description | rl4vla | GRPO-openvlaoft | **PPO-openvlaoft** | PPO-openvla | GRPO-openvla |
|-------------|--------|-----------------|--------------------|-------------|--------------|
| Avg results | 0.7915 | 0.6064 | 0.7705 | **0.8193** | 0.7515 |

### Training Setting Eval
| Description | rl4vla | GRPO-openvlaoft | **PPO-openvlaoft** | PPO-openvla | GRPO-openvla |
|-------------|--------|-----------------|--------------------|-------------|--------------|
| Avg results | 0.9375 | 0.9414 | **0.9766** | 0.9609 | 0.8438 |

### OOD Eval on Vision

| Description | rl4vla | GRPO-openvlaoft | **PPO-openvlaoft** | PPO-openvla | GRPO-openvla |
|--------------------------|--------|--------|------------|--------|--------|
| vision avg               | 0.8047 | 0.8469 | **0.9211** | 0.8203 | 0.7469 |
| unseen table             | 0.9063 | 0.9141 | **0.9648** | 0.9570 | 0.8984 |
| dynamic texture (weak)   | 0.8516 | 0.9102 | **0.9492** | 0.8555 | 0.7891 |
| dynamic texture (strong) | 0.7500 | 0.7734 | **0.8633** | 0.7227 | 0.6563 |
| dynamic noise (weak)     | 0.8281 | 0.8945 | **0.9805** | 0.8711 | 0.7969 |
| dynamic noise (strong)   | 0.6875 | 0.7422 | **0.8477** | 0.6953 | 0.5938 |

### OOD Eval on Semantic
| Description | rl4vla | GRPO-openvlaoft | **PPO-openvlaoft** | PPO-openvla | GRPO-openvla |
|--------------------------------|------------|--------|------------|------------|--------|
| object avg                     | 0.7500     | 0.4553 | 0.6484     | **0.7835** | 0.7299 |
| unseen objects                 | 0.8281     | 0.8047 | **0.8594** | 0.8164     | 0.7656 |
| unseen receptacles             | 0.6875     | 0.7422 | **0.8750** | 0.8125     | 0.7344 |
| unseen instructions            | 0.8203     | 0.6797 | 0.7109     | **0.9453** | 0.8906 |
| multi-object (both seen)       | 0.7891     | 0.3516 | 0.6055     | **0.8438** | 0.7578 |
| multi-object (both unseen)     | 0.5703     | 0.3047 | 0.5508     | **0.6289** | 0.5781 |
| distractive receptacle         | 0.8047     | 0.1875 | 0.6133     | **0.8281** | 0.7813 |
| multi-receptacle (both unseen) | **0.7500** | 0.3242 | 0.2383     | 0.6094     | 0.6016 |

### OOD Eval on Position
| Description | rl4vla | GRPO-openvlaoft | **PPO-openvlaoft** | PPO-openvla | GRPO-openvla |
|----------------------------------------|------------|--------|--------|------------|--------|
| position avg                           | 0.8177     | 0.4466 | 0.7357 | **0.8542** | 0.7786 |
| unseen position (object & receptacle)  | 0.7344     | 0.4023 | 0.6992 | **0.8633** | 0.7500 |
| unseen robot init pose                 | **0.8359** | 0.4805 | 0.7188 | 0.7773     | 0.7031 |
| mid-episode object reposition          | 0.8828     | 0.4570 | 0.7891 | **0.9212** | 0.8828 |

## How to Use
Please use this model with the [RLinf](https://github.com/RLinf/RLinf) codebase. To do so, modify the following parameters in the configuration file ``examples/embodiment/config/maniskill_ppo_openvla.yaml`` (see the config sketch below):

- Set ``actor.checkpoint_load_path``, ``actor.tokenizer.tokenizer_model``, and ``rollout.model_dir`` to the path of the model checkpoint.

Note: If you intend to evaluate the model directly, make sure to set ``actor.model.is_lora`` to ``false``.
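
For reference, a minimal sketch of the resulting config excerpt. The nesting follows the dotted parameter names above; ``/path/to/checkpoint`` is a placeholder for your local download of this model.

```yaml
# examples/embodiment/config/maniskill_ppo_openvla.yaml (excerpt)
actor:
  checkpoint_load_path: /path/to/checkpoint
  tokenizer:
    tokenizer_model: /path/to/checkpoint
  model:
    is_lora: false  # set to false when evaluating the checkpoint directly
rollout:
  model_dir: /path/to/checkpoint
```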

## License
This code repository and the model weights are licensed under the MIT License.