HillFir committed on
Commit 08ddfb1 · verified · 1 Parent(s): b12c5b9

Update README.md

Files changed (1)
  1. README.md +17 -8
README.md CHANGED
@@ -7,11 +7,19 @@ language:
 metrics:
 - accuracy
 base_model:
-- Haozhan72/Openvla-oft-SFT-libero-goal-trajall
+- gen-robot/openvla-7b-rlvla-warmup
 pipeline_tag: reinforcement-learning
 model-index:
-- name: RLinf-openvlaoft-maniskill3-ppo
+- name: RLinf-openvla-maniskill3-ppo
   results:
+  - task:
+      type: VLA
+    dataset:
+      type: maniskill-train
+      name: maniskill-train
+    metrics:
+    - type: accuracy
+      value: 96.09
   - task:
       type: VLA
     dataset:
@@ -19,7 +27,7 @@ model-index:
       name: maniskill-vision
     metrics:
     - type: accuracy
-      value: 80.5
+      value: 82.03
   - task:
       type: VLA
     dataset:
@@ -27,7 +35,7 @@ model-index:
       name: maniskill-semantic
     metrics:
     - type: accuracy
-      value: 56.6
+      value: 78.35
   - task:
       type: VLA
     dataset:
@@ -35,8 +43,9 @@ model-index:
       name: maniskill-position
     metrics:
     - type: accuracy
-      value: 56.1
+      value: 85.42
 ---
+
 <div align="center">
 <img src="logo.svg" alt="RLinf-logo" width="500"/>
 </div>
@@ -61,7 +70,7 @@ model-index:
 </div>
 
 ## Model Description
-This openvla-oft model is trained on ``Haozhan72/Openvla-oft-SFT-libero10-trajall`` with an additional lora SFT checkpoint and finetuned by Proximal Policy Optimization (PPO) on the ManiSkill simulator.
+This model is trained on ``gen-robot/openvla-7b-rlvla-warmup`` by Proximal Policy Optimization (PPO) on the ManiSkill simulator.
 
 ## Full OOD Evaluation and Results
 ### Overall Eval Results
@@ -107,11 +116,11 @@ Note: rl4vla refers to the paper VLA-RL-Study: What Can RL Bring to VLA Generali
 | mid-episode object reposition | 0.8828 | 0.4570 | 0.7891 | **0.9212** | 0.8828 |
 
 ## How to Use
-Please integrate the provided model with the [RLinf](https://github.com/RLinf/RLinf) codebase. To do so, modify the following parameters in the configuration file ``examples/embodiment/config/maniskill_ppo_openvlaoft.yaml``:
+Please integrate the provided model with the [RLinf](https://github.com/RLinf/RLinf) codebase. To do so, modify the following parameters in the configuration file ``examples/embodiment/config/maniskill_ppo_openvla.yaml``:
 
 - Set ``actor.checkpoint_load_path``, ``actor.tokenizer.tokenizer_model``, and ``rollout.model_dir`` to the path of the model checkpoint.
 
 Note: If you intend to evaluate the model directly, make sure to set ``actor.model.is_lora`` to ``false``.
 
 ## License
-This code repository and the model weights are licensed under the MIT License.
+This code repository and the model weights are licensed under the MIT License.
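
For reference, a minimal sketch of the overrides described in the updated "How to Use" section, as they might appear in ``examples/embodiment/config/maniskill_ppo_openvla.yaml``. The nesting is inferred from the dotted key names and the checkpoint path is a placeholder, so adjust both to the actual config file and download location:

```yaml
# Sketch only: keys taken from the How to Use notes above; nesting assumed
# from the dotted names (actor.checkpoint_load_path, actor.tokenizer.tokenizer_model,
# actor.model.is_lora, rollout.model_dir). Path is a placeholder.
actor:
  checkpoint_load_path: /path/to/RLinf-openvla-maniskill3-ppo
  tokenizer:
    tokenizer_model: /path/to/RLinf-openvla-maniskill3-ppo
  model:
    is_lora: false   # per the note: required when evaluating the checkpoint directly
rollout:
  model_dir: /path/to/RLinf-openvla-maniskill3-ppo
```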