moojink committed
Commit 638918f · verified · 1 Parent(s): 13cdacd

Add README.md


(Thank you to @nielsr for the template!)

Files changed (1): README.md (+67, -0)

README.md ADDED
@@ -0,0 +1,67 @@

---
pipeline_tag: robotics
library_name: transformers
license: mit
---
# Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success

This repository contains the OpenVLA-OFT checkpoint fine-tuned on all four LIBERO task suites combined (LIBERO-Spatial, -Object, -Goal, and -Long), as described in [Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success](https://arxiv.org/abs/2502.19645). OpenVLA-OFT significantly improves upon the base OpenVLA model through an optimized fine-tuning recipe.

Project Page: https://openvla-oft.github.io/

Code: https://github.com/openvla-oft/openvla-oft

See here for other OpenVLA-OFT checkpoints: https://huggingface.co/moojink?search_models=oft

## Quick Start

This example demonstrates generating an action chunk with a pretrained OpenVLA-OFT checkpoint. Make sure you have set up the conda environment as described in the GitHub README.

```python
import pickle

from experiments.robot.libero.run_libero_eval import GenerateConfig
from experiments.robot.openvla_utils import (
    get_action_head,
    get_processor,
    get_proprio_projector,
    get_vla,
    get_vla_action,
)
from prismatic.vla.constants import NUM_ACTIONS_CHUNK, PROPRIO_DIM

# Instantiate config (see class GenerateConfig in experiments/robot/libero/run_libero_eval.py for definitions)
cfg = GenerateConfig(
    pretrained_checkpoint="moojink/openvla-7b-oft-finetuned-libero-spatial",
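    # NOTE: the checkpoint above is the single-suite LIBERO-Spatial example model; to run this
    # repository's combined-suite checkpoint instead, point `pretrained_checkpoint` (and the
    # matching `unnorm_key`) at this repo's model ID.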
    use_l1_regression=True,
    use_diffusion=False,
    use_film=False,
    num_images_in_input=2,
    use_proprio=True,
    load_in_8bit=False,
    load_in_4bit=False,
    center_crop=True,
    num_open_loop_steps=NUM_ACTIONS_CHUNK,
    unnorm_key="libero_spatial_no_noops",
)

# Load OpenVLA-OFT policy and inputs processor
vla = get_vla(cfg)
processor = get_processor(cfg)

# Load MLP action head to generate continuous actions (via L1 regression)
action_head = get_action_head(cfg, llm_dim=vla.llm_dim)

# Load proprio projector to map proprio to language embedding space
proprio_projector = get_proprio_projector(cfg, llm_dim=vla.llm_dim, proprio_dim=PROPRIO_DIM)

# Load sample observation:
#   observation (dict): {
#     "full_image": primary third-person image,
#     "wrist_image": wrist-mounted camera image,
#     "state": robot proprioceptive state,
#     "task_description": task description,
#   }
with open("experiments/robot/libero/sample_libero_spatial_observation.pkl", "rb") as file:
    observation = pickle.load(file)

# Generate robot action chunk (sequence of future actions)
actions = get_vla_action(
    cfg, vla, processor, observation, observation["task_description"], action_head, proprio_projector
)
print("Generated action chunk:")
for act in actions:
    print(act)
```
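
In a real rollout, the policy is typically queried once per chunk and the returned actions are executed open-loop before the next query (the config above sets `num_open_loop_steps=NUM_ACTIONS_CHUNK`). The sketch below is illustrative only: it reuses `cfg`, `vla`, `processor`, `action_head`, and `proprio_projector` from the snippet above, and `env` / `build_observation` are hypothetical placeholders for your own robot or simulator interface, not part of the openvla-oft codebase.

```python
# Illustrative open-loop rollout sketch. `env` and `build_observation` are hypothetical
# stand-ins for your own robot/simulator interface; swap in your actual environment code.
MAX_STEPS = 300

step = 0
while step < MAX_STEPS:
    # Build an observation dict with the same keys as the sample observation above:
    # "full_image", "wrist_image", "state", "task_description".
    observation = build_observation(env)

    # Query the policy once to get a chunk of future actions.
    actions = get_vla_action(
        cfg, vla, processor, observation, observation["task_description"], action_head, proprio_projector
    )

    # Execute the whole chunk open-loop before querying the policy again.
    for act in actions:
        env.step(act)
        step += 1
```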

## Citation

```bibtex
@article{kim2025fine,
  title={Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success},
  author={Kim, Moo Jin and Finn, Chelsea and Liang, Percy},
  journal={arXiv preprint arXiv:2502.19645},
  year={2025}
}
```