Nora-Long

Nora-Long is an open vision-language-action model trained on robot manipulation episodes from the Open X-Embodiment dataset. The model takes language instructions and camera images as input and generates robot actions. Nora-Long is trained directly from Qwen 2.5 VL-3B. All Nora checkpoints, as well as our training codebase, are released under the MIT License.

Unlike Nora, Nora-Long is pretrained with an action horizon of 5, i.e., each forward pass predicts a chunk of five consecutive actions.
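With this longer horizon, the predicted chunk is typically executed open-loop before the model is queried again. Below is a minimal sketch of that receding-horizon loop; `policy` and `robot` are hypothetical stand-ins for illustration, not the actual Nora API (see the GitHub repository for the real inference code).

```python
import numpy as np

HORIZON = 5  # Nora-Long's pretraining action horizon

def run_episode(policy, robot, instruction: str, max_steps: int = 200):
    """Receding-horizon execution: query the policy once per chunk and
    step through all HORIZON actions before re-querying.

    `policy` and `robot` are hypothetical interfaces used for illustration;
    see https://github.com/declare-lab/nora for the actual inference code.
    """
    steps = 0
    while steps < max_steps:
        image = robot.get_camera_image()
        # One forward pass produces a (HORIZON, 7) chunk of normalized actions.
        chunk = np.asarray(policy.predict(instruction, image))
        for action in chunk[:HORIZON]:
            robot.step(action)
            steps += 1
```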

Model Description

  • Model type: Vision-language-action (language, image => robot actions)
  • Language(s) (NLP): English
  • License: MIT
  • Finetuned from model: Qwen 2.5 VL-3B
  • Model size: 3.76B parameters (BF16, Safetensors)

Model Sources

  • Repository: https://github.com/declare-lab/nora

Usage

Nora-Long takes a language instruction and a camera image of a robot workspace as input and predicts (normalized) robot actions consisting of 7-DoF end-effector deltas of the form (x, y, z, roll, pitch, yaw, gripper). To execute on an actual robot platform, the actions must be un-normalized using statistics computed on a per-robot, per-dataset basis, as sketched below. Instructions on how to run Nora are available at https://github.com/declare-lab/nora.
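To make the un-normalization step concrete, the sketch below follows a common convention in Open X-Embodiment pipelines: mapping a normalized action in [-1, 1] back to per-dimension dataset ranges. The low/high statistics shown are made-up placeholders, and Nora's exact statistics and convention should be taken from the GitHub repository.

```python
import numpy as np

def unnormalize_action(norm_action: np.ndarray,
                       low: np.ndarray,
                       high: np.ndarray) -> np.ndarray:
    """Map a normalized action in [-1, 1] back to the robot's native range.

    `low`/`high` are per-dimension statistics computed per robot and per
    dataset; the values used below are illustrative placeholders only.
    """
    return 0.5 * (norm_action + 1.0) * (high - low) + low

# Placeholder statistics for the 7-DoF action
# (x, y, z, roll, pitch, yaw, gripper) -- illustrative only.
low  = np.array([-0.05, -0.05, -0.05, -0.25, -0.25, -0.25, 0.0])
high = np.array([ 0.05,  0.05,  0.05,  0.25,  0.25,  0.25, 1.0])

norm_action = np.array([0.1, -0.3, 0.0, 0.0, 0.2, -0.1, 1.0])  # model output
action = unnormalize_action(norm_action, low, high)
```

In practice, the low/high statistics are loaded alongside the checkpoint for the specific robot embodiment being controlled.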
