π₀ (Pi0)

These weights come directly from the PyTorch conversion script in openpi, applied to its pi0_base model.

π₀ is a Vision-Language-Action model for general robot control, from Physical Intelligence. The LeRobot implementation is adapted from their open-source OpenPI repository.

Model Overview

π₀ represents a breakthrough in robotics as the first general-purpose robot foundation model developed by Physical Intelligence. Unlike traditional robots that are narrow specialists programmed for repetitive motions, π₀ is designed to be a generalist policy that can understand visual inputs, interpret natural language instructions, and control a variety of different robots across diverse tasks.

Architecture and Approach

π₀ combines several key innovations:

  • Flow Matching: Uses a novel method to augment pre-trained VLMs with continuous action outputs via flow matching (a variant of diffusion models)
  • Cross-Embodiment Training: Trained on data from 8 distinct robot platforms including UR5e, Bimanual UR5e, Franka, Bimanual Trossen, Bimanual ARX, Mobile Trossen, and Mobile Fibocom
  • Internet-Scale Pre-training: Inherits semantic knowledge from a pre-trained 3B parameter Vision-Language Model
  • High-Frequency Control: Outputs motor commands at up to 50 Hz for real-time dexterous manipulation
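
To make the flow matching bullet concrete, here is a minimal sketch of how a flow-matching policy could turn Gaussian noise into a chunk of continuous actions by integrating a velocity field with Euler steps. It is not the OpenPI or LeRobot implementation: the function names, the 10-step schedule, the 50x7 chunk shape, the time convention (0 = noise, 1 = actions), and the toy velocity field standing in for the trained network are all assumptions made for illustration.

```python
import numpy as np


def sample_action_chunk(velocity_fn, horizon, action_dim, num_steps=10, seed=0):
    """Integrate a velocity field from noise (tau=0) to actions (tau=1) with Euler steps.

    velocity_fn is a hypothetical stand-in for the learned network; in a real
    policy it would also be conditioned on images, language, and robot state.
    """
    rng = np.random.default_rng(seed)
    actions = rng.standard_normal((horizon, action_dim))  # start from pure noise
    dt = 1.0 / num_steps
    for k in range(num_steps):
        tau = k * dt
        actions = actions + dt * velocity_fn(actions, tau)  # one Euler step
    return actions


# Toy setup (assumption): pretend the "correct" action chunk is known, so the
# ideal velocity along a straight noise-to-data path can be written in closed form.
target = np.linspace(-1.0, 1.0, 50 * 7).reshape(50, 7)


def toy_velocity(actions, tau):
    # Velocity that moves the current sample straight toward the target chunk.
    return (target - actions) / (1.0 - tau)


chunk = sample_action_chunk(toy_velocity, horizon=50, action_dim=7)
print(chunk.shape)
```

With this toy field the integration lands on the target chunk after the final step. A trained model would replace `toy_velocity` with a network forward pass.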

Training

To train π₀, use the standard LeRobot training script with the appropriate configuration:

python src/lerobot/scripts/train.py \
    --dataset.repo_id=your_dataset \
    --policy.type=pi0 \
    --output_dir=./outputs/pi0_training \
    --job_name=pi0_training \
    --policy.pretrained_path=pepijn223/pi0_base \
    --policy.repo_id=your_repo_id \
    --policy.compile_model=true \
    --policy.gradient_checkpointing=true \
    --policy.dtype=bfloat16 \
    --steps=3000 \
    --policy.scheduler_decay_steps=3000 \
    --policy.device=cuda \
    --batch_size=32
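
The command above sets `--steps` and `--policy.scheduler_decay_steps` to the same value. If you launch runs from Python, a small wrapper can keep the two in sync. The sketch below is a hypothetical convenience helper, not part of LeRobot: `build_train_command` is an invented name, it only assembles the flags shown above, and it assumes the two values are meant to match as they do in the example.

```python
import shlex


def build_train_command(dataset_repo_id, policy_repo_id, steps=3000, batch_size=32):
    """Assemble the training command shown above as an argument list.

    Hypothetical helper: it reuses only the flags from the example command and
    ties scheduler_decay_steps to steps (an assumption based on that example).
    """
    args = {
        "dataset.repo_id": dataset_repo_id,
        "policy.type": "pi0",
        "output_dir": "./outputs/pi0_training",
        "job_name": "pi0_training",
        "policy.pretrained_path": "pepijn223/pi0_base",
        "policy.repo_id": policy_repo_id,
        "policy.compile_model": "true",
        "policy.gradient_checkpointing": "true",
        "policy.dtype": "bfloat16",
        "steps": steps,
        "policy.scheduler_decay_steps": steps,  # kept equal to steps
        "policy.device": "cuda",
        "batch_size": batch_size,
    }
    flags = [f"--{name}={value}" for name, value in args.items()]
    return ["python", "src/lerobot/scripts/train.py"] + flags


cmd = build_train_command("your_dataset", "your_repo_id")
print(shlex.join(cmd))  # shell-quoted form, suitable for copy-paste
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the root of a LeRobot checkout.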

Citation

If you use this model, please cite the original OpenPI work:

@article{openpi2024,
    title={Open-World Robotic Manipulation with Vision-Language-Action Models},
    author={Physical Intelligence},
    year={2024},
    url={https://github.com/Physical-Intelligence/openpi}
}

Original Repository

OpenPI GitHub Repository: https://github.com/Physical-Intelligence/openpi

License

This model follows the same license as the original OpenPI repository.

Model size: 4B params (Safetensors, tensor type F32)