
π₀.₅ - Libero

This is a PyTorch version of the π₀.₅ pi05_libero model, converted from the original JAX/Flax implementation.

Model Details

  • Architecture: PI0.5 (Vision-Language-Action model with discrete state input)
  • Model Type: PI0.5
  • Domain: LIBERO (diverse manipulation tasks)
  • Precision: 32-bit floating point (fp32)
  • Action Dimension: 32
  • Vision Model: PaliGemma (gemma_2b)
  • Action Expert: gemma_300m

Key Features

  • Discrete State Input: Uses discrete language tokens for state representation
  • Flow Matching: Uses adaRMSNorm for timestep injection in the action expert (see the sketch after this list)
  • Enhanced Action Modeling: Improved action prediction with a flow-matching approach
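
The snippet below is a minimal, self-contained sketch of adaptive RMSNorm conditioning for flow-matching timestep injection. The class name AdaRMSNorm, the projection layer, and all dimensions are illustrative assumptions for exposition; they are not the openpi implementation.

import torch
import torch.nn as nn

class AdaRMSNorm(nn.Module):
    """Illustrative adaptive RMSNorm: the flow-matching timestep embedding
    modulates the normalized activations instead of a fixed learned gain.
    (Sketch only, not the openpi implementation.)"""

    def __init__(self, dim: int, cond_dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        # Project the timestep/conditioning embedding to a per-channel scale.
        self.to_scale = nn.Linear(cond_dim, dim)

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # Standard RMS normalization over the feature dimension.
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        x_norm = x * rms
        # Conditioning-dependent gain, broadcast over the sequence dimension.
        scale = 1.0 + self.to_scale(cond).unsqueeze(1)
        return x_norm * scale

# Toy usage: condition action-expert activations on a timestep embedding.
x = torch.randn(2, 16, 256)    # [batch, action horizon, hidden dim]
t_emb = torch.randn(2, 128)    # [batch, timestep embedding dim]
out = AdaRMSNorm(dim=256, cond_dim=128)(x, t_emb)    # same shape as x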

Conversion Details

This model was converted from JAX to PyTorch using the OpenPI conversion script:

python examples/convert_jax_model_to_pytorch.py \
    --checkpoint_dir /pi05_base \
    --config_name pi05_libero \
    --output_path /pi05_base/pytorch/fp32/ \
    --precision float32

Usage

from openpi.models_pytorch.pi0_pytorch import PI0Pytorch
import torch

# Load the model
model = PI0Pytorch.from_pretrained("pepijn223/pi05_libero_fp32")

# The model expects inputs in the format:
# - images: torch.Tensor of shape [batch, height, width, channels]  
# - text: tokenized text prompts
# - proprioceptive_state: robot state information (if applicable)
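
The snippet below sketches how inputs with the shapes described above might be prepared. The tensor shapes, the example prompt, and the sample_actions call are assumptions for illustration only; consult the openpi repository for the exact observation format and inference API.

import torch

# Illustrative input preparation (shapes follow the comments above; the exact
# observation format and inference call are assumptions -- check openpi).
batch = 1
images = torch.zeros(batch, 224, 224, 3)   # [batch, height, width, channels]
prompt = ["pick up the black bowl and place it on the plate"]
state = torch.zeros(batch, 32)             # proprioceptive state, padded to 32 dims

# Hypothetical inference call; the real method name and signature may differ.
# actions = model.sample_actions(images=images, prompt=prompt, state=state)
# actions.shape -> [batch, action_horizon, 32]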

Model Architecture

The model consists of the following components (a conceptual sketch follows the list):

  1. Vision Encoder: PaliGemma-based vision processing
  2. Language Encoder: Text prompt understanding
  3. Action Expert: Specialized network for action prediction
  4. Integration Layer: Combines multimodal information for action output
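
As a rough mental model only, the toy module below composes four placeholder components in the order listed. All class names, layer choices, and dimensions are illustrative assumptions and do not reflect the actual openpi architecture or code.

import torch
import torch.nn as nn

class VLASketch(nn.Module):
    """Conceptual composition of the four components above (placeholders only)."""

    def __init__(self, vision_dim=1152, text_dim=2048, hidden_dim=1024,
                 action_dim=32, horizon=50):
        super().__init__()
        self.vision_encoder = nn.Linear(vision_dim, hidden_dim)    # stands in for the PaliGemma vision tower
        self.language_encoder = nn.Linear(text_dim, hidden_dim)    # stands in for the language backbone
        self.integration = nn.TransformerEncoderLayer(d_model=hidden_dim, nhead=8, batch_first=True)
        self.action_expert = nn.Linear(hidden_dim, action_dim)     # stands in for the gemma_300m expert
        self.horizon = horizon

    def forward(self, vision_feats, text_feats):
        # Fuse vision and language tokens, then decode an action chunk.
        tokens = torch.cat([self.vision_encoder(vision_feats),
                            self.language_encoder(text_feats)], dim=1)
        fused = self.integration(tokens)
        pooled = fused.mean(dim=1, keepdim=True).expand(-1, self.horizon, -1)
        return self.action_expert(pooled)   # [batch, horizon, action_dim]

# Toy forward pass with random "features" in place of real encoder outputs.
actions = VLASketch()(torch.randn(1, 256, 1152), torch.randn(1, 20, 2048))   # [1, 50, 32]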

Training Data

Models in this family are trained on robotics datasets appropriate to their domain (this checkpoint is the LIBERO variant):

  • DROID models: Trained on diverse robot manipulation data
  • ALOHA models: Trained on bimanual manipulation tasks
  • LIBERO models: Trained on diverse tabletop manipulation scenarios
  • Base models: Trained on general robotics datasets

Limitations

  • Model performance depends on similarity between deployment and training environments
  • May require domain-specific fine-tuning for optimal performance
  • Action space must match the trained action dimension (32); see the padding sketch below for one way to handle smaller native action spaces
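
One way to deal with a robot whose native action or state vector is shorter than 32 dimensions is to zero-pad it to the model's action dimension. The helper below is an illustrative assumption about that convention, not a utility from this repository; follow your openpi policy configuration for the authoritative preprocessing.

import torch
import torch.nn.functional as F

def pad_to_action_dim(x: torch.Tensor, action_dim: int = 32) -> torch.Tensor:
    """Zero-pad the last dimension of a state/action tensor to `action_dim`.
    Illustrative helper; not part of openpi."""
    pad = action_dim - x.shape[-1]
    if pad < 0:
        raise ValueError(f"input already exceeds action_dim={action_dim}")
    return F.pad(x, (0, pad))

# Example: a 7-DoF arm action padded to the model's 32-dim action space.
padded = pad_to_action_dim(torch.randn(1, 7))   # shape [1, 32]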

Citation

If you use this model, please cite the original OpenPI work:

@article{openpi2024,
    title={Open-World Robotic Manipulation with Vision-Language-Action Models},
    author={Physical Intelligence},
    year={2024},
    url={https://github.com/Physical-Intelligence/openpi}
}

Original Repository

OpenPI GitHub Repository: https://github.com/Physical-Intelligence/openpi

License

This model follows the same license as the original OpenPI repository.