SmolVLA: A vision-language-action model for affordable and efficient robotics

Paper

Code

Designed by Hugging Face.

This model has 450M parameters in total. You can use it inside the LeRobot library.

Install smolvla extra dependencies:

pip install -e ".[smolvla]"
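
The editable install above assumes you are running the command from the root of a local clone of the LeRobot repository (the "-e ." path points at the repository itself); a typical sequence looks like this:

git clone https://github.com/huggingface/lerobot.git
cd lerobot
pip install -e ".[smolvla]"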

Example of fine-tuning the pretrained SmolVLA model (smolvla_base):

python lerobot/scripts/train.py \
--policy.path=lerobot/smolvla_base \
--dataset.repo_id=danaaubakirova/svla_so100_task1_v3 \
--batch_size=64 \
--steps=200000

Example of fine-tuning the SmolVLA neural network with a pretrained VLM and an action expert initialized from scratch:

python lerobot/scripts/train.py \
--policy.type=smolvla \
--dataset.repo_id=danaaubakirova/svla_so100_task1_v3 \
--batch_size=64 \
--steps=200000

Example of using the SmolVLA pretrained model outside the LeRobot training framework:

# Import path for LeRobot versions that ship SmolVLA; it may differ in later releases.
from lerobot.common.policies.smolvla.modeling_smolvla import SmolVLAPolicy

policy = SmolVLAPolicy.from_pretrained("lerobot/smolvla_base")
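
A minimal inference sketch is shown below. The observation keys, image resolution, state dimension, and task string are illustrative assumptions; the features the policy actually expects (camera names, state size) are defined by the checkpoint's configuration and your robot setup, and the import path may vary with the LeRobot version.

import torch
from lerobot.common.policies.smolvla.modeling_smolvla import SmolVLAPolicy

# Load weights, config, and normalization statistics from the Hub.
policy = SmolVLAPolicy.from_pretrained("lerobot/smolvla_base")
policy.eval()
policy.reset()  # clear any internally queued actions before a new episode

# Dummy observation. Key names and tensor shapes are assumptions here;
# they must match the input features declared in the policy/dataset config.
observation = {
    "observation.images.top": torch.rand(1, 3, 256, 256),  # camera frame, (B, C, H, W) in [0, 1]
    "observation.state": torch.rand(1, 6),                  # proprioceptive state, (B, state_dim)
    "task": ["Pick up the cube and place it in the bin."],  # natural-language instruction
}

with torch.no_grad():
    action = policy.select_action(observation)  # next action to send to the robot

In a real control loop, select_action is called once per control step with fresh camera frames and robot state.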