Upload policy weights, train config and readme
- README.md +62 -0
- config.json +61 -0
- model.safetensors +3 -0
- train_config.json +205 -0
    	
README.md ADDED
@@ -0,0 +1,62 @@
+---
+datasets: lerobot/pusht
+library_name: lerobot
+license: apache-2.0
+model_name: act
+pipeline_tag: robotics
+tags:
+- act
+- robotics
+- lerobot
+---
+
+# Model Card for act
+
+<!-- Provide a quick summary of what the model is/does. -->
+
+
+[Action Chunking with Transformers (ACT)](https://huggingface.co/papers/2304.13705) is an imitation-learning method that predicts short action chunks instead of single steps. It learns from teleoperated data and often achieves high success rates.
+
+
+This policy has been trained and pushed to the Hub using [LeRobot](https://github.com/huggingface/lerobot).
+See the full documentation at [LeRobot Docs](https://huggingface.co/docs/lerobot/index).
+
+---
+
+## How to Get Started with the Model
+
+For a complete walkthrough, see the [training guide](https://huggingface.co/docs/lerobot/il_robots#train-a-policy).
+Below is the short version of how to train and run inference/eval:
+
+### Train from scratch
+
+```bash
+lerobot-train \
+  --dataset.repo_id=${HF_USER}/<dataset> \
+  --policy.type=act \
+  --output_dir=outputs/train/<desired_policy_repo_id> \
+  --job_name=lerobot_training \
+  --policy.device=cuda \
+  --policy.repo_id=${HF_USER}/<desired_policy_repo_id> \
+  --wandb.enable=true
+```
+
+_Writes checkpoints to `outputs/train/<desired_policy_repo_id>/checkpoints/`._
+
+### Evaluate the policy/run inference
+
+```bash
+lerobot-record \
+  --robot.type=so100_follower \
+  --dataset.repo_id=${HF_USER}/eval_<dataset> \
+  --policy.path=${HF_USER}/<desired_policy_repo_id> \
+  --episodes=10
+```
+
+Prefix the dataset repo with **eval\_** and supply `--policy.path` pointing to a local or hub checkpoint.
+
+---
+
+## Model Details
+
+- **License:** apache-2.0
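
The README describes ACT as predicting short action chunks instead of single steps. A minimal sketch of what that means at rollout time is given below, assuming the policy is queried only when a queue of pending actions runs empty. `run_chunked` and `dummy_policy` are hypothetical names used for illustration and are not LeRobot APIs; the numbers (chunks of 100 actions, 300-step episodes) mirror `chunk_size`, `n_action_steps` and `episode_length` from the configs in this commit.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

Action = Tuple[float, float]  # 2-D action, matching the "shape": [2] action feature


def run_chunked(
    predict_chunk: Callable[[int], List[Action]],
    n_steps: int,
    n_action_steps: int,
) -> Tuple[List[Action], int]:
    """Run n_steps control steps, querying the policy only when the queue is empty."""
    queue: Deque[Action] = deque()
    executed: List[Action] = []
    n_queries = 0
    for t in range(n_steps):
        if not queue:
            # One forward pass yields a whole chunk; keep the first n_action_steps of it.
            queue.extend(predict_chunk(t)[:n_action_steps])
            n_queries += 1
        executed.append(queue.popleft())
    return executed, n_queries


def dummy_policy(t: int, chunk_size: int = 100) -> List[Action]:
    """Hypothetical stand-in for a trained policy: encodes the target step in the action."""
    return [(float(t + i), 0.0) for i in range(chunk_size)]


executed, n_queries = run_chunked(dummy_policy, n_steps=300, n_action_steps=100)
print(f"{len(executed)} actions executed with {n_queries} policy queries")
```

With these settings the whole predicted chunk is executed open loop before the policy is consulted again, which trades reactivity for fewer forward passes.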
    	
config.json ADDED
@@ -0,0 +1,61 @@
+{
+    "type": "act",
+    "n_obs_steps": 1,
+    "normalization_mapping": {
+        "VISUAL": "MEAN_STD",
+        "STATE": "MEAN_STD",
+        "ACTION": "MEAN_STD"
+    },
+    "input_features": {
+        "observation.image": {
+            "type": "VISUAL",
+            "shape": [
+                3,
+                96,
+                96
+            ]
+        },
+        "observation.state": {
+            "type": "STATE",
+            "shape": [
+                2
+            ]
+        }
+    },
+    "output_features": {
+        "action": {
+            "type": "ACTION",
+            "shape": [
+                2
+            ]
+        }
+    },
+    "device": "cuda",
+    "use_amp": false,
+    "push_to_hub": true,
+    "repo_id": "arclabmit/pusht_act_model",
+    "private": null,
+    "tags": null,
+    "license": null,
+    "chunk_size": 100,
+    "n_action_steps": 100,
+    "vision_backbone": "resnet18",
+    "pretrained_backbone_weights": "ResNet18_Weights.IMAGENET1K_V1",
+    "replace_final_stride_with_dilation": false,
+    "pre_norm": false,
+    "dim_model": 512,
+    "n_heads": 8,
+    "dim_feedforward": 3200,
+    "feedforward_activation": "relu",
+    "n_encoder_layers": 4,
+    "n_decoder_layers": 1,
+    "use_vae": true,
+    "latent_dim": 32,
+    "n_vae_encoder_layers": 4,
+    "temporal_ensemble_coeff": null,
+    "dropout": 0.1,
+    "kl_weight": 10.0,
+    "optimizer_lr": 1e-05,
+    "optimizer_weight_decay": 0.0001,
+    "optimizer_lr_backbone": 1e-05
+}
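
A few of the fields in `config.json` constrain each other, and a short script can make that explicit. The sketch below parses a hand-copied fragment of the config (only the keys it needs) and derives some quantities from it. The `summarize` helper and its checks are illustrative assumptions about how the fields relate, not code taken from LeRobot.

```python
import json

# Fragment copied from the config.json in this commit; other keys are omitted.
CONFIG_FRAGMENT = """
{
    "chunk_size": 100,
    "n_action_steps": 100,
    "dim_model": 512,
    "n_heads": 8,
    "temporal_ensemble_coeff": null,
    "input_features": {
        "observation.image": {"type": "VISUAL", "shape": [3, 96, 96]},
        "observation.state": {"type": "STATE", "shape": [2]}
    },
    "output_features": {"action": {"type": "ACTION", "shape": [2]}}
}
"""


def summarize(cfg: dict) -> dict:
    """Derive a few quantities and fail early on inconsistent settings."""
    if cfg["n_action_steps"] > cfg["chunk_size"]:
        # Cannot execute more actions per query than a chunk contains.
        raise ValueError("n_action_steps must not exceed chunk_size")
    if cfg["dim_model"] % cfg["n_heads"] != 0:
        # Multi-head attention splits dim_model evenly across heads.
        raise ValueError("dim_model must be divisible by n_heads")
    return {
        "head_dim": cfg["dim_model"] // cfg["n_heads"],
        "actions_per_query": cfg["n_action_steps"],
        "temporal_ensembling": cfg["temporal_ensemble_coeff"] is not None,
        "action_dim": cfg["output_features"]["action"]["shape"][0],
        "image_shape": tuple(cfg["input_features"]["observation.image"]["shape"]),
    }


summary = summarize(json.loads(CONFIG_FRAGMENT))
for key, value in summary.items():
    print(f"{key}: {value}")
```

Reading the output, each attention head works on 64 dimensions, temporal ensembling is off, and the image input is channel-first at 96 by 96 pixels.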
    	
model.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:68e1826b710ea6125c0c9eef7bcfdb865305ff9db87f02cf9ec5c2504b09d2b0
+size 206667888
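
The three lines added for `model.safetensors` are a Git LFS pointer, not the weights themselves: they record the spec version, the SHA-256 of the real file, and its size in bytes. As a rough sketch, the pointer can be parsed and later used to check a downloaded copy. `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and the local file path passed to `matches_pointer` is whatever the reader downloaded; nothing here is part of the commit.

```python
import hashlib

# Pointer text copied from the diff above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:68e1826b710ea6125c0c9eef7bcfdb865305ff9db87f02cf9ec5c2504b09d2b0
size 206667888
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer into a dict of typed fields."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


def matches_pointer(path: str, info: dict) -> bool:
    """Hash a local file in 1 MiB blocks and compare it with the pointer's digest."""
    hasher = hashlib.new(info["hash_algo"])
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(1 << 20), b""):
            hasher.update(block)
    return hasher.hexdigest() == info["digest"]


info = parse_lfs_pointer(POINTER)
size_mib = info["size_bytes"] / 2**20
print(f"{info['hash_algo']} digest, {size_mib:.1f} MiB")
```

If the tensors are stored as 32-bit floats (an assumption, since the pointer does not say), the size gives an upper bound of roughly 51.7 million parameters, because the file also contains a small header.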
    	
train_config.json ADDED
@@ -0,0 +1,205 @@
+{
+    "dataset": {
+        "repo_id": "lerobot/pusht",
+        "root": null,
+        "episodes": null,
+        "image_transforms": {
+            "enable": false,
+            "max_num_transforms": 3,
+            "random_order": false,
+            "tfs": {
+                "brightness": {
+                    "weight": 1.0,
+                    "type": "ColorJitter",
+                    "kwargs": {
+                        "brightness": [
+                            0.8,
+                            1.2
+                        ]
+                    }
+                },
+                "contrast": {
+                    "weight": 1.0,
+                    "type": "ColorJitter",
+                    "kwargs": {
+                        "contrast": [
+                            0.8,
+                            1.2
+                        ]
+                    }
+                },
+                "saturation": {
+                    "weight": 1.0,
+                    "type": "ColorJitter",
+                    "kwargs": {
+                        "saturation": [
+                            0.5,
+                            1.5
+                        ]
+                    }
+                },
+                "hue": {
+                    "weight": 1.0,
+                    "type": "ColorJitter",
+                    "kwargs": {
+                        "hue": [
+                            -0.05,
+                            0.05
+                        ]
+                    }
+                },
+                "sharpness": {
+                    "weight": 1.0,
+                    "type": "SharpnessJitter",
+                    "kwargs": {
+                        "sharpness": [
+                            0.5,
+                            1.5
+                        ]
+                    }
+                }
+            }
+        },
+        "revision": null,
+        "use_imagenet_stats": true,
+        "video_backend": "torchcodec",
+        "tolerance_s": 0.0001
+    },
+    "env": {
+        "type": "pusht",
+        "task": "PushT-v0",
+        "fps": 10,
+        "features": {
+            "action": {
+                "type": "ACTION",
+                "shape": [
+                    2
+                ]
+            },
+            "agent_pos": {
+                "type": "STATE",
+                "shape": [
+                    2
+                ]
+            },
+            "pixels": {
+                "type": "VISUAL",
+                "shape": [
+                    384,
+                    384,
+                    3
+                ]
+            }
+        },
+        "features_map": {
+            "action": "action",
+            "agent_pos": "observation.state",
+            "environment_state": "observation.environment_state",
+            "pixels": "observation.image"
+        },
+        "episode_length": 300,
+        "obs_type": "pixels_agent_pos",
+        "render_mode": "rgb_array",
+        "visualization_width": 384,
+        "visualization_height": 384
+    },
+    "policy": {
+        "type": "act",
+        "n_obs_steps": 1,
+        "normalization_mapping": {
+            "VISUAL": "MEAN_STD",
+            "STATE": "MEAN_STD",
+            "ACTION": "MEAN_STD"
+        },
+        "input_features": {
+            "observation.image": {
+                "type": "VISUAL",
+                "shape": [
+                    3,
+                    96,
+                    96
+                ]
+            },
+            "observation.state": {
+                "type": "STATE",
+                "shape": [
+                    2
+                ]
+            }
+        },
+        "output_features": {
+            "action": {
+                "type": "ACTION",
+                "shape": [
+                    2
+                ]
+            }
+        },
+        "device": "cuda",
+        "use_amp": false,
+        "push_to_hub": true,
+        "repo_id": "arclabmit/pusht_act_model",
+        "private": null,
+        "tags": null,
+        "license": null,
+        "chunk_size": 100,
+        "n_action_steps": 100,
+        "vision_backbone": "resnet18",
+        "pretrained_backbone_weights": "ResNet18_Weights.IMAGENET1K_V1",
+        "replace_final_stride_with_dilation": false,
+        "pre_norm": false,
+        "dim_model": 512,
+        "n_heads": 8,
+        "dim_feedforward": 3200,
+        "feedforward_activation": "relu",
+        "n_encoder_layers": 4,
+        "n_decoder_layers": 1,
+        "use_vae": true,
+        "latent_dim": 32,
+        "n_vae_encoder_layers": 4,
+        "temporal_ensemble_coeff": null,
+        "dropout": 0.1,
+        "kl_weight": 10.0,
+        "optimizer_lr": 1e-05,
+        "optimizer_weight_decay": 0.0001,
+        "optimizer_lr_backbone": 1e-05
+    },
+    "output_dir": "outputs/train/2025-08-19/16-37-40_pusht_act",
+    "job_name": "pusht_act",
+    "resume": false,
+    "seed": 100000,
+    "num_workers": 4,
+    "batch_size": 64,
+    "steps": 200000,
+    "eval_freq": 25000,
+    "log_freq": 200,
+    "save_checkpoint": true,
+    "save_freq": 25000,
+    "use_policy_training_preset": true,
+    "optimizer": {
+        "type": "adamw",
+        "lr": 1e-05,
+        "weight_decay": 0.0001,
+        "grad_clip_norm": 10.0,
+        "betas": [
+            0.9,
+            0.999
+        ],
+        "eps": 1e-08
+    },
+    "scheduler": null,
+    "eval": {
+        "n_episodes": 50,
+        "batch_size": 50,
+        "use_async_envs": false
+    },
+    "wandb": {
+        "enable": true,
+        "disable_artifact": false,
+        "project": "lerobot",
+        "entity": null,
+        "notes": null,
+        "run_id": "fhxcxi7k",
+        "mode": null
+    }
+}
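
The top-level training fields in `train_config.json` imply a schedule that is easy to work out by hand. The sketch below does that arithmetic, assuming that a checkpoint or evaluation is triggered whenever the step count is a multiple of `save_freq` or `eval_freq` and that each evaluation runs `eval.n_episodes` episodes. That trigger rule is an assumption for illustration; the trainer's actual logic is not shown in this commit. `training_schedule` is a hypothetical helper.

```python
# Values copied from train_config.json in this commit.
TRAIN = {
    "steps": 200_000,
    "batch_size": 64,
    "save_freq": 25_000,
    "eval_freq": 25_000,
    "log_freq": 200,
    "eval_n_episodes": 50,
}


def training_schedule(cfg: dict) -> dict:
    """Work out how often each periodic event fires over the whole run."""
    checkpoint_steps = list(range(cfg["save_freq"], cfg["steps"] + 1, cfg["save_freq"]))
    eval_steps = list(range(cfg["eval_freq"], cfg["steps"] + 1, cfg["eval_freq"]))
    return {
        "samples_seen": cfg["steps"] * cfg["batch_size"],
        "checkpoint_steps": checkpoint_steps,
        "n_evals": len(eval_steps),
        "eval_episodes_total": len(eval_steps) * cfg["eval_n_episodes"],
        "log_lines": cfg["steps"] // cfg["log_freq"],
    }


plan = training_schedule(TRAIN)
print(f"samples seen:      {plan['samples_seen']:,}")
print(f"checkpoints:       {len(plan['checkpoint_steps'])}")
print(f"evaluation rounds: {plan['n_evals']}")
print(f"eval episodes:     {plan['eval_episodes_total']}")
```

Under these assumptions the run draws 12.8 million training samples, writes 8 checkpoints, and rolls out 400 evaluation episodes in total.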

