Model Card for reward_classifier

A reward classifier is a lightweight neural network that scores observations or trajectories for task success, providing a learned reward signal or offline evaluation when explicit rewards are unavailable.

This policy has been trained and pushed to the Hub using LeRobot. See the full documentation at LeRobot Docs.
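
To make the idea concrete, here is a minimal sketch of a reward classifier in PyTorch: a small CNN encoder followed by a binary head that maps an image observation to a success probability. This is purely illustrative; the actual architecture behind this checkpoint is not documented in this card, and the channel counts, input resolution, and hidden size below are placeholder assumptions.

# Illustrative sketch only -- not the architecture of this checkpoint.
import torch
import torch.nn as nn

class RewardClassifier(nn.Module):
    # Scores an image observation with a task-success probability.
    def __init__(self, in_channels=3, hidden_dim=128):
        super().__init__()
        # Small CNN encoder that pools the image into a 64-d feature vector.
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        # Binary classification head producing a single success logit.
        self.head = nn.Sequential(
            nn.Linear(64, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, obs):
        # obs: (batch, channels, height, width) -> one success logit per sample
        return self.head(self.encoder(obs)).squeeze(-1)

classifier = RewardClassifier()
obs = torch.randn(4, 3, 96, 96)          # dummy batch of camera frames
reward = torch.sigmoid(classifier(obs))  # learned reward in [0, 1]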


How to Get Started with the Model

For a complete walkthrough, see the training guide. Below is a short version of how to train and how to run inference/evaluation:

Train from scratch

python lerobot/scripts/train.py \
  --dataset.repo_id=${HF_USER}/<dataset> \
  --policy.type=act \
  --output_dir=outputs/train/<desired_policy_repo_id> \
  --job_name=lerobot_training \
  --policy.device=cuda \
  --policy.repo_id=${HF_USER}/<desired_policy_repo_id> \
  --wandb.enable=true

Checkpoints are written to outputs/train/<desired_policy_repo_id>/checkpoints/.
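
If you prefer to load a trained checkpoint directly in Python (for inspection or a custom evaluation loop), something along these lines should work. The import path and the checkpoints/last/pretrained_model layout are assumptions based on a recent LeRobot release and may differ across versions; the policy class must match the --policy.type you trained with.

# Sketch only: load the latest checkpoint produced by the training run above.
# Module path and checkpoint layout may vary between LeRobot versions.
from lerobot.common.policies.act.modeling_act import ACTPolicy

checkpoint = "outputs/train/<desired_policy_repo_id>/checkpoints/last/pretrained_model"
policy = ACTPolicy.from_pretrained(checkpoint)
policy.eval()  # switch to inference mode before running the policy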

Evaluate the policy/run inference

python -m lerobot.record \
  --robot.type=so100_follower \
  --dataset.repo_id=<hf_user>/eval_<dataset> \
  --policy.path=<hf_user>/<desired_policy_repo_id> \
  --episodes=10

Prefix the dataset repo_id with eval_ and point --policy.path at a local checkpoint or a Hub repository.
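
Once recording finishes, the evaluation episodes are just another LeRobot dataset, so you can load them back in Python to sanity-check what was captured. The import path below assumes a recent LeRobot release and may differ between versions.

# Sketch only: inspect the recorded evaluation episodes.
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

eval_dataset = LeRobotDataset("<hf_user>/eval_<dataset>")
print(f"frames recorded: {len(eval_dataset)}")  # total number of frames
frame = eval_dataset[0]   # a dict of tensors (observations, actions, ...)
print(list(frame.keys()))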


Model Details

  • License: apache-2.0
  • Model size: 7.27M parameters
  • Tensor type: F32
  • Format: Safetensors
