Upload folder using huggingface_hub
.gitattributes
CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+artifacts/videos/eval/episodes/best_checkpoint.mp4 filter=lfs diff=lfs merge=lfs -text
artifacts/hyperparam_control/hyperparameters.json
CHANGED
@@ -12,5 +12,5 @@
 "clip_range - PPO clipping range (PPO only)",
 "vf_coef - Value function coefficient (PPO only)"
 ],
-"last_modified":
+"last_modified": 1755381605.2407584
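The `last_modified` timestamp above is how the manual hyperparameter control loop (enabled later in the training log) can detect edits to the control file. A minimal sketch of such a polling step, assuming a hypothetical `apply_fn` callback that forwards updated values to the trainer; the repository's actual mechanism may differ:

```python
import json
from pathlib import Path


def poll_hyperparams(control_file: Path, apply_fn, last_seen: float = 0.0) -> float:
    """Re-read the control file and push new values when 'last_modified' advances.

    `apply_fn` is a hypothetical callback that forwards the updated values to
    the running trainer. Returns the newest timestamp seen so the caller can
    pass it back on the next poll.
    """
    data = json.loads(control_file.read_text())
    stamp = float(data.get("last_modified") or 0.0)
    if stamp > last_seen:
        # Everything except the bookkeeping timestamp is treated as a tunable.
        apply_fn({k: v for k, v in data.items() if k != "last_modified"})
        return stamp
    return last_seen
```

Calling this once per rollout (or on a timer) is enough: the file is only re-applied when its timestamp advances, so hand-editing `hyperparameters.json` mid-run takes effect on the next poll.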
artifacts/logs/training_20250816_230005_ppo_CartPole-v1.log
ADDED
@@ -0,0 +1,897 @@
+=== Training Session Started ===
+Timestamp: 2025-08-16 23:00:05
+Log file: runs/cvb5lyfw/logs/training_20250816_230005_ppo_CartPole-v1.log
+Algorithm: ppo
+Environment: CartPole-v1
+Seed: 42
+==================================================
+
+Configuration saved to: runs/cvb5lyfw/configs/config.json
+/home/tsilva/repos/tsilva/gymnasium-solver/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/setup.py:177: GPU available but not used. You can set it by doing `Trainer(accelerator='gpu')`.
+
+Hyperparameter manual control enabled!
+Control directory: runs/cvb5lyfw/hyperparam_control
+Control file: hyperparameters.json
+Edit this file to adjust hyperparameters during training.
+/home/tsilva/repos/tsilva/gymnasium-solver/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:425: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=11` in the `DataLoader` to improve performance.
+/home/tsilva/repos/tsilva/gymnasium-solver/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py:310: The number of training batches (20) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
+--------------------------------------------
+| train/             |          |
+| ep_rew_mean        | 23.52    |
+| ep_len_mean        | 23.00    |
+| epoch              | 8        |
+| total_timesteps    | 2560     |
+| total_episodes     | 107      |
+| total_rollouts     | 10.00    |
+| rollout_timesteps  | 256      |
+| rollout_episodes   | 11.00    |
+| fps_instant        | 4492     |
+| rollout_fps        | 21420.19 |
+| loss               | 9.12     |
+| policy_loss        | -0.0078  |
+| value_loss         | 18.2623  |
+| entropy_loss       | -0.6738  |
+| action_mean        | 0.51     |
+| action_std         | 0.50     |
+| approx_kl          | 0.0059   |
+| baseline_mean      | 0.00     |
+| baseline_std       | 0.00     |
+| clip_fraction      | 0.093    |
+| clip_range         | 0.1954   |
+| entropy            | 0.6738   |
+| explained_variance | 0.258    |
+| fps                | 1080     |
+| kl_div             | 0.0031   |
+| learning_rate      | 0.000977 |
+| obs_mean           | -0.02    |
+| obs_std            | 0.45     |
+| reward_mean        | 1.00     |
+| reward_std         | 0.00     |
+| time_elapsed       | 2.37     |
+--------------------------------------------
+-----------------------------------------------
+| train/             |                    |
+| ep_rew_mean        | 30.24 Δ6.72        |
+| ep_len_mean        | 30.00 Δ7.00        |
+| epoch              | 18 Δ10             |
+| total_timesteps    | 5120 Δ2560.00      |
+| total_episodes     | 178 Δ71            |
+| total_rollouts     | 20.00 Δ10.00       |
+| rollout_timesteps  | 256 Δ0             |
+| rollout_episodes   | 5.00 Δ6.00         |
+| fps_instant        | 4248 Δ244          |
+| rollout_fps        | 21941.32 Δ521.13   |
+| loss               | 7.85 Δ1.27         |
+| policy_loss        | 0.0070 Δ0.0148     |
+| value_loss         | 15.6912 Δ2.5711    |
+| entropy_loss       | -0.6368 Δ0.0370    |
+| action_mean        | 0.50 Δ0.01         |
+| action_std         | 0.50 Δ0.0001       |
+| approx_kl          | 0.0109 Δ0.0050     |
+| baseline_mean      | 0.00 Δ0            |
+| baseline_std       | 0.00 Δ0            |
+| clip_fraction      | 0.078 Δ0.015       |
+| clip_range         | 0.1903 Δ0.0051     |
+| entropy            | 0.6368 Δ0.0370     |
+| explained_variance | 0.459 Δ0.201       |
+| fps                | 1723 Δ643          |
+| kl_div             | 0.0298 Δ0.0266     |
+| learning_rate      | 0.000951 Δ0.000026 |
+| obs_mean           | -0.00 Δ0.02        |
+| obs_std            | 0.43 Δ0.01         |
+| reward_mean        | 1.00 Δ0            |
+| reward_std         | 0.00 Δ0            |
+| time_elapsed       | 2.97 Δ0.60         |
+-----------------------------------------------
+⚠️ ALGORITHM WARNING: Very negative explained variance (-0.423) indicates value function is performing poorly. Check value function architecture or learning rate.
+-----------------------------------------------
+| train/             |                    |
+| ep_rew_mean        | 45.78 Δ15.54       |
+| ep_len_mean        | 45.00 Δ15.00       |
+| epoch              | 28 Δ10             |
+| total_timesteps    | 7680 Δ2560.00      |
+| total_episodes     | 217 Δ39            |
+| total_rollouts     | 30.00 Δ10.00       |
+| rollout_timesteps  | 256 Δ0             |
+| rollout_episodes   | 1.00 Δ4.00         |
+| fps_instant        | 5165 Δ917          |
+| rollout_fps        | 22601.49 Δ660.17   |
+| loss               | 6.52 Δ1.33         |
+| policy_loss        | 0.0055 Δ0.0015     |
+| value_loss         | 13.0365 Δ2.6547    |
+| entropy_loss       | -0.6026 Δ0.0342    |
+| action_mean        | 0.50 Δ0.0040       |
+| action_std         | 0.50 Δ2.62e-06     |
+| approx_kl          | 0.0046 Δ0.0063     |
+| baseline_mean      | 0.00 Δ0            |
+| baseline_std       | 0.00 Δ0            |
+| clip_fraction      | 0.055 Δ0.023       |
+| clip_range         | 0.1852 Δ0.0051     |
+| entropy            | 0.6026 Δ0.0342     |
+| explained_variance | -0.423 Δ0.882      |
+| fps                | 2161 Δ438          |
+| kl_div             | 0.0101 Δ0.0197     |
+| learning_rate      | 0.000926 Δ0.000026 |
+| obs_mean           | 0.02 Δ0.02         |
+| obs_std            | 0.46 Δ0.03         |
+| reward_mean        | 1.00 Δ0            |
+| reward_std         | 0.00 Δ0            |
+| time_elapsed       | 3.55 Δ0.58         |
+-----------------------------------------------
+-----------------------------------------------
+| train/             |                    |
+| ep_rew_mean        | 63.52 Δ17.74       |
+| ep_len_mean        | 63.00 Δ18.00       |
+| epoch              | 38 Δ10             |
+| total_timesteps    | 10240 Δ2560.00     |
+| total_episodes     | 241 Δ24            |
+| total_rollouts     | 40.00 Δ10.00       |
+| rollout_timesteps  | 256 Δ0             |
+| rollout_episodes   | 2.00 Δ1.00         |
+| fps_instant        | 4253 Δ913          |
+| rollout_fps        | 22551.70 Δ49.80    |
+| loss               | 1.34 Δ5.19         |
+| policy_loss        | 0.0110 Δ0.0055     |
+| value_loss         | 2.6502 Δ10.3863    |
+| entropy_loss       | -0.6145 Δ0.0118    |
+| action_mean        | 0.50 Δ0.0025       |
+| action_std         | 0.50 Δ1.48e-05     |
+| approx_kl          | 0.0034 Δ0.0012     |
+| baseline_mean      | 0.00 Δ0            |
+| baseline_std       | 0.00 Δ0            |
+| clip_fraction      | 0.042 Δ0.012       |
+| clip_range         | 0.1800 Δ0.0051     |
+| entropy            | 0.6145 Δ0.0118     |
+| explained_variance | 0.962 Δ1.385       |
+| fps                | 2449 Δ288          |
+| kl_div             | 0.0083 Δ0.0018     |
+| learning_rate      | 0.000900 Δ0.000026 |
+| obs_mean           | 0.06 Δ0.04         |
+| obs_std            | 0.51 Δ0.04         |
+| reward_mean        | 1.00 Δ0            |
+| reward_std         | 0.00 Δ0            |
+| time_elapsed       | 4.18 Δ0.63         |
+-----------------------------------------------
+⚠️ ALGORITHM WARNING: High clip fraction (0.144) indicates policy is changing too rapidly. Consider reducing learning rate or clip_range.
+-----------------------------------------------
+| train/             |                    |
+| ep_rew_mean        | 80.62 Δ17.10       |
+| ep_len_mean        | 80.00 Δ17.00       |
+| epoch              | 48 Δ10             |
+| total_timesteps    | 12800 Δ2560.00     |
+| total_episodes     | 265 Δ24            |
+| total_rollouts     | 50.00 Δ10.00       |
+| rollout_timesteps  | 256 Δ0             |
+| rollout_episodes   | 2.00 Δ0            |
+| fps_instant        | 4129 Δ123          |
+| rollout_fps        | 22596.84 Δ45.14    |
+| loss               | 1.53 Δ0.19         |
+| policy_loss        | 0.0126 Δ0.0015     |
+| value_loss         | 3.0301 Δ0.3799     |
+| entropy_loss       | -0.5850 Δ0.0294    |
+| action_mean        | 0.51 Δ0.0027       |
+| action_std         | 0.50 Δ2.96e-05     |
+| approx_kl          | 0.0081 Δ0.0047     |
+| baseline_mean      | 0.00 Δ0            |
+| baseline_std       | 0.00 Δ0            |
+| clip_fraction      | 0.144 Δ0.102       |
+| clip_range         | 0.1749 Δ0.0051     |
+| entropy            | 0.5850 Δ0.0294     |
+| explained_variance | 0.963 Δ0.001       |
+| fps                | 2669 Δ220          |
+| kl_div             | 0.0067 Δ0.0016     |
+| learning_rate      | 0.000875 Δ0.000026 |
+| obs_mean           | 0.09 Δ0.03         |
+| obs_std            | 0.54 Δ0.04         |
+| reward_mean        | 1.00 Δ0            |
+| reward_std         | 0.00 Δ0            |
+| time_elapsed       | 4.80 Δ0.61         |
+-----------------------------------------------
+⚠️ ALGORITHM WARNING: High clip fraction (0.186) indicates policy is changing too rapidly. Consider reducing learning rate or clip_range.
+-----------------------------------------------
+| train/             |                    |
+| ep_rew_mean        | 95.26 Δ14.64       |
+| ep_len_mean        | 95.00 Δ15.00       |
+| epoch              | 58 Δ10             |
+| total_timesteps    | 15360 Δ2560.00     |
+| total_episodes     | 293 Δ28            |
+| total_rollouts     | 60.00 Δ10.00       |
+| rollout_timesteps  | 256 Δ0             |
+| rollout_episodes   | 6.00 Δ4.00         |
+| fps_instant        | 4169 Δ40           |
+| rollout_fps        | 22267.97 Δ328.87   |
+| loss               | 18.56 Δ17.03       |
+| policy_loss        | 0.0049 Δ0.0077     |
+| value_loss         | 37.1057 Δ34.0756   |
+| entropy_loss       | -0.5526 Δ0.0325    |
+| action_mean        | 0.51 Δ0.0018       |
+| action_std         | 0.50 Δ2.88e-05     |
+| approx_kl          | 0.0188 Δ0.0106     |
+| baseline_mean      | 0.00 Δ0            |
+| baseline_std       | 0.00 Δ0            |
+| clip_fraction      | 0.186 Δ0.042       |
+| clip_range         | 0.1698 Δ0.0051     |
+| entropy            | 0.5526 Δ0.0325     |
+| explained_variance | 0.649 Δ0.313       |
+| fps                | 2828 Δ159          |
+| kl_div             | 0.0115 Δ0.0048     |
+| learning_rate      | 0.000849 Δ0.000026 |
+| obs_mean           | 0.12 Δ0.03         |
+| obs_std            | 0.56 Δ0.01         |
+| reward_mean        | 1.00 Δ0            |
+| reward_std         | 0.00 Δ0            |
+| time_elapsed       | 5.43 Δ0.64         |
+-----------------------------------------------
+⚠️ ALGORITHM WARNING: High clip fraction (0.340) indicates policy is changing too rapidly. Consider reducing learning rate or clip_range.
+-----------------------------------------------
+| train/             |                    |
+| ep_rew_mean        | 104.72 Δ9.46       |
+| ep_len_mean        | 104.00 Δ9.00       |
+| epoch              | 68 Δ10             |
+| total_timesteps    | 17920 Δ2560.00     |
+| total_episodes     | 314 Δ21            |
+| total_rollouts     | 70.00 Δ10.00       |
+| rollout_timesteps  | 256 Δ0             |
+| rollout_episodes   | 2.00 Δ4.00         |
+| fps_instant        | 3896 Δ273          |
+| rollout_fps        | 22153.08 Δ114.89   |
+| loss               | 4.85 Δ13.70        |
+| policy_loss        | 0.0009 Δ0.0040     |
+| value_loss         | 9.7081 Δ27.3976    |
+| entropy_loss       | -0.5655 Δ0.0129    |
+| action_mean        | 0.51 Δ0.0024       |
+| action_std         | 0.50 Δ4.85e-05     |
+| approx_kl          | 0.0190 Δ0.0002     |
+| baseline_mean      | 0.00 Δ0            |
+| baseline_std       | 0.00 Δ0            |
+| clip_fraction      | 0.340 Δ0.154       |
+| clip_range         | 0.1647 Δ0.0051     |
+| entropy            | 0.5655 Δ0.0129     |
+| explained_variance | 0.775 Δ0.126       |
+| fps                | 2954 Δ125          |
+| kl_div             | 0.0135 Δ0.0020     |
+| learning_rate      | 0.000823 Δ0.000026 |
+| obs_mean           | 0.13 Δ0.01         |
+| obs_std            | 0.57 Δ0.02         |
+| reward_mean        | 1.00 Δ0            |
+| reward_std         | 0.00 Δ0            |
+| time_elapsed       | 6.07 Δ0.64         |
+-----------------------------------------------
+⚠️ ALGORITHM WARNING: High approximate KL divergence (0.1479) indicates large policy changes. Consider reducing learning rate.
+⚠️ ALGORITHM WARNING: High KL divergence (0.1581) indicates large policy changes. Consider reducing learning rate.
+⚠️ ALGORITHM WARNING: High clip fraction (0.492) indicates policy is changing too rapidly. Consider reducing learning rate or clip_range.
+-----------------------------------------------
+| train/             |                    |
+| ep_rew_mean        | 97.52 Δ7.20        |
+| ep_len_mean        | 97.00 Δ7.00        |
+| epoch              | 78 Δ10             |
+| total_timesteps    | 20480 Δ2560.00     |
+| total_episodes     | 347 Δ33            |
+| total_rollouts     | 80.00 Δ10.00       |
+| rollout_timesteps  | 256 Δ0             |
+| rollout_episodes   | 7.00 Δ5.00         |
+| fps_instant        | 6218 Δ2322.00      |
+| rollout_fps        | 22400.17 Δ247.09   |
+| loss               | 30.81 Δ25.95       |
+| policy_loss        | 0.0083 Δ0.0074     |
+| value_loss         | 61.5977 Δ51.8896   |
+| entropy_loss       | -0.3653 Δ0.2001    |
+| action_mean        | 0.51 Δ0.0016       |
+| action_std         | 0.50 Δ3.78e-05     |
+| approx_kl          | 0.1479 Δ0.1289     |
+| baseline_mean      | 0.00 Δ0            |
+| baseline_std       | 0.00 Δ0            |
+| clip_fraction      | 0.492 Δ0.152       |
+| clip_range         | 0.1596 Δ0.0051     |
+| entropy            | 0.3653 Δ0.2001     |
+| explained_variance | 0.198 Δ0.578       |
+| fps                | 3065 Δ111          |
+| kl_div             | 0.1581 Δ0.1446     |
+| learning_rate      | 0.000798 Δ0.000026 |
+| obs_mean           | 0.14 Δ0.01         |
+| obs_std            | 0.59 Δ0.02         |
+| reward_mean        | 1.00 Δ0            |
+| reward_std         | 0.00 Δ0            |
+| time_elapsed       | 6.68 Δ0.62         |
+-----------------------------------------------
+⚠️ ALGORITHM WARNING: High clip fraction (0.182) indicates policy is changing too rapidly. Consider reducing learning rate or clip_range.
+-----------------------------------------------
+| train/             |                    |
+| ep_rew_mean        | 96.61 Δ0.91        |
+| ep_len_mean        | 96.00 Δ1.00        |
+| epoch              | 88 Δ10             |
+| total_timesteps    | 23040 Δ2560.00     |
+| total_episodes     | 365 Δ18            |
+| total_rollouts     | 90.00 Δ10.00       |
+| rollout_timesteps  | 256 Δ0             |
+| rollout_episodes   | 1.00 Δ6.00         |
+| fps_instant        | 4889 Δ1329.00      |
+| rollout_fps        | 22974.76 Δ574.60   |
+| loss               | 3.98 Δ26.83        |
+| policy_loss        | 0.0053 Δ0.0030     |
+| value_loss         | 7.9501 Δ53.6476    |
+| entropy_loss       | -0.5333 Δ0.1680    |
+| action_mean        | 0.51 Δ0.0010       |
+| action_std         | 0.50 Δ2.69e-05     |
+| approx_kl          | 0.0189 Δ0.1289     |
+| baseline_mean      | 0.00 Δ0            |
+| baseline_std       | 0.00 Δ0            |
+| clip_fraction      | 0.182 Δ0.310       |
+| clip_range         | 0.1544 Δ0.0051     |
+| entropy            | 0.5333 Δ0.1680     |
+| explained_variance | 0.924 Δ0.726       |
+| fps                | 3186 Δ121          |
+| kl_div             | 0.0245 Δ0.1336     |
+| learning_rate      | 0.000772 Δ0.000026 |
+| obs_mean           | 0.13 Δ0.0021       |
+| obs_std            | 0.60 Δ0.01         |
+| reward_mean        | 1.00 Δ0            |
+| reward_std         | 0.00 Δ0            |
+| time_elapsed       | 7.23 Δ0.55         |
+-----------------------------------------------
+⚠️ ALGORITHM WARNING: High clip fraction (0.183) indicates policy is changing too rapidly. Consider reducing learning rate or clip_range.
+-----------------------------------------------
+| train/             |                    |
+| ep_rew_mean        | 111.95 Δ15.34      |
+| ep_len_mean        | 111.00 Δ15.00      |
+| epoch              | 98 Δ10             |
+| total_timesteps    | 25600 Δ2560.00     |
+| total_episodes     | 382 Δ17            |
+| total_rollouts     | 100.00 Δ10.00      |
+| rollout_timesteps  | 256 Δ0             |
+| rollout_episodes   | 2.00 Δ1.00         |
+| fps_instant        | 4399 Δ490          |
+| rollout_fps        | 23067.73 Δ92.96    |
+| loss               | 4.33 Δ0.35         |
+| policy_loss        | -0.0061 Δ0.0114    |
+| value_loss         | 8.6635 Δ0.7134     |
+| entropy_loss       | -0.5509 Δ0.0177    |
+| action_mean        | 0.51 Δ0.0007       |
+| action_std         | 0.50 Δ1.96e-05     |
+| approx_kl          | 0.0123 Δ0.0066     |
+| baseline_mean      | 0.00 Δ0            |
+| baseline_std       | 0.00 Δ0            |
+| clip_fraction      | 0.183 Δ0.001       |
+| clip_range         | 0.1493 Δ0.0051     |
+| entropy            | 0.5509 Δ0.0177     |
+| explained_variance | 0.936 Δ0.012       |
+| fps                | 3268 Δ83           |
+| kl_div             | 0.0209 Δ0.0036     |
+| learning_rate      | 0.000747 Δ0.000026 |
+| obs_mean           | 0.12 Δ0.01         |
+| obs_std            | 0.62 Δ0.02         |
+| reward_mean        | 1.00 Δ0            |
+| reward_std         | 0.00 Δ0            |
+| time_elapsed       | 7.83 Δ0.60         |
+-----------------------------------------------
+/home/tsilva/repos/tsilva/gymnasium-solver/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:425: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=11` in the `DataLoader` to improve performance.
+⚠️ ALGORITHM WARNING: High clip fraction (0.282) indicates policy is changing too rapidly. Consider reducing learning rate or clip_range.
+-----------------------------------------------
+| train/             |                    |
+| ep_rew_mean        | 111.95 Δ0          |
+| ep_len_mean        | 111.00 Δ0          |
+| epoch              | 98 Δ0              |
+| total_timesteps    | 25600 Δ0           |
+| total_episodes     | 382 Δ0             |
+| total_rollouts     | 100.00 Δ0          |
+| rollout_timesteps  | 256 Δ0             |
+| rollout_episodes   | 2.00 Δ0            |
+| fps_instant        | 4399 Δ0            |
+| rollout_fps        | 23067.73 Δ0        |
+| loss               | 0.28 Δ4.05         |
+| policy_loss        | 0.0020 Δ0.0081     |
+| value_loss         | 0.5542 Δ8.1094     |
+| entropy_loss       | -0.5920 Δ0.0410    |
+| action_mean        | 0.51 Δ0            |
+| action_std         | 0.50 Δ0            |
+| approx_kl          | 0.0096 Δ0.0027     |
+| baseline_mean      | 0.00 Δ0            |
+| baseline_std       | 0.00 Δ0            |
+| clip_fraction      | 0.282 Δ0.099       |
+| clip_range         | 0.1488 Δ0.0005     |
+| entropy            | 0.5920 Δ0.0410     |
+| explained_variance | 0.990 Δ0.054       |
+| fps                | 3268 Δ0            |
+| kl_div             | 0.0158 Δ0.0050     |
+| learning_rate      | 0.000744 Δ0.000003 |
+| obs_mean           | 0.12 Δ0            |
+| obs_std            | 0.62 Δ0            |
+| reward_mean        | 1.00 Δ0            |
+| reward_std         | 0.00 Δ0            |
+| time_elapsed       | 7.83 Δ0            |
+| eval/              |                    |
+| ep_rew_mean        | 272.40             |
+| ep_len_mean        | 272.40             |
+| epoch              | 99                 |
+| total_timesteps    | 6040               |
+| total_episodes     | 10                 |
+| epoch_fps          | 3127.00            |
+-----------------------------------------------
+New best model saved with eval/ep_rew_mean=272.4000
+Timestamped: runs/cvb5lyfw/checkpoints/epoch=99-step=2000.ckpt
+Best: runs/cvb5lyfw/checkpoints/best_checkpoint.ckpt
+Using environment spec reward_threshold: 475.0
+-----------------------------------------------
+| train/             |                    |
+| ep_rew_mean        | 113.88 Δ1.93       |
+| ep_len_mean        | 113.00 Δ2.00       |
+| epoch              | 108 Δ10            |
+| total_timesteps    | 28160 Δ2560.00     |
+| total_episodes     | 403 Δ21            |
+| total_rollouts     | 110.00 Δ10.00      |
+| rollout_timesteps  | 256 Δ0             |
+| rollout_episodes   | 1.00 Δ1.00         |
+| fps_instant        | 4467 Δ69           |
+| rollout_fps        | 23455.68 Δ387.96   |
+| loss               | 0.71 Δ0.43         |
+| policy_loss        | 0.0094 Δ0.0074     |
+| value_loss         | 1.3914 Δ0.8372     |
+| entropy_loss       | -0.5770 Δ0.0149    |
+| action_mean        | 0.52 Δ0.0020       |
+| action_std         | 0.50 Δ0.0001       |
+| approx_kl          | 0.0038 Δ0.0057     |
+| baseline_mean      | 0.00 Δ0            |
+| baseline_std       | 0.00 Δ0            |
+| clip_fraction      | 0.088 Δ0.195       |
+| clip_range         | 0.1442 Δ0.0046     |
+| entropy            | 0.5770 Δ0.0149     |
+| explained_variance | 0.985 Δ0.005       |
+| fps                | 2721 Δ548          |
+| kl_div             | 0.0016 Δ0.0142     |
+| learning_rate      | 0.000721 Δ0.000023 |
+| obs_mean           | 0.14 Δ0.02         |
+| obs_std            | 0.62 Δ0.0041       |
+| reward_mean        | 1.00 Δ0            |
+| reward_std         | 0.00 Δ0            |
+| time_elapsed       | 10.35 Δ2.52        |
+| eval/              |                    |
+| ep_rew_mean        | 272.40 Δ0          |
+| ep_len_mean        | 272.40 Δ0          |
+| epoch              | 99 Δ0              |
+| total_timesteps    | 6040 Δ0            |
+| total_episodes     | 10 Δ0              |
+| epoch_fps          | 3127.00 Δ0         |
+-----------------------------------------------
+⚠️ ALGORITHM WARNING: High clip fraction (0.343) indicates policy is changing too rapidly. Consider reducing learning rate or clip_range.
+-----------------------------------------------
+| train/             |                    |
+| ep_rew_mean        | 121.08 Δ7.20       |
+| ep_len_mean        | 121.00 Δ8.00       |
+| epoch              | 118 Δ10            |
+| total_timesteps    | 30720 Δ2560.00     |
+| total_episodes     | 415 Δ12            |
+| total_rollouts     | 120.00 Δ10.00      |
+| rollout_timesteps  | 256 Δ0             |
+| rollout_episodes   | 0.00 Δ1.00         |
+| fps_instant        | 5078 Δ611          |
+| rollout_fps        | 23675.15 Δ219.47   |
+| loss               | 0.17 Δ0.53         |
+| policy_loss        | -0.0080 Δ0.0173    |
+| value_loss         | 0.3605 Δ1.0309     |
+| entropy_loss       | -0.5659 Δ0.0111    |
+| action_mean        | 0.52 Δ0.0002       |
+| action_std         | 0.50 Δ6.38e-06     |
+| approx_kl          | 0.0105 Δ0.0067     |
+| baseline_mean      | 0.00 Δ0            |
+| baseline_std       | 0.00 Δ0            |
+| clip_fraction      | 0.343 Δ0.255       |
+| clip_range         | 0.1391 Δ0.0051     |
+| entropy            | 0.5659 Δ0.0111     |
+| explained_variance | 0.995 Δ0.011       |
+| fps                | 2805 Δ84           |
+| kl_div             | 0.0180 Δ0.0164     |
+| learning_rate      | 0.000695 Δ0.000026 |
+| obs_mean           | 0.14 Δ0.0037       |
+| obs_std            | 0.62 Δ0.0022       |
+| reward_mean        | 1.00 Δ0            |
+| reward_std         | 0.00 Δ0            |
+| time_elapsed       | 10.95 Δ0.60        |
+| eval/              |                    |
+| ep_rew_mean        | 272.40 Δ0          |
+| ep_len_mean        | 272.40 Δ0          |
+| epoch              | 99 Δ0              |
+| total_timesteps    | 6040 Δ0            |
+| total_episodes     | 10 Δ0              |
+| epoch_fps          | 3127.00 Δ0         |
+-----------------------------------------------
+⚠️ ALGORITHM WARNING: High clip fraction (0.119) indicates policy is changing too rapidly. Consider reducing learning rate or clip_range.
+-----------------------------------------------
+| train/             |                    |
+| ep_rew_mean        | 131.68 Δ10.60      |
+| ep_len_mean        | 131.00 Δ10.00      |
+| epoch              | 128 Δ10            |
+| total_timesteps    | 33280 Δ2560.00     |
+| total_episodes     | 430 Δ15            |
+| total_rollouts     | 130.00 Δ10.00      |
+| rollout_timesteps  | 256 Δ0             |
+| rollout_episodes   | 3.00 Δ3.00         |
+| fps_instant        | 4648 Δ431          |
+| rollout_fps        | 23648.98 Δ26.17    |
+| loss               | 24.81 Δ24.64       |
+| policy_loss        | 0.0002 Δ0.0082     |
+| value_loss         | 49.6250 Δ49.2645   |
+| entropy_loss       | -0.5436 Δ0.0223    |
+| action_mean        | 0.52 Δ0.0006       |
+| action_std         | 0.50 Δ1.93e-05     |
+| approx_kl          | 0.0064 Δ0.0041     |
+| baseline_mean      | 0.00 Δ0            |
+| baseline_std       | 0.00 Δ0            |
+| clip_fraction      | 0.119 Δ0.224       |
+| clip_range         | 0.1340 Δ0.0051     |
+| entropy            | 0.5436 Δ0.0223     |
+| explained_variance | 0.438 Δ0.557       |
+| fps                | 2889 Δ85           |
+| kl_div             | 0.0107 Δ0.0073     |
+| learning_rate      | 0.000670 Δ0.000026 |
+| obs_mean           | 0.14 Δ0.01         |
+| obs_std            | 0.62 Δ0.0026       |
+| reward_mean        | 1.00 Δ0            |
+| reward_std         | 0.00 Δ0            |
+| time_elapsed       | 11.52 Δ0.56        |
+| eval/              |                    |
+| ep_rew_mean        | 272.40 Δ0          |
+| ep_len_mean        | 272.40 Δ0          |
+| epoch              | 99 Δ0              |
+| total_timesteps    | 6040 Δ0            |
+| total_episodes     | 10 Δ0              |
+| epoch_fps          | 3127.00 Δ0         |
+-----------------------------------------------
+⚠️ ALGORITHM WARNING: High clip fraction (0.524) indicates policy is changing too rapidly. Consider reducing learning rate or clip_range.
| 540 |
+
-----------------------------------------------
|
| 541 |
+
| train/ | |
|
| 542 |
+
| ep_rew_mean | 139.29 β7.61 |
|
| 543 |
+
| ep_len_mean | 139.00 β8.00 |
|
| 544 |
+
| epoch | 138 β10 |
|
| 545 |
+
| total_timesteps | 35840 β2560.00 |
|
| 546 |
+
| total_episodes | 435 β5 |
|
| 547 |
+
| total_rollouts | 140.00 β10.00 |
|
| 548 |
+
| rollout_timesteps | 256 β0 |
|
| 549 |
+
| rollout_episodes | 0.00 β3.00 |
|
| 550 |
+
| fps_instant | 4137 β510 |
|
| 551 |
+
| rollout_fps | 23805.51 β156.53 |
|
| 552 |
+
| loss | -0.01 β24.82 |
|
| 553 |
+
| policy_loss | -0.0318 β0.0320 |
|
| 554 |
+
| value_loss | 0.0524 β49.5726 |
|
| 555 |
+
| entropy_loss | -0.5789 β0.0353 |
|
| 556 |
+
| action_mean | 0.52 β0.0007 |
|
| 557 |
+
| action_std | 0.50 β2.20e-05 |
|
| 558 |
+
| approx_kl | 0.0180 β0.0116 |
|
| 559 |
+
| baseline_mean | 0.00 β0 |
|
| 560 |
+
| baseline_std | 0.00 β0 |
|
| 561 |
+
| clip_fraction | 0.524 β0.405 |
|
| 562 |
+
| clip_range | 0.1288 β0.0051 |
|
| 563 |
+
| entropy | 0.5789 β0.0353 |
|
| 564 |
+
| explained_variance | 0.138 β0.300 |
|
| 565 |
+
| fps | 2957 β67 |
|
| 566 |
+
| kl_div | 0.0276 β0.0169 |
|
| 567 |
+
| learning_rate | 0.000644 β0.000026 |
|
| 568 |
+
| obs_mean | 0.12 β0.01 |
|
| 569 |
+
| obs_std | 0.62 β0.0024 |
|
| 570 |
+
| reward_mean | 1.00 β0 |
|
| 571 |
+
| reward_std | 0.00 β0 |
|
| 572 |
+
| time_elapsed | 12.12 β0.60 |
|
| 573 |
+
| eval/ | |
|
| 574 |
+
| ep_rew_mean | 272.40 β0 |
|
| 575 |
+
| ep_len_mean | 272.40 β0 |
|
| 576 |
+
| epoch | 99 β0 |
|
| 577 |
+
| total_timesteps | 6040 β0 |
|
| 578 |
+
| total_episodes | 10 β0 |
|
| 579 |
+
| epoch_fps | 3127.00 β0 |
|
| 580 |
+
-----------------------------------------------
|
| 581 |
+
-----------------------------------------------
| train/ | |
| ep_rew_mean | 162.72 ±23.43 |
| ep_len_mean | 162.00 ±23.00 |
| epoch | 148 ±10 |
| total_timesteps | 38400 ±2560.00 |
| total_episodes | 440 ±5 |
| total_rollouts | 150.00 ±10.00 |
| rollout_timesteps | 256 ±0 |
| rollout_episodes | 0.00 ±0 |
| fps_instant | 4708 ±570 |
| rollout_fps | 24173.95 ±368.44 |
| loss | 0.01 ±0.01 |
| policy_loss | -0.0012 ±0.0306 |
| value_loss | 0.0134 ±0.0390 |
| entropy_loss | -0.5216 ±0.0573 |
| action_mean | 0.51 ±0.0010 |
| action_std | 0.50 ±2.89e-05 |
| approx_kl | 0.0011 ±0.0170 |
| baseline_mean | 0.00 ±0 |
| baseline_std | 0.00 ±0 |
| clip_fraction | 0.029 ±0.494 |
| clip_range | 0.1237 ±0.0051 |
| entropy | 0.5216 ±0.0573 |
| explained_variance | 0.937 ±0.800 |
| fps | 3026 ±70 |
| kl_div | 0.0038 ±0.0238 |
| learning_rate | 0.000619 ±0.000026 |
| obs_mean | 0.09 ±0.03 |
| obs_std | 0.63 ±0.02 |
| reward_mean | 1.00 ±0 |
| reward_std | 0.00 ±0 |
| time_elapsed | 12.69 ±0.57 |
| eval/ | |
| ep_rew_mean | 272.40 ±0 |
| ep_len_mean | 272.40 ±0 |
| epoch | 99 ±0 |
| total_timesteps | 6040 ±0 |
| total_episodes | 10 ±0 |
| epoch_fps | 3127.00 ±0 |
-----------------------------------------------
-----------------------------------------------
| train/ | |
| ep_rew_mean | 186.74 ±24.02 |
| ep_len_mean | 186.00 ±24.00 |
| epoch | 158 ±10 |
| total_timesteps | 40960 ±2560.00 |
| total_episodes | 445 ±5 |
| total_rollouts | 160.00 ±10.00 |
| rollout_timesteps | 256 ±0 |
| rollout_episodes | 0.00 ±0 |
| fps_instant | 5483 ±775 |
| rollout_fps | 24476.36 ±302.41 |
| loss | -0.00 ±0.01 |
| policy_loss | -0.0021 ±0.0009 |
| value_loss | 0.0021 ±0.0113 |
| entropy_loss | -0.5528 ±0.0311 |
| action_mean | 0.51 ±0.0007 |
| action_std | 0.50 ±1.97e-05 |
| approx_kl | 0.0013 ±0.0002 |
| baseline_mean | 0.00 ±0 |
| baseline_std | 0.00 ±0 |
| clip_fraction | 0.041 ±0.012 |
| clip_range | 0.1186 ±0.0051 |
| entropy | 0.5528 ±0.0311 |
| explained_variance | 0.832 ±0.106 |
| fps | 3082 ±55 |
| kl_div | -0.0002 ±0.0040 |
| learning_rate | 0.000593 ±0.000026 |
| obs_mean | 0.07 ±0.03 |
| obs_std | 0.64 ±0.01 |
| reward_mean | 1.00 ±0 |
| reward_std | 0.00 ±0 |
| time_elapsed | 13.29 ±0.60 |
| eval/ | |
| ep_rew_mean | 272.40 ±0 |
| ep_len_mean | 272.40 ±0 |
| epoch | 99 ±0 |
| total_timesteps | 6040 ±0 |
| total_episodes | 10 ±0 |
| epoch_fps | 3127.00 ±0 |
-----------------------------------------------
⚠️ ALGORITHM WARNING: High clip fraction (0.130) indicates policy is changing too rapidly. Consider reducing learning rate or clip_range.
-----------------------------------------------
| train/ | |
| ep_rew_mean | 209.88 ±23.14 |
| ep_len_mean | 209.00 ±23.00 |
| epoch | 168 ±10 |
| total_timesteps | 43520 ±2560.00 |
| total_episodes | 451 ±6 |
| total_rollouts | 170.00 ±10.00 |
| rollout_timesteps | 256 ±0 |
| rollout_episodes | 1.00 ±1.00 |
| fps_instant | 4587 ±896 |
| rollout_fps | 24439.38 ±36.98 |
| loss | -0.01 ±0.01 |
| policy_loss | -0.0078 ±0.0056 |
| value_loss | 0.0027 ±0.0007 |
| entropy_loss | -0.5290 ±0.0237 |
| action_mean | 0.51 ±0.0009 |
| action_std | 0.50 ±2.43e-05 |
| approx_kl | 0.0028 ±0.0015 |
| baseline_mean | 0.00 ±0 |
| baseline_std | 0.00 ±0 |
| clip_fraction | 0.130 ±0.089 |
| clip_range | 0.1135 ±0.0051 |
| entropy | 0.5290 ±0.0237 |
| explained_variance | 0.399 ±0.433 |
| fps | 3112 ±31 |
| kl_div | 0.0016 ±0.0018 |
| learning_rate | 0.000567 ±0.000026 |
| obs_mean | 0.05 ±0.02 |
| obs_std | 0.64 ±0.0049 |
| reward_mean | 1.00 ±0 |
| reward_std | 0.00 ±0 |
| time_elapsed | 13.98 ±0.69 |
| eval/ | |
| ep_rew_mean | 272.40 ±0 |
| ep_len_mean | 272.40 ±0 |
| epoch | 99 ±0 |
| total_timesteps | 6040 ±0 |
| total_episodes | 10 ±0 |
| epoch_fps | 3127.00 ±0 |
-----------------------------------------------
⚠️ ALGORITHM WARNING: High clip fraction (0.164) indicates policy is changing too rapidly. Consider reducing learning rate or clip_range.
⚠️ ALGORITHM WARNING: Very negative explained variance (-0.106) indicates value function is performing poorly. Check value function architecture or learning rate.
-----------------------------------------------
| train/ | |
| ep_rew_mean | 230.88 ±21.00 |
| ep_len_mean | 230.00 ±21.00 |
| epoch | 178 ±10 |
| total_timesteps | 46080 ±2560.00 |
| total_episodes | 456 ±5 |
| total_rollouts | 180.00 ±10.00 |
| rollout_timesteps | 256 ±0 |
| rollout_episodes | 0.00 ±1.00 |
| fps_instant | 4984 ±397 |
| rollout_fps | 24308.83 ±130.55 |
| loss | -0.01 ±0.0013 |
| policy_loss | -0.0101 ±0.0023 |
| value_loss | 0.0048 ±0.0020 |
| entropy_loss | -0.4638 ±0.0653 |
| action_mean | 0.51 ±0.0007 |
| action_std | 0.50 ±1.76e-05 |
| approx_kl | 0.0037 ±0.0009 |
| baseline_mean | 0.00 ±0 |
| baseline_std | 0.00 ±0 |
| clip_fraction | 0.164 ±0.034 |
| clip_range | 0.1084 ±0.0051 |
| entropy | 0.4638 ±0.0653 |
| explained_variance | -0.106 ±0.505 |
| fps | 3154 ±42 |
| kl_div | 0.0033 ±0.0017 |
| learning_rate | 0.000542 ±0.000026 |
| obs_mean | 0.04 ±0.01 |
| obs_std | 0.62 ±0.01 |
| reward_mean | 1.00 ±0 |
| reward_std | 0.00 ±0 |
| time_elapsed | 14.61 ±0.63 |
| eval/ | |
| ep_rew_mean | 272.40 ±0 |
| ep_len_mean | 272.40 ±0 |
| epoch | 99 ±0 |
| total_timesteps | 6040 ±0 |
| total_episodes | 10 ±0 |
| epoch_fps | 3127.00 ±0 |
-----------------------------------------------
⚠️ ALGORITHM WARNING: High clip fraction (0.142) indicates policy is changing too rapidly. Consider reducing learning rate or clip_range.
-----------------------------------------------
| train/ | |
| ep_rew_mean | 252.27 ±21.39 |
| ep_len_mean | 252.00 ±22.00 |
| epoch | 188 ±10 |
| total_timesteps | 48640 ±2560.00 |
| total_episodes | 461 ±5 |
| total_rollouts | 190.00 ±10.00 |
| rollout_timesteps | 256 ±0 |
| rollout_episodes | 0.00 ±0 |
| fps_instant | 4553 ±432 |
| rollout_fps | 23721.97 ±586.86 |
| loss | -0.00 ±0.0042 |
| policy_loss | -0.0045 ±0.0056 |
| value_loss | 0.0021 ±0.0027 |
| entropy_loss | -0.3854 ±0.0783 |
| action_mean | 0.51 ±0.0007 |
| action_std | 0.50 ±1.58e-05 |
| approx_kl | 0.0022 ±0.0015 |
| baseline_mean | 0.00 ±0 |
| baseline_std | 0.00 ±0 |
| clip_fraction | 0.142 ±0.022 |
| clip_range | 0.1032 ±0.0051 |
| entropy | 0.3854 ±0.0783 |
| explained_variance | 0.883 ±0.989 |
| fps | 3194 ±40 |
| kl_div | -0.0016 ±0.0049 |
| learning_rate | 0.000516 ±0.000026 |
| obs_mean | 0.03 ±0.01 |
| obs_std | 0.61 ±0.01 |
| reward_mean | 1.00 ±0 |
| reward_std | 0.00 ±0 |
| time_elapsed | 15.23 ±0.62 |
| eval/ | |
| ep_rew_mean | 272.40 ±0 |
| ep_len_mean | 272.40 ±0 |
| epoch | 99 ±0 |
| total_timesteps | 6040 ±0 |
| total_episodes | 10 ±0 |
| epoch_fps | 3127.00 ±0 |
-----------------------------------------------
⚠️ ALGORITHM WARNING: High clip range (0.0981) may lead to unstable training. Consider reducing.
-----------------------------------------------
| train/ | |
| ep_rew_mean | 271.25 ±18.98 |
| ep_len_mean | 271.00 ±19.00 |
| epoch | 198 ±10 |
| total_timesteps | 51200 ±2560.00 |
| total_episodes | 466 ±5 |
| total_rollouts | 200.00 ±10.00 |
| rollout_timesteps | 256 ±0 |
| rollout_episodes | 0.00 ±0 |
| fps_instant | 4227 ±326 |
| rollout_fps | 23444.23 ±277.74 |
| loss | 0.00 ±0.0045 |
| policy_loss | -0.0024 ±0.0022 |
| value_loss | 0.0068 ±0.0047 |
| entropy_loss | -0.4011 ±0.0157 |
| action_mean | 0.51 ±0.0005 |
| action_std | 0.50 ±1.22e-05 |
| approx_kl | 0.0010 ±0.0012 |
| baseline_mean | 0.00 ±0 |
| baseline_std | 0.00 ±0 |
| clip_fraction | 0.059 ±0.084 |
| clip_range | 0.0981 ±0.0051 |
| entropy | 0.4011 ±0.0157 |
| explained_variance | 0.706 ±0.177 |
| fps | 3224 ±30 |
| kl_div | -0.0002 ±0.0014 |
| learning_rate | 0.000491 ±0.000026 |
| obs_mean | 0.02 ±0.01 |
| obs_std | 0.60 ±0.01 |
| reward_mean | 1.00 ±0 |
| reward_std | 0.00 ±0 |
| time_elapsed | 15.88 ±0.65 |
| eval/ | |
| ep_rew_mean | 272.40 ±0 |
| ep_len_mean | 272.40 ±0 |
| epoch | 99 ±0 |
| total_timesteps | 6040 ±0 |
| total_episodes | 10 ±0 |
| epoch_fps | 3127.00 ±0 |
-----------------------------------------------
⚠️ ALGORITHM WARNING: High clip range (0.0976) may lead to unstable training. Consider reducing.
-----------------------------------------------
| train/ | |
| ep_rew_mean | 271.25 ±0 |
| ep_len_mean | 271.00 ±0 |
| epoch | 198 ±0 |
| total_timesteps | 51200 ±0 |
| total_episodes | 466 ±0 |
| total_rollouts | 200.00 ±0 |
| rollout_timesteps | 256 ±0 |
| rollout_episodes | 0.00 ±0 |
| fps_instant | 4227 ±0 |
| rollout_fps | 23444.23 ±0 |
| loss | -0.00 ±0.0030 |
| policy_loss | -0.0030 ±0.0006 |
| value_loss | 0.0021 ±0.0047 |
| entropy_loss | -0.4021 ±0.0009 |
| action_mean | 0.51 ±0 |
| action_std | 0.50 ±0 |
| approx_kl | 0.0011 ±0.0001 |
| baseline_mean | 0.00 ±0 |
| baseline_std | 0.00 ±0 |
| clip_fraction | 0.080 ±0.021 |
| clip_range | 0.0976 ±0.0005 |
| entropy | 0.4021 ±0.0009 |
| explained_variance | 0.865 ±0.160 |
| fps | 3224 ±0 |
| kl_div | -0.0027 ±0.0025 |
| learning_rate | 0.000488 ±0.000003 |
| obs_mean | 0.02 ±0 |
| obs_std | 0.60 ±0 |
| reward_mean | 1.00 ±0 |
| reward_std | 0.00 ±0 |
| time_elapsed | 15.88 ±0 |
| eval/ | |
| ep_rew_mean | 500.00 ±227.60 |
| ep_len_mean | 500.00 ±227.60 |
| epoch | 199 ±100 |
| total_timesteps | 8000 ±1960.00 |
| total_episodes | 10 ±0 |
| epoch_fps | 4822.00 ±1695.00 |
-----------------------------------------------
New best model saved with eval/ep_rew_mean=500.0000
Timestamped: runs/cvb5lyfw/checkpoints/epoch=199-step=4000.ckpt
Best: runs/cvb5lyfw/checkpoints/best_checkpoint.ckpt
Threshold reached! Saved model with eval/ep_rew_mean=500.0000 (threshold=475.0) at runs/cvb5lyfw/checkpoints/threshold-epoch=199-step=4000.ckpt
Early stopping at epoch 199 with eval mean reward 500.00 >= threshold 475.0
Using environment spec reward_threshold: 475.0
Best model saved at runs/cvb5lyfw/checkpoints/best_checkpoint.ckpt with eval reward 500.00
Loading checkpoint from runs/cvb5lyfw/checkpoints/best_checkpoint.ckpt
Checkpoint loaded:
Epoch: 199
Total timesteps: 0
Best eval reward: 272.3999938964844
Current eval reward: 500.0
Is best: True
Is threshold: False
Saved final evaluation video to: runs/cvb5lyfw/videos/eval/episodes/best_checkpoint.mp4

Final hyperparameters:
Learning rate: 1.00e-03
Entropy coef: 0.000
Max grad norm: 0.500
Clip range: 0.200
Value function coef: 0.500
Training completed in 24.45 seconds (0.41 minutes)
artifacts/videos/eval/episodes/best_checkpoint.mp4
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a12461a39db591a152b968f8a0f976630bb100f6aae6540412a9ab19322a9621
size 152489