enrique2701/ppo-SnowballTarget

Reinforcement Learning

deep-reinforcement-learning

ML-Agents-SnowballTarget

enrique2701 committed on Feb 25, 2024

Commit 0b5fb11 · verified · 1 Parent(s): 8c81433

Update README.md

Files changed (1)

README.md +35 -7

@@ -11,14 +11,42 @@ tags:
   This is a trained model of a **ppo** agent playing **SnowballTarget**
   using the [Unity ML-Agents Library](https://github.com/Unity-Technologies/ml-agents).
-  ## Usage (with ML-Agents)
-  The Documentation: https://unity-technologies.github.io/ml-agents/ML-Agents-Toolkit-Documentation/
-  We wrote a complete tutorial to learn to train your first agent using ML-Agents and publish it to the Hub:
-  - A *short tutorial* where you teach Huggy the Dog 🐶 to fetch the stick and then play with him directly in your
-  browser: https://huggingface.co/learn/deep-rl-course/unitbonus1/introduction
-  - A *longer tutorial* to understand how works ML-Agents:
-  https://huggingface.co/learn/deep-rl-course/unit5/introduction
   ### Resume the training
   ```bash

   This is a trained model of a **ppo** agent playing **SnowballTarget**
   using the [Unity ML-Agents Library](https://github.com/Unity-Technologies/ml-agents).
+  ## Results
+  [INFO] SnowballTarget.
+  Step: 400000.
+  Time Elapsed: 903.639 s.
+  Mean Reward: 25.591.
+  Std of Reward: 1.992.
+  ## Hyperparameters
+%%file /content/ml-agents/config/ppo/SnowballTarget.yaml
+behaviors:
+  SnowballTarget:
+    trainer_type: ppo
+    summary_freq: 10000
+    keep_checkpoints: 10
+    checkpoint_interval: 50000
+    max_steps: 400000
+    time_horizon: 32
+    threaded: true
+    hyperparameters:
+      learning_rate: 0.0003
+      learning_rate_schedule: linear
+      batch_size: 128
+      buffer_size: 2048
+      beta: 0.005
+      epsilon: 0.2
+      lambd: 0.95
+      num_epoch: 3
+    network_settings:
+      normalize: false
+      hidden_units: 256
+      num_layers: 3
+      vis_encode_type: nature_cnn
+    reward_signals:
+      extrinsic:
+        gamma: 0.9
+        strength: 1.0
   ### Resume the training
   ```bash