enrique2701 committed on
Commit 0b5fb11 · verified · 1 Parent(s): 8c81433

Update README.md

Files changed (1)
  1. README.md +35 -7
README.md CHANGED
@@ -11,14 +11,42 @@ tags:
  This is a trained model of a **ppo** agent playing **SnowballTarget**
  using the [Unity ML-Agents Library](https://github.com/Unity-Technologies/ml-agents).
 
- ## Usage (with ML-Agents)
- The Documentation: https://unity-technologies.github.io/ml-agents/ML-Agents-Toolkit-Documentation/
+ ## Results
+ [INFO] SnowballTarget.
+ Step: 400000.
+ Time Elapsed: 903.639 s.
+ Mean Reward: 25.591.
+ Std of Reward: 1.992.
 
- We wrote a complete tutorial to learn to train your first agent using ML-Agents and publish it to the Hub:
- - A *short tutorial* where you teach Huggy the Dog 🐶 to fetch the stick and then play with him directly in your
- browser: https://huggingface.co/learn/deep-rl-course/unitbonus1/introduction
- - A *longer tutorial* to understand how works ML-Agents:
- https://huggingface.co/learn/deep-rl-course/unit5/introduction
+ ## Hyperparameters
+ %%file /content/ml-agents/config/ppo/SnowballTarget.yaml
+ behaviors:
+   SnowballTarget:
+     trainer_type: ppo
+     summary_freq: 10000
+     keep_checkpoints: 10
+     checkpoint_interval: 50000
+     max_steps: 400000
+     time_horizon: 32
+     threaded: true
+     hyperparameters:
+       learning_rate: 0.0003
+       learning_rate_schedule: linear
+       batch_size: 128
+       buffer_size: 2048
+       beta: 0.005
+       epsilon: 0.2
+       lambd: 0.95
+       num_epoch: 3
+     network_settings:
+       normalize: false
+       hidden_units: 256
+       num_layers: 3
+       vis_encode_type: nature_cnn
+     reward_signals:
+       extrinsic:
+         gamma: 0.9
+         strength: 1.0
 
  ### Resume the training
  ```bash