---
library_name: stable-baselines3
tags:
- LunarLander-v2
- deep-reinforcement-learning
- reinforcement-learning
- stable-baselines3
model-index:
- name: PPO
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: LunarLander-v2
      type: LunarLander-v2
    metrics:
    - type: mean_reward
      value: 265.65 +/- 24.86
      name: mean_reward
      verified: false
license: mit
language:
- en
---


# PPO-LunarLander-v2

A Proximal Policy Optimization (PPO) agent trained to solve the LunarLander-v2 environment from Gymnasium.


## Model Details

### Description
This model is a deep reinforcement learning agent trained with PPO to land the lunar module in Gymnasium's (formerly OpenAI Gym) LunarLander-v2 environment. The agent learns to fire the lander's engines so that it touches down safely while using as little fuel as possible.

- **Algorithm**: PPO (Proximal Policy Optimization)
- **Framework**: Stable Baselines3
- **Environment**: [LunarLander-v2](https://gymnasium.farama.org/environments/box2d/lunar_lander/)
- **Training Timesteps**: 1,000,000
- **Input**: 8-dimensional state vector (x/y position, x/y velocity, angle, angular velocity, two leg-contact flags)
- **Output**: 4 discrete actions (do nothing, fire left orientation engine, fire main engine, fire right orientation engine)
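
To make the input and output layout concrete, here is a minimal sketch that names the eight observation components and the four actions in the order given in the Gymnasium LunarLander documentation. The `Observation` tuple, the `ACTIONS` table, and the `describe` helper are illustrative names, not part of Stable Baselines3 or Gymnasium, and the sample observation is made up.

```python
from typing import NamedTuple, Sequence


class Observation(NamedTuple):
    """The 8 observation components, in Gymnasium's documented order."""
    x: float
    y: float
    vx: float
    vy: float
    angle: float
    angular_velocity: float
    left_leg_contact: float   # 1.0 if the leg touches the ground, else 0.0
    right_leg_contact: float


# Discrete action index -> meaning
ACTIONS = {
    0: "do nothing",
    1: "fire left orientation engine",
    2: "fire main engine",
    3: "fire right orientation engine",
}


def describe(raw_obs: Sequence[float], action: int) -> str:
    """Render a raw 8-element observation and an action index as text."""
    obs = Observation(*raw_obs)
    legs = int(obs.left_leg_contact) + int(obs.right_leg_contact)
    return f"y={obs.y:.2f}, vy={obs.vy:.2f}, legs down={legs}, action={ACTIONS[action]}"


# Made-up observation, for illustration only
sample = [0.0, 1.4, 0.1, -0.5, 0.02, 0.0, 0.0, 0.0]
print(describe(sample, 2))
```

A helper like this can be handy when logging rollouts, since the raw observation is otherwise just an unlabeled array.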

## Intended Use
- Research in Deep Reinforcement Learning
- Benchmarking RL algorithms
- Educational purposes (Hugging Face Deep RL Course)
- Base model for transfer learning in similar environments

## Usage

### Installation
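```bash
# LunarLander needs Gymnasium's Box2D extra; building it may require swig.
# In a notebook cell, prefix the command with "!".
pip install stable-baselines3 "gymnasium[box2d]" huggingface_sb3 shimmy
```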

### Load and Run the Model
```python
import gymnasium as gym
from huggingface_sb3 import load_from_hub
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.monitor import Monitor

# Download the checkpoint from the Hugging Face Hub
repo_id = "ashaduzzaman/ppo-LunarLander-v2"  # Replace with your repo
filename = "ppo-LunarLander-v2.zip"
checkpoint = load_from_hub(repo_id, filename)

# Override the saved schedules so the model loads across Python/SB3 versions
custom_objects = {
    "learning_rate": 0.0,
    "lr_schedule": lambda _: 0.0,
    "clip_range": lambda _: 0.0,
}
model = PPO.load(checkpoint, custom_objects=custom_objects)

# Evaluate over 10 episodes; Monitor records the unmodified episode returns
eval_env = Monitor(gym.make("LunarLander-v2"))
mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10)
print(f"Mean reward: {mean_reward:.2f} ± {std_reward:.2f}")
eval_env.close()
```

## Training

### Hyperparameters
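```python
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env

# 16 parallel environments, as listed under Training Configuration
env = make_vec_env("LunarLander-v2", n_envs=16)

model = PPO(
    policy="MlpPolicy",
    env=env,
    n_steps=1024,
    batch_size=64,
    n_epochs=4,
    gamma=0.999,
    gae_lambda=0.98,
    ent_coef=0.01,
    learning_rate=0.00025,
    verbose=1,
)
model.learn(total_timesteps=1_000_000)
```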

### Training Configuration
- **Total Timesteps**: 1,000,000
- **Parallel Environments**: 16
- **Optimizer**: Adam
- **Policy Network**: 2 hidden layers (64 units each)
- **Activation**: Tanh
- **Training Hardware**: NVIDIA Tesla T4 GPU
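
The hyperparameters and configuration above imply a few derived quantities that are useful when tuning. The sketch below works them out with plain arithmetic. It assumes that each PPO update collects `n_steps` transitions from every parallel environment and that training stops after the first rollout that reaches the timestep budget, so the numbers are an estimate rather than a log of the actual run.

```python
import math

# Values taken from the Hyperparameters and Training Configuration sections
n_steps = 1024
n_envs = 16
batch_size = 64
n_epochs = 4
total_timesteps = 1_000_000
gamma = 0.999

# Transitions collected before each policy update
rollout_size = n_steps * n_envs

# Minibatches per pass over the rollout, and gradient steps per update
minibatches_per_epoch = rollout_size // batch_size
gradient_steps_per_update = minibatches_per_epoch * n_epochs

# Assumption: rollouts are always completed, so the budget is rounded up
num_updates = math.ceil(total_timesteps / rollout_size)
timesteps_collected = num_updates * rollout_size

# Rough horizon over which rewards still matter under discounting
effective_horizon = round(1 / (1 - gamma))

print(f"rollout size:              {rollout_size}")
print(f"minibatches per epoch:     {minibatches_per_epoch}")
print(f"gradient steps per update: {gradient_steps_per_update}")
print(f"policy updates:            {num_updates}")
print(f"timesteps collected:       {timesteps_collected}")
print(f"effective horizon (steps): {effective_horizon}")
```

Under these assumptions the agent sees slightly more than the nominal 1,000,000 timesteps, because the final rollout is not cut short.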

## Evaluation

| Metric             | Value     |
|--------------------|-----------|
| Mean Reward        | 257.67    |
| Std Reward         | 24.70     |
| Success Rate       | 100%      |
| Avg Episode Length | 270 steps |
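
As a rough illustration of how summary metrics like these can be derived from per-episode results, the sketch below aggregates a small list of episode returns and lengths. The episode data is hypothetical and is not the data behind the table. The success criterion (a return of at least 200) is also an assumption, since this card does not define it.

```python
from statistics import fmean, pstdev

# Hypothetical per-episode results; NOT the runs behind the table above
episode_returns = [262.1, 231.4, 280.9, 249.3, 270.6]
episode_lengths = [255, 301, 248, 290, 262]

# Assumption: an episode counts as a success at a return of 200 or more
SOLVED_THRESHOLD = 200.0

mean_reward = fmean(episode_returns)
std_reward = pstdev(episode_returns)  # population std, like numpy.std's default
success_rate = sum(r >= SOLVED_THRESHOLD for r in episode_returns) / len(episode_returns)
avg_length = fmean(episode_lengths)

print(f"Mean Reward:        {mean_reward:.2f}")
print(f"Std Reward:         {std_reward:.2f}")
print(f"Success Rate:       {success_rate:.0%}")
print(f"Avg Episode Length: {avg_length:.0f} steps")
```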

## Environmental Impact

**Carbon Emissions Estimate**  
Training done on Google Colab:  
- **Hardware Type**: NVIDIA T4 GPU
- **Hours Used**: 0.5
- **Cloud Provider**: Google Cloud
- **Compute Region**: us-west1
- **Carbon Emitted**: ~0.03 kgCO₂eq

## Credits

- Developed as part of [Hugging Face Deep RL Course](https://huggingface.co/deep-rl-course)
- Base implementation using [Stable Baselines3](https://stable-baselines3.readthedocs.io/)
- Environment by [Gymnasium](https://gymnasium.farama.org/)

## License
MIT License - Free for academic and commercial use. See [LICENSE](https://opensource.org/licenses/MIT) for details.

---

**Leaderboard Submission**  
`result = mean_reward - std_reward = 257.67 - 24.70 = 232.97`
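
The same calculation as a small helper, in case it is useful for comparing runs. The function name `leaderboard_score` is illustrative and is not an API of the course or of Stable Baselines3.

```python
def leaderboard_score(mean_reward: float, std_reward: float) -> float:
    """Return the mean reward minus one standard deviation."""
    return mean_reward - std_reward


# Values from the Evaluation table
score = leaderboard_score(257.67, 24.70)
print(f"{score:.2f}")  # 232.97
```

Subtracting the standard deviation rewards agents that are consistently good rather than occasionally excellent.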