---
library_name: stable-baselines3
tags:
- LunarLander-v2
- deep-reinforcement-learning
- reinforcement-learning
- stable-baselines3
model-index:
- name: PPO
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: LunarLander-v2
      type: LunarLander-v2
    metrics:
    - type: mean_reward
      value: 265.65 +/- 24.86
      name: mean_reward
      verified: false
license: mit
language:
- en
---
# PPO-LunarLander-v2
A Proximal Policy Optimization (PPO) agent trained to solve the LunarLander-v2 environment from Gymnasium.
## Model Details
### Description
This model is a deep reinforcement learning agent trained with the PPO algorithm to land the lunar module safely in Gymnasium's LunarLander-v2 environment (originally from OpenAI Gym). The agent learns to fire the lander's engines so that it touches down on the landing pad gently while conserving fuel.
- **Algorithm**: PPO (Proximal Policy Optimization)
- **Framework**: Stable Baselines3
- **Environment**: [LunarLander-v2](https://gymnasium.farama.org/environments/box2d/lunar_lander/)
- **Training Timesteps**: 1,000,000
- **Input**: 8-dimensional state vector (x/y position, x/y velocity, angle, angular velocity, and two leg-contact flags)
- **Output**: 4 discrete actions (do nothing, fire left orientation engine, fire main engine, fire right orientation engine); see the sketch below
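A quick way to confirm these spaces (a minimal sketch; it only assumes the same `gymnasium` and Box2D dependencies as the installation step below):
```python
import gymnasium as gym

# Build the environment and inspect its observation/action spaces
env = gym.make("LunarLander-v2")
print(env.observation_space.shape)  # (8,) -> 8-dimensional state vector
print(env.action_space)             # Discrete(4) -> the four engine actions

obs, info = env.reset(seed=0)
print(obs.shape)  # (8,)
env.close()
```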
## Intended Use
- Research in Deep Reinforcement Learning
- Benchmarking RL algorithms
- Educational purposes (Hugging Face Deep RL Course)
- Base model for transfer learning in similar environments (see the fine-tuning sketch below)
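As a hedged illustration of that last point, the published checkpoint can be loaded and training continued, for example on a wrapped or modified version of the environment. The repo id and filename are the ones used in the usage example below; the environment count, extra timestep budget, and save name are assumptions:
```python
from huggingface_sb3 import load_from_hub
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env

# Download the pretrained checkpoint from the Hugging Face Hub
checkpoint = load_from_hub("ashaduzzaman/ppo-LunarLander-v2", "ppo-LunarLander-v2.zip")

# Rebuild a training environment (swap in a wrapped/modified env here for transfer)
env = make_vec_env("LunarLander-v2", n_envs=4)  # n_envs=4 is an arbitrary choice

# Attach the new environment to the loaded weights and continue training
model = PPO.load(checkpoint, env=env)
model.learn(total_timesteps=100_000)  # fine-tuning budget (assumed)
model.save("ppo-LunarLander-v2-finetuned")  # hypothetical output name
```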
## Usage
### Installation
```bash
# In a Colab/Jupyter notebook, prefix the command with "!"
pip install stable-baselines3 "gymnasium[box2d]" huggingface_sb3 shimmy
```
### Load and Run the Model
```python
from huggingface_sb3 import load_from_hub
from stable_baselines3 import PPO
import gymnasium as gym
# Download model
repo_id = "ashaduzzaman/ppo-LunarLander-v2" # Replace with your repo
filename = "ppo-LunarLander-v2.zip"
checkpoint = load_from_hub(repo_id, filename)
# custom_objects replaces the pickled schedules, which may fail to deserialize
# across different SB3/Python versions
custom_objects = {
    "learning_rate": 0.0,
    "lr_schedule": lambda _: 0.0,
    "clip_range": lambda _: 0.0,
}
model = PPO.load(checkpoint, custom_objects=custom_objects)
# Evaluate
from stable_baselines3.common.evaluation import evaluate_policy
eval_env = gym.make("LunarLander-v2")
mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10)
print(f"Mean reward: {mean_reward:.2f} ± {std_reward:.2f}")
```
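To watch the loaded agent fly, a simple rollout loop works as well (a sketch reusing the `model` loaded above; `render_mode="human"` needs a local display, so it will not render in a headless Colab session):
```python
import gymnasium as gym

# Run one deterministic episode with on-screen rendering
env = gym.make("LunarLander-v2", render_mode="human")
obs, info = env.reset()
done = False
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
env.close()
```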
## Training
### Hyperparameters
```python
PPO(
policy="MlpPolicy",
n_steps=1024,
batch_size=64,
n_epochs=4,
gamma=0.999,
gae_lambda=0.98,
ent_coef=0.01,
learning_rate=0.00025,
verbose=1
)
```
### Training Configuration
- **Total Timesteps**: 1,000,000
- **Parallel Environments**: 16
- **Optimizer**: Adam
- **Policy Network**: 2 hidden layers (64 units each)
- **Activation**: Tanh
- **Training Hardware**: NVIDIA Tesla T4 GPU
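Putting the hyperparameters and configuration together, training could be reproduced roughly as follows (a sketch; the env id, 16 parallel environments, and 1,000,000 timesteps come from this card, while the save name is an assumption):
```python
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env

# 16 parallel copies of the environment, as listed above
vec_env = make_vec_env("LunarLander-v2", n_envs=16)

model = PPO(
    policy="MlpPolicy",  # SB3 default MLP: 2 hidden layers of 64 units, Tanh
    env=vec_env,
    n_steps=1024,
    batch_size=64,
    n_epochs=4,
    gamma=0.999,
    gae_lambda=0.98,
    ent_coef=0.01,
    learning_rate=0.00025,
    verbose=1,
)

model.learn(total_timesteps=1_000_000)
model.save("ppo-LunarLander-v2")  # assumed output name
```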
## Evaluation
| Metric | Value |
|-----------------|--------|
| Mean Reward | 257.67 |
| Std Reward | 24.70 |
| Success Rate | 100% |
| Avg Episode Length | 270 steps |
## Environmental Impact
**Carbon Emissions Estimate**
Training was performed on Google Colab:
- **Hardware Type**: NVIDIA T4 GPU
- **Hours Used**: 0.5
- **Cloud Provider**: Google Cloud
- **Compute Region**: us-west1
- **Carbon Emitted**: ~0.03 kgCO₂eq
## Credits
- Developed as part of the [Hugging Face Deep RL Course](https://huggingface.co/deep-rl-course)
- Base implementation using [Stable Baselines3](https://stable-baselines3.readthedocs.io/)
- Environment by [Gymnasium](https://gymnasium.farama.org/)
## License
MIT License - Free for academic and commercial use. See [LICENSE](https://opensource.org/licenses/MIT) for details.
---
**Leaderboard Submission**
`result = mean_reward - std_reward = 257.67 - 24.70 = 232.97`
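For reference, the same score can be computed directly from the evaluation output (a trivial sketch using the values reported in the Evaluation table above):
```python
mean_reward, std_reward = 257.67, 24.70

# Leaderboard convention: subtract one standard deviation to penalize unstable policies
result = mean_reward - std_reward
print(result)  # 232.97
```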