---
library_name: stable-baselines3
tags:
- LunarLander-v2
- deep-reinforcement-learning
- reinforcement-learning
- stable-baselines3
model-index:
- name: PPO
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: LunarLander-v2
      type: LunarLander-v2
    metrics:
    - type: mean_reward
      value: 265.65 +/- 24.86
      name: mean_reward
      verified: false
license: mit
language:
- en
---

# PPO-LunarLander-v2

A Proximal Policy Optimization (PPO) agent trained to solve the LunarLander-v2 environment from Gymnasium.

## Model Details

### Description

This model is a deep reinforcement learning agent trained with the PPO algorithm to land the lunar module safely in Gymnasium's LunarLander-v2 environment (originally from OpenAI Gym). The agent learns to control the lander's engines to achieve a safe landing with minimal fuel usage.

- **Algorithm**: PPO (Proximal Policy Optimization)
- **Framework**: Stable Baselines3
- **Environment**: [LunarLander-v2](https://gymnasium.farama.org/environments/box2d/lunar_lander/)
- **Training Timesteps**: 1,000,000
- **Input**: 8-dimensional state vector (position, velocity, angle, angular velocity, leg contacts)
- **Output**: 4 discrete actions (do nothing, fire left engine, fire main engine, fire right engine)

## Intended Use

- Research in deep reinforcement learning
- Benchmarking RL algorithms
- Educational purposes (Hugging Face Deep RL Course)
- Base model for transfer learning in similar environments

## Usage

### Installation

```bash
pip install stable-baselines3 gymnasium huggingface_sb3 shimmy
```

### Load and Run the Model

```python
import gymnasium as gym
from huggingface_sb3 import load_from_hub
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

# Download the model checkpoint from the Hugging Face Hub
repo_id = "ashaduzzaman/ppo-LunarLander-v2"  # Replace with your repo
filename = "ppo-LunarLander-v2.zip"
checkpoint = load_from_hub(repo_id, filename)

# Load the model with compatibility settings for differing SB3 versions
custom_objects = {
    "learning_rate": 0.0,
    "lr_schedule": lambda _: 0.0,
    "clip_range": lambda _: 0.0,
}
model = PPO.load(checkpoint, custom_objects=custom_objects)

# Evaluate the agent over 10 episodes
eval_env = gym.make("LunarLander-v2")
mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10)
print(f"Mean reward: {mean_reward:.2f} ± {std_reward:.2f}")
```

## Training

### Hyperparameters

```python
PPO(
    policy="MlpPolicy",
    n_steps=1024,
    batch_size=64,
    n_epochs=4,
    gamma=0.999,
    gae_lambda=0.98,
    ent_coef=0.01,
    learning_rate=0.00025,
    verbose=1,
)
```

### Training Configuration

- **Total Timesteps**: 1,000,000
- **Parallel Environments**: 16
- **Optimizer**: Adam
- **Policy Network**: 2 hidden layers (64 units each)
- **Activation**: Tanh
- **Training Hardware**: NVIDIA Tesla T4 GPU

A minimal sketch of a matching training script is included at the end of this card.

## Evaluation

| Metric             | Value     |
|--------------------|-----------|
| Mean Reward        | 257.67    |
| Std Reward         | 24.70     |
| Success Rate       | 100%      |
| Avg Episode Length | 270 steps |

## Environmental Impact

**Carbon Emissions Estimate**

Training was run on Google Colab:

- **Hardware Type**: NVIDIA T4 GPU
- **Hours Used**: 0.5
- **Cloud Provider**: Google Cloud
- **Compute Region**: us-west1
- **Carbon Emitted**: ~0.03 kgCO₂eq

## Credits

- Developed as part of the [Hugging Face Deep RL Course](https://huggingface.co/deep-rl-course)
- Base implementation using [Stable Baselines3](https://stable-baselines3.readthedocs.io/)
- Environment by [Gymnasium](https://gymnasium.farama.org/)

## License

MIT License - free for academic and commercial use. See [LICENSE](https://opensource.org/licenses/MIT) for details.
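## Reproducing Training (Sketch)

The following is a minimal sketch of how a comparable model could be trained, combining the hyperparameters and the 16 parallel environments listed above. It is not the exact script used for this checkpoint; the save path is a placeholder.

```python
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env

# 16 parallel copies of the environment, as in the training configuration above
env = make_vec_env("LunarLander-v2", n_envs=16)

# PPO with the hyperparameters reported in this card
model = PPO(
    policy="MlpPolicy",
    env=env,
    n_steps=1024,
    batch_size=64,
    n_epochs=4,
    gamma=0.999,
    gae_lambda=0.98,
    ent_coef=0.01,
    learning_rate=0.00025,
    verbose=1,
)

# Train for the 1,000,000 timesteps reported above
model.learn(total_timesteps=1_000_000)

# Save locally; the filename matches the checkpoint on the Hub (placeholder path)
model.save("ppo-LunarLander-v2")
```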
---

**Leaderboard Submission**

`result = mean_reward - std_reward = 257.67 - 24.70 = 232.97`
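The same score can be computed directly from an evaluation run. A minimal sketch, assuming `model` and `eval_env` are already set up as in the usage example above:

```python
from stable_baselines3.common.evaluation import evaluate_policy

# Evaluate over 10 episodes (assumes `model` and `eval_env` exist as above)
mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10)

# The leaderboard penalizes unstable policies by subtracting the std deviation
result = mean_reward - std_reward
print(f"Leaderboard result: {result:.2f}")
```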