Now that you’ve studied the theory behind Advantage Actor Critic (A2C), you’re ready to train your A2C agent using Stable-Baselines3 in a robotic environment. You’ll train a robotic arm 🦾 to move to the correct position, using panda-gym.

To validate this hands-on for the certification process, you need to push your two trained models to the Hub and get the following results:

PandaReachDense-v3: get a result of >= -3.5.

To find your result, go to the leaderboard and find your model. The result = mean_reward - std of reward.
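To make the scoring rule concrete, here is a minimal sketch of how you could train an A2C agent and estimate the result locally before pushing. The helper names (`leaderboard_result`, `train_and_score`), the default `total_timesteps`, the number of parallel environments, and the number of evaluation episodes are all illustrative assumptions, not values prescribed by the course. The sketch also assumes the leaderboard's std is a population standard deviation, and that `stable-baselines3` and `panda-gym` are installed.

```python
from statistics import fmean, pstdev

ENV_ID = "PandaReachDense-v3"
TARGET = -3.5  # threshold stated above for PandaReachDense-v3


def leaderboard_result(episode_rewards):
    """Return mean_reward - std of reward over a list of episode returns."""
    # Assumption: population std (ddof=0), i.e. the same convention as np.std.
    return fmean(episode_rewards) - pstdev(episode_rewards)


def train_and_score(total_timesteps=100_000, n_eval_episodes=10):
    """Train A2C on ENV_ID and return the estimated result (hypothetical helper)."""
    # Heavy imports are kept local so the scoring helper works without them.
    import panda_gym  # noqa: F401  (importing registers the Panda environments)
    from stable_baselines3 import A2C
    from stable_baselines3.common.env_util import make_vec_env
    from stable_baselines3.common.evaluation import evaluate_policy

    # Illustrative choice: 4 parallel copies of the environment.
    env = make_vec_env(ENV_ID, n_envs=4)

    # panda-gym observations are dictionaries, hence MultiInputPolicy.
    model = A2C("MultiInputPolicy", env, verbose=1)
    model.learn(total_timesteps=total_timesteps)

    # Collect per-episode returns, then apply the scoring rule.
    episode_rewards, _ = evaluate_policy(
        model,
        env,
        n_eval_episodes=n_eval_episodes,
        return_episode_rewards=True,
    )
    return leaderboard_result(episode_rewards)


if __name__ == "__main__":
    score = train_and_score()
    print(f"result = {score:.2f}, passes: {score >= TARGET}")
```

As a quick check of the rule itself, `leaderboard_result([-1.0, -3.0])` has a mean of -2.0 and a std of 1.0, so it returns -3.0, which would satisfy the >= -3.5 requirement.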

