arXiv:2407.13146

PG-Rainbow: Using Distributional Reinforcement Learning in Policy Gradient Methods

Published on Jul 18, 2024

Abstract

This paper introduces PG-Rainbow, a novel algorithm that combines a distributional reinforcement learning framework with a policy gradient algorithm. Existing policy gradient methods are sample inefficient and rely on the mean of returns when calculating the state-action value function, neglecting the distributional nature of returns in reinforcement learning tasks. To address this issue, we use an Implicit Quantile Network that provides quantile information about the distribution of rewards to the critic network of the Proximal Policy Optimization algorithm. Our empirical results show that integrating reward distribution information into the policy network allows the agent to evaluate the consequences of potential actions in a given state more comprehensively, enabling more informed decision-making. We evaluate the proposed algorithm on the Atari-2600 game suite, simulated via the Arcade Learning Environment (ALE).
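As a rough illustration of the architecture the abstract describes, the sketch below shows an IQN-style critic head in PyTorch: quantile fractions tau are sampled, embedded with cosine features, and combined with a state embedding to produce one return estimate per quantile, whose mean can then serve as the value baseline in a PPO-style update. The layer sizes, embedding dimensions, and the mean-over-quantiles baseline are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn


class IQNCritic(nn.Module):
    """IQN-style critic head (illustrative sketch, not the paper's exact code).

    Samples quantile fractions tau, embeds them with cosine features as in
    the IQN paper, and outputs one return estimate per tau. The mean over
    quantiles can act as the value baseline in a PPO-style update, while the
    full set of quantiles carries the distributional information.
    """

    def __init__(self, obs_dim: int, hidden_dim: int = 256, n_cos: int = 64):
        super().__init__()
        self.n_cos = n_cos
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden_dim), nn.ReLU())
        self.tau_embed = nn.Sequential(nn.Linear(n_cos, hidden_dim), nn.ReLU())
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, obs: torch.Tensor, n_taus: int = 32):
        state = self.encoder(obs)                                   # (B, H)
        taus = torch.rand(obs.shape[0], n_taus, device=obs.device)  # (B, N)
        # Cosine embedding of each tau: cos(pi * i * tau), i = 1..n_cos
        i = torch.arange(1, self.n_cos + 1, device=obs.device, dtype=obs.dtype)
        cos = torch.cos(torch.pi * taus.unsqueeze(-1) * i)          # (B, N, n_cos)
        phi = self.tau_embed(cos)                                   # (B, N, H)
        # Hadamard product of state and quantile embeddings, then a linear head
        quantile_values = self.head(state.unsqueeze(1) * phi).squeeze(-1)  # (B, N)
        return quantile_values, taus


# Usage sketch: the mean over quantile estimates plays the role of V(s)
# when computing PPO advantages.
critic = IQNCritic(obs_dim=8)
obs = torch.randn(4, 8)
quantile_values, taus = critic(obs)
value_baseline = quantile_values.mean(dim=1)  # shape (4,)
```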
