RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning
Abstract
Existing end-to-end autonomous driving (AD) algorithms typically follow the Imitation Learning (IL) paradigm, which faces challenges such as causal confusion and the open-loop gap. In this work, we establish a 3DGS-based closed-loop Reinforcement Learning (RL) training paradigm. By leveraging 3DGS techniques, we construct a photorealistic digital replica of the real physical world, enabling the AD policy to extensively explore the state space and learn to handle out-of-distribution scenarios through large-scale trial and error. To enhance safety, we design specialized rewards that guide the policy to effectively respond to safety-critical events and understand real-world causal relationships. For better alignment with human driving behavior, IL is incorporated into RL training as a regularization term. We introduce a closed-loop evaluation benchmark consisting of diverse, previously unseen 3DGS environments. Compared to IL-based methods, RAD achieves stronger performance on most closed-loop metrics, most notably a 3x lower collision rate. Extensive closed-loop results are presented at https://hgao-cv.github.io/RAD.
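The abstract describes combining a reward-driven RL objective with IL as a regularization term. Below is a minimal sketch of one possible form of such a combined loss, assuming a policy that outputs logits over a discretized action space; the names (`rad_style_loss`, `lambda_il`) and the specific loss forms are illustrative assumptions, not the paper's actual implementation.

```python
import torch.nn.functional as F

def rad_style_loss(policy, states, actions, returns, expert_actions, lambda_il=0.1):
    """Sketch of an RL objective with an IL regularization term.

    Assumptions: policy(states) returns action logits [B, num_actions];
    `returns` are reward-derived signals from closed-loop rollouts in the
    3DGS environment (safety-critical events such as collisions contribute
    negative reward); `expert_actions` come from logged human driving data.
    """
    logits = policy(states)                                   # [B, num_actions]
    log_probs = F.log_softmax(logits, dim=-1)

    # Policy-gradient-style term: reinforce taken actions weighted by returns.
    taken_log_probs = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    rl_loss = -(taken_log_probs * returns).mean()

    # IL regularization: keep the policy close to human driving behavior.
    il_loss = F.cross_entropy(logits, expert_actions)

    return rl_loss + lambda_il * il_loss
```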
Community
Project page: https://hgao-cv.github.io/RAD/
We propose a reinforcement learning-based post-training paradigm for end-to-end autonomous driving, called RAD. RAD leverages 3DGS technology to construct a highly realistic digital twin of the physical world, enabling the end-to-end model to control the vehicle like a human driver. RAD progressively improves its driving skill by continuously interacting with the environment, receiving feedback, and refining its policy through extensive exploration and trial and error.
After reinforcement learning training in large-scale 3DGS environments, RAD learns a more effective driving policy. On the same closed-loop evaluation benchmark, RAD reduces the collision rate by a factor of three compared to a policy trained solely with imitation learning (IL-only). We also provide a series of visual comparisons to illustrate the key differences between RAD and the IL-only policy.
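To make the collision-rate comparison concrete, here is a hedged sketch of how a closed-loop evaluation loop over held-out 3DGS scenes could be structured. The environment interface (`reset`/`step` returning a collision flag in `info`) and the `policy.act` method are assumptions for illustration; the actual benchmark and metrics are defined by the paper.

```python
def evaluate_collision_rate(policy, envs, max_steps=500):
    """Roll out the policy in each held-out 3DGS scene and report the
    fraction of rollouts that end in a collision (hypothetical interface)."""
    collisions = 0
    for env in envs:
        obs = env.reset()
        for _ in range(max_steps):
            action = policy.act(obs)           # sensor observations -> driving action
            obs, done, info = env.step(action)
            if info.get("collision", False):   # stop the rollout once a collision occurs
                collisions += 1
                break
            if done:
                break
    return collisions / len(envs)
```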
This is an automated message from the Librarian Bot. The following papers, similar to this one, were recommended by the Semantic Scholar API:
- TeLL-Drive: Enhancing Autonomous Driving with Teacher LLM-Guided Deep Reinforcement Learning (2025)
- Diffusion-Based Planning for Autonomous Driving with Flexible Guidance (2025)
- A Survey of World Models for Autonomous Driving (2025)
- DIAL: Distribution-Informed Adaptive Learning of Multi-Task Constraints for Safety-Critical Systems (2025)
- Learning from Suboptimal Data in Continuous Control via Auto-Regressive Soft Q-Network (2025)
- ConRFT: A Reinforced Fine-tuning Method for VLA Models via Consistency Policy (2025)
- CuRLA: Curriculum Learning Based Deep Reinforcement Learning for Autonomous Driving (2025)