Papers
arxiv:2505.01857

DualDiff: Dual-branch Diffusion Model for Autonomous Driving with Semantic Fusion

Published on May 3
Authors:
,
,
,
,
,
,
,

Abstract

DualDiff, a dual-branch conditional diffusion model with Occupancy Ray Sampling and Semantic Fusion Attention, improves driving scene reconstruction and multi-modal information integration, achieving superior performance in FID, BEV segmentation, and 3D object detection.

AI-generated summary

Accurate and high-fidelity driving scene reconstruction relies on fully leveraging scene information as conditioning. However, existing approaches, which primarily use 3D bounding boxes and binary maps for foreground and background control, fall short in capturing the complexity of the scene and integrating multi-modal information. In this paper, we propose DualDiff, a dual-branch conditional diffusion model designed to enhance multi-view driving scene generation. We introduce Occupancy Ray Sampling (ORS), a semantic-rich 3D representation, alongside numerical driving scene representation, for comprehensive foreground and background control. To improve cross-modal information integration, we propose a Semantic Fusion Attention (SFA) mechanism that aligns and fuses features across modalities. Furthermore, we design a foreground-aware masked (FGM) loss to enhance the generation of tiny objects. DualDiff achieves state-of-the-art performance in FID score, as well as consistently better results in downstream BEV segmentation and 3D object detection tasks.

Community

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2505.01857 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2505.01857 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2505.01857 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.