G-CUT3R: Guided 3D Reconstruction with Camera and Depth Prior Integration
Abstract
G-CUT3R enhances 3D scene reconstruction by integrating auxiliary data through dedicated encoders and zero convolution, improving performance across benchmarks.
We introduce G-CUT3R, a novel feed-forward approach for guided 3D scene reconstruction that enhances the CUT3R model by integrating prior information. Unlike existing feed-forward methods that rely solely on input images, our method leverages auxiliary data, such as depth, camera calibrations, or camera positions, commonly available in real-world scenarios. We propose a lightweight modification to CUT3R, incorporating a dedicated encoder for each modality to extract features, which are fused with RGB image tokens via zero convolution. This flexible design enables seamless integration of any combination of prior information during inference. Evaluated across multiple benchmarks, including 3D reconstruction and other multi-view tasks, our approach demonstrates significant performance improvements, showing its ability to effectively utilize available priors while maintaining compatibility with varying input modalities.
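The abstract's key mechanism is fusing per-modality prior features into RGB image tokens through zero convolution, a ControlNet-style layer initialized to zero so the prior branch contributes nothing at the start of fine-tuning and cannot disturb the pretrained backbone. A minimal PyTorch sketch of that fusion step is below; the class and parameter names are illustrative assumptions, not the paper's actual code, and the per-modality encoders are abstracted away as precomputed token tensors.

```python
import torch
import torch.nn as nn


class ZeroLinear(nn.Linear):
    """Linear projection initialized to zero ("zero convolution"):
    at the start of training the prior branch adds exactly nothing,
    so the pretrained RGB pathway is preserved."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__(in_features, out_features)
        nn.init.zeros_(self.weight)
        nn.init.zeros_(self.bias)


class PriorFusion(nn.Module):
    """Hypothetical sketch of fusing prior tokens into RGB tokens.
    Each modality (e.g. depth, camera calibration, pose) gets its own
    zero-initialized projection; any subset of priors may be supplied
    at inference, matching the flexible design the abstract describes."""

    def __init__(self, dim: int, modalities=("depth", "calib", "pose")):
        super().__init__()
        self.proj = nn.ModuleDict({m: ZeroLinear(dim, dim) for m in modalities})

    def forward(self, rgb_tokens: torch.Tensor, priors: dict) -> torch.Tensor:
        # rgb_tokens: [B, N, dim]; priors maps modality name -> [B, N, dim]
        out = rgb_tokens
        for name, tokens in priors.items():
            out = out + self.proj[name](tokens)
        return out


# Usage: with freshly initialized (zero) projections, the fused tokens
# equal the RGB tokens; the prior pathway "grows in" during training.
fusion = PriorFusion(dim=64)
rgb = torch.randn(2, 16, 64)
fused = fusion(rgb, {"depth": torch.randn(2, 16, 64)})
assert torch.allclose(fused, rgb)
```

The zero initialization is what makes the modification "lightweight": training can start from the unmodified CUT3R weights without a destabilizing jump in the features.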
Community
G-CUT3R is a fast, feed-forward 3D reconstruction model that fuses RGB with real-world priors (calibrations, poses, depth), enabling flexible, efficient, and state-of-the-art results while requiring less training data.
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- Uni3R: Unified 3D Reconstruction and Semantic Understanding via Generalizable Gaussian Splatting from Unposed Multi-View Images (2025)
- No Pose at All: Self-Supervised Pose-Free 3D Gaussian Splatting from Sparse Views (2025)
- IDCNet: Guided Video Diffusion for Metric-Consistent RGBD Scene Generation with Precise Camera Control (2025)
- Surf3R: Rapid Surface Reconstruction from Sparse RGB Views in Seconds (2025)
- STream3R: Scalable Sequential 3D Reconstruction with Causal Transformer (2025)
- DepR: Depth Guided Single-view Scene Reconstruction with Instance-level Diffusion (2025)
- Review of Feed-forward 3D Reconstruction: From DUSt3R to VGGT (2025)