G-CUT3R: Guided 3D Reconstruction with Camera and Depth Prior Integration
Abstract
G-CUT3R enhances 3D scene reconstruction by integrating auxiliary data through dedicated encoders and zero convolution, improving performance across benchmarks.
We introduce G-CUT3R, a novel feed-forward approach for guided 3D scene reconstruction that enhances the CUT3R model by integrating prior information. Unlike existing feed-forward methods that rely solely on input images, our method leverages auxiliary data, such as depth, camera calibrations, or camera positions, commonly available in real-world scenarios. We propose a lightweight modification to CUT3R, incorporating a dedicated encoder for each modality to extract features, which are fused with RGB image tokens via zero convolution. This flexible design enables seamless integration of any combination of prior information during inference. Evaluated across multiple benchmarks, including 3D reconstruction and other multi-view tasks, our approach demonstrates significant performance improvements, showing its ability to effectively utilize available priors while maintaining compatibility with varying input modalities.
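The abstract's key mechanism is fusing per-modality prior features into RGB image tokens through zero convolution, a ControlNet-style layer initialized to zero so the prior branch contributes nothing at the start of fine-tuning and cannot disturb the pretrained backbone. A minimal PyTorch sketch of that fusion step is below; the class and parameter names are illustrative assumptions, not the paper's actual code, and the per-modality encoders are abstracted away as precomputed token tensors.

```python
import torch
import torch.nn as nn


class ZeroLinear(nn.Linear):
    """Linear projection initialized to zero ("zero convolution"):
    at the start of training the prior branch adds exactly nothing,
    so the pretrained RGB pathway is preserved."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__(in_features, out_features)
        nn.init.zeros_(self.weight)
        nn.init.zeros_(self.bias)


class PriorFusion(nn.Module):
    """Hypothetical sketch of fusing prior tokens into RGB tokens.
    Each modality (e.g. depth, camera calibration, pose) gets its own
    zero-initialized projection; any subset of priors may be supplied
    at inference, matching the flexible design the abstract describes."""

    def __init__(self, dim: int, modalities=("depth", "calib", "pose")):
        super().__init__()
        self.proj = nn.ModuleDict({m: ZeroLinear(dim, dim) for m in modalities})

    def forward(self, rgb_tokens: torch.Tensor, priors: dict) -> torch.Tensor:
        # rgb_tokens: [B, N, dim]; priors maps modality name -> [B, N, dim]
        out = rgb_tokens
        for name, tokens in priors.items():
            out = out + self.proj[name](tokens)
        return out


# Usage: with freshly initialized (zero) projections, the fused tokens
# equal the RGB tokens; the prior pathway "grows in" during training.
fusion = PriorFusion(dim=64)
rgb = torch.randn(2, 16, 64)
fused = fusion(rgb, {"depth": torch.randn(2, 16, 64)})
assert torch.allclose(fused, rgb)
```

The zero initialization is what makes the modification "lightweight": training can start from the unmodified CUT3R weights without a destabilizing jump in the features.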
Community
G-CUT3R is a fast, feed-forward 3D reconstruction model that fuses RGB with real-world priors (calibrations, poses, depth), enabling flexible, efficient, and state-of-the-art results while requiring less training data.
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- Uni3R: Unified 3D Reconstruction and Semantic Understanding via Generalizable Gaussian Splatting from Unposed Multi-View Images (2025)
- No Pose at All: Self-Supervised Pose-Free 3D Gaussian Splatting from Sparse Views (2025)
- IDCNet: Guided Video Diffusion for Metric-Consistent RGBD Scene Generation with Precise Camera Control (2025)
- Surf3R: Rapid Surface Reconstruction from Sparse RGB Views in Seconds (2025)
- STream3R: Scalable Sequential 3D Reconstruction with Causal Transformer (2025)
- DepR: Depth Guided Single-view Scene Reconstruction with Instance-level Diffusion (2025)
- Review of Feed-forward 3D Reconstruction: From DUSt3R to VGGT (2025)