arXiv:2507.06230

Feed-Forward SceneDINO for Unsupervised Semantic Scene Completion

Published on Jul 8 · Submitted by ChristophReich1996 on Jul 9
Abstract

AI-generated summary: SceneDINO achieves state-of-the-art segmentation accuracy in unsupervised semantic scene completion by leveraging self-supervised representation learning and 2D unsupervised scene understanding techniques.

Semantic scene completion (SSC) aims to infer both the 3D geometry and semantics of a scene from single images. In contrast to prior work on SSC that heavily relies on expensive ground-truth annotations, we approach SSC in an unsupervised setting. Our novel method, SceneDINO, adapts techniques from self-supervised representation learning and 2D unsupervised scene understanding to SSC. Our training exclusively utilizes multi-view consistency self-supervision without any form of semantic or geometric ground truth. Given a single input image, SceneDINO infers the 3D geometry and expressive 3D DINO features in a feed-forward manner. Through a novel 3D feature distillation approach, we obtain unsupervised 3D semantics. In both 3D and 2D unsupervised scene understanding, SceneDINO reaches state-of-the-art segmentation accuracy. Linear probing our 3D features matches the segmentation accuracy of a current supervised SSC approach. Additionally, we showcase the domain generalization and multi-view consistency of SceneDINO, taking the first steps towards a strong foundation for single image 3D scene understanding.
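The linear-probing evaluation mentioned in the abstract can be sketched as follows: a frozen feature extractor's outputs are fed to a single linear softmax classifier, so the probe's accuracy reflects how linearly separable the learned features are. This is a minimal illustration, not SceneDINO's code; the feature dimension, class count, and toy data below are hypothetical stand-ins for SceneDINO's 3D features.

```python
import numpy as np

def train_linear_probe(feats, labels, num_classes, lr=0.1, steps=200):
    """Fit a linear softmax classifier on frozen features (a linear probe).

    feats:  (N, D) array of frozen per-voxel/per-pixel features.
    labels: (N,) integer class labels, used ONLY to train the probe.
    """
    n, d = feats.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(steps):
        logits = feats @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n                  # softmax cross-entropy gradient
        W -= lr * feats.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b

# Toy example: two well-separated feature clusters standing in for
# distilled 3D features (purely illustrative data, not from the paper).
rng = np.random.default_rng(0)
feats = np.concatenate([rng.normal(-2.0, 0.5, (50, 8)),
                        rng.normal(2.0, 0.5, (50, 8))])
labels = np.array([0] * 50 + [1] * 50)
W, b = train_linear_probe(feats, labels, num_classes=2)
accuracy = ((feats @ W + b).argmax(axis=1) == labels).mean()
```

Because the probe is linear and the backbone stays frozen, high probe accuracy is evidence that semantic structure is already present in the features themselves, which is the sense in which the paper's probing result matches a supervised SSC approach.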

Community

Paper author and submitter:

SceneDINO is unsupervised and infers 3D geometry and 3D features from a single image in a feed-forward manner, using multi-view self-supervised training. Distilling and clustering these features yields unsupervised semantic scene completion predictions.
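The distill-then-cluster step described above can be illustrated with a minimal k-means over feature vectors: clustering assigns each feature a pseudo-class without any labels. This is a hedged sketch only; the data, feature dimension, and cluster count are hypothetical placeholders, not SceneDINO's actual distillation pipeline.

```python
import numpy as np

def kmeans(feats, k, iters=20, seed=0):
    """Minimal k-means: group feature vectors into k unsupervised pseudo-classes."""
    rng = np.random.default_rng(seed)
    # Farthest-point initialization: start from a random feature, then
    # repeatedly pick the feature farthest from all chosen centers.
    centers = [feats[rng.integers(len(feats))]]
    for _ in range(1, k):
        d = np.min([((feats - c) ** 2).sum(-1) for c in centers], axis=0)
        centers.append(feats[d.argmax()])
    centers = np.array(centers)
    for _ in range(iters):
        # Assign each feature vector to its nearest center.
        dists = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = dists.argmin(axis=1)
        # Recompute each center as the mean of its assigned features.
        for c in range(k):
            if (assign == c).any():
                centers[c] = feats[assign == c].mean(axis=0)
    return assign, centers

# Toy "feature field": two separated blobs standing in for distilled
# 3D features (illustrative data only).
rng = np.random.default_rng(1)
feats = np.concatenate([rng.normal(0.0, 0.3, (60, 16)),
                        rng.normal(3.0, 0.3, (60, 16))])
assign, centers = kmeans(feats, k=2)
```

In an unsupervised SSC setting, each resulting cluster index plays the role of a semantic pseudo-label; mapping clusters to named classes (for evaluation) is a separate step, e.g. Hungarian matching against ground truth.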


Models citing this paper: 1

Datasets citing this paper: 0


Spaces citing this paper: 1

Collections including this paper: 0
