arxiv:2509.22281

MesaTask: Towards Task-Driven Tabletop Scene Generation via 3D Spatial Reasoning

Published on Sep 26
· Submitted by Lixing Xiao on Sep 29

Abstract

AI-generated summary: MesaTask, an LLM-based framework built on a Spatial Reasoning Chain and further refined with DPO algorithms, generates realistic tabletop scenes aligned with task descriptions.

For robots to interpret human instructions and execute manipulation tasks, task-relevant tabletop scenes must be available for training. However, traditional methods for creating these scenes rely on time-consuming manual layout design or purely randomized layouts, which are limited in plausibility or in alignment with the tasks. In this paper, we formulate a novel task, namely task-oriented tabletop scene generation, which poses significant challenges due to the substantial gap between high-level task instructions and tabletop scenes. To support research on this task, we introduce MesaTask-10K, a large-scale dataset comprising approximately 10,700 synthetic tabletop scenes whose manually crafted layouts ensure realism and intricate inter-object relations. To bridge the gap between tasks and scenes, we propose a Spatial Reasoning Chain that decomposes the generation process into object inference, spatial interrelation reasoning, and scene graph construction for the final 3D layout. We present MesaTask, an LLM-based framework that utilizes this reasoning chain and is further enhanced with DPO algorithms to generate physically plausible tabletop scenes that align well with given task descriptions. Extensive experiments demonstrate the superior performance of MesaTask compared to baselines in generating task-conforming tabletop scenes with realistic layouts. The project page is at https://mesatask.github.io/
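
The three stages of the Spatial Reasoning Chain named in the abstract (object inference, spatial interrelation reasoning, scene graph construction for the final layout) can be pictured as a small data pipeline. The sketch below is purely illustrative and is not the authors' implementation: every name in it (`SceneGraph`, `infer_objects`, `infer_relations`, `layout_from_graph`), the `left_of` relation, the object vocabulary, and the placement rule are hypothetical, and the LLM is replaced by hard-coded stubs.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SceneGraph:
    """Hypothetical scene graph: object names plus (subject, relation, object) triples."""
    objects: list[str]
    relations: list[tuple[str, str, str]]


def infer_objects(task: str) -> list[str]:
    # Stage 1 (stub): a real system would ask an LLM which objects the task needs.
    # Here we only keyword-match against a tiny, made-up vocabulary.
    vocabulary = ["bowl", "apple", "cup", "plate"]
    return [name for name in vocabulary if name in task.lower()]


def infer_relations(objects: list[str]) -> list[tuple[str, str, str]]:
    # Stage 2 (stub): chain consecutive objects with an assumed "left_of" relation.
    return [(a, "left_of", b) for a, b in zip(objects, objects[1:])]


def layout_from_graph(graph: SceneGraph, spacing: float = 0.2) -> dict[str, tuple[float, float, float]]:
    # Stage 3 (naive): turn "left_of" triples into x-coordinates on the table plane.
    # Assumes the triples are listed in chain order, as infer_relations produces them.
    x = {name: 0.0 for name in graph.objects}
    for subject, relation, target in graph.relations:
        if relation == "left_of":
            x[target] = max(x[target], x[subject] + spacing)
    return {name: (x[name], 0.0, 0.0) for name in graph.objects}


task = "Put the apple in the bowl next to the cup"
objects = infer_objects(task)
graph = SceneGraph(objects=objects, relations=infer_relations(objects))
layout = layout_from_graph(graph)
```

Keeping an explicit graph between the task text and the coordinates means each stage can be inspected on its own. Going by the abstract's decomposition, that separation of intermediate steps is the point of the reasoning chain.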

Community

Paper submitter

🤖 Robots that actually understand “put the fruit in the bowl”?
Meet MesaTask 🚀 [NeurIPS 2025 Spotlight]
✨ 10K+ physics-verified tabletop scenes
✨ 12K+ curated 3D assets
✨ Outperforms baselines in alignment, realism & physicality
All data & code are OPEN — try it now ⚡

