MesaTask: Towards Task-Driven Tabletop Scene Generation via 3D Spatial Reasoning
Abstract
MesaTask is an LLM-based framework that uses a Spatial Reasoning Chain, refined with DPO, to generate realistic tabletop scenes aligned with task descriptions.
For robots to interpret human instructions and execute manipulation tasks, they need task-relevant tabletop scenes for training. However, traditional methods for creating these scenes rely on time-consuming manual layout design or purely randomized layouts, which are limited in plausibility or task alignment. In this paper, we formulate a novel task, task-oriented tabletop scene generation, which poses significant challenges due to the substantial gap between high-level task instructions and concrete tabletop scenes. To support research on this challenging task, we introduce MesaTask-10K, a large-scale dataset comprising approximately 10,700 synthetic tabletop scenes with manually crafted layouts that ensure realism and capture intricate inter-object relations. To bridge the gap between tasks and scenes, we propose a Spatial Reasoning Chain that decomposes the generation process into object inference, spatial interrelation reasoning, and scene graph construction for the final 3D layout. Building on this reasoning chain, we present MesaTask, an LLM-based framework further enhanced with Direct Preference Optimization (DPO) to generate physically plausible tabletop scenes that align well with given task descriptions. Extensive experiments demonstrate that MesaTask outperforms baselines in generating task-conforming tabletop scenes with realistic layouts. The project page is at https://mesatask.github.io/
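To make the three stages of the Spatial Reasoning Chain concrete, the sketch below models the intermediate representations such a pipeline might pass between stages. This is a minimal illustration, not the authors' actual implementation: all class names, fields, and the toy `resolve_layout` function are assumptions added here for exposition; the abstract specifies only the stage decomposition (object inference → spatial interrelation reasoning → scene graph → 3D layout).

```python
# Illustrative sketch of the Spatial Reasoning Chain's intermediate
# representations. Names and structures are hypothetical; the paper
# specifies only the three-stage decomposition.
from dataclasses import dataclass, field

@dataclass
class SceneObject:
    """Stage 1 (object inference): an object the task implies is needed."""
    name: str       # e.g. "bowl"
    category: str   # e.g. "container"

@dataclass
class SpatialRelation:
    """Stage 2 (spatial interrelation reasoning): a pairwise relation."""
    subject: str    # name of the placed object
    relation: str   # e.g. "on_top_of", "left_of", "inside"
    reference: str  # name of the object it is placed relative to

@dataclass
class SceneGraph:
    """Stage 3 (scene graph construction): objects plus relations,
    ready to be resolved into concrete 3D poses."""
    objects: list[SceneObject] = field(default_factory=list)
    relations: list[SpatialRelation] = field(default_factory=list)

def resolve_layout(graph: SceneGraph) -> dict[str, tuple[float, float, float]]:
    """Toy layout resolver: assigns placeholder (x, y, z) tabletop
    coordinates by walking the relations. A real system would use the
    LLM's reasoning output plus collision/physics checks instead."""
    positions: dict[str, tuple[float, float, float]] = {}
    spacing = 0.3  # meters between unrelated objects (arbitrary choice)
    for i, obj in enumerate(graph.objects):
        positions[obj.name] = (i * spacing, 0.0, 0.0)
    for rel in graph.relations:
        if rel.relation == "on_top_of" and rel.reference in positions:
            x, y, _ = positions[rel.reference]
            positions[rel.subject] = (x, y, 0.1)  # stack above the reference
    return positions

# Example: a task like "put the fruit in the bowl" might imply this graph.
graph = SceneGraph(
    objects=[SceneObject("bowl", "container"), SceneObject("apple", "fruit")],
    relations=[SpatialRelation("apple", "on_top_of", "bowl")],
)
print(resolve_layout(graph))
```

The value of the staged decomposition is that each intermediate output (object list, relation set, scene graph) is a structured target an LLM can be trained and preference-optimized against, rather than asking it to emit raw 3D coordinates in one step.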
Community
🤖 Robots that actually understand “put the fruit in the bowl”?
Meet MesaTask 🚀 [NeurIPS 2025 Spotlight]
✨ 10K+ physics-verified tabletop scenes
✨ 12K+ curated 3D assets
✨ Outperforms baselines in alignment, realism & physicality
All data & code are OPEN — try it now ⚡
The following similar papers were recommended by the Semantic Scholar API (via Librarian Bot):
- HLG: Comprehensive 3D Room Construction via Hierarchical Layout Generation (2025)
- RoboRetriever: Single-Camera Robot Object Retrieval via Active and Interactive Perception with Dynamic Scene Graph (2025)
- InternScenes: A Large-scale Simulatable Indoor Scene Dataset with Realistic Layouts (2025)
- HOLODECK 2.0: Vision-Language-Guided 3D World Generation with Editing (2025)
- Causal Reasoning Elicits Controllable 3D Scene Generation (2025)
- SAGE: Scene Graph-Aware Guidance and Execution for Long-Horizon Manipulation Tasks (2025)
- Point2Act: Efficient 3D Distillation of Multimodal LLMs for Zero-Shot Context-Aware Grasping (2025)