arXiv:2510.02722

MoGIC: Boosting Motion Generation via Intention Understanding and Visual Context

Published on Oct 3, 2025

Abstract

AI-generated summary: MoGIC integrates intention modeling and visual priors for multimodal motion synthesis, improving precision and personalization through a mixture-of-attention mechanism and reducing FID scores on benchmarks.

Existing text-driven motion generation methods often treat synthesis as a bidirectional mapping between language and motion, but remain limited in capturing the causal logic of action execution and the human intentions that drive behavior. The absence of visual grounding further restricts precision and personalization, as language alone cannot specify fine-grained spatiotemporal details. We propose MoGIC, a unified framework that integrates intention modeling and visual priors into multimodal motion synthesis. By jointly optimizing multimodal-conditioned motion generation and intention prediction, MoGIC uncovers latent human goals, leverages visual priors to enhance generation, and exhibits versatile multimodal generative capability. We further introduce a mixture-of-attention mechanism with adaptive scope to enable effective local alignment between conditional tokens and motion subsequences. To support this paradigm, we curate Mo440H, a 440-hour benchmark from 21 high-quality motion datasets. Experiments show that after finetuning, MoGIC reduces FID by 38.6% on HumanML3D and 34.6% on Mo440H, surpasses LLM-based methods in motion captioning with a lightweight text head, and further enables intention prediction and vision-conditioned generation, advancing controllable motion synthesis and intention understanding. The code is available at https://github.com/JunyuShi02/MoGIC.
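The mixture-of-attention mechanism with adaptive scope is described only at a high level above. Below is a minimal sketch of one plausible reading, assuming PyTorch: several cross-attention experts attend from motion tokens to condition tokens, each with a learned soft window over the condition sequence, and a gate mixes them per token. All names here (MoABlock, scope, gate) and the soft-window construction are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a mixture-of-attention layer with adaptive scope.
# Assumes PyTorch; names and window construction are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoABlock(nn.Module):
    def __init__(self, dim: int, n_experts: int = 4, n_heads: int = 8):
        super().__init__()
        self.n_heads = n_heads
        # Each expert is a cross-attention module from motion tokens
        # to conditional tokens (text and/or visual).
        self.experts = nn.ModuleList(
            [nn.MultiheadAttention(dim, n_heads, batch_first=True)
             for _ in range(n_experts)]
        )
        # Per-token gate that mixes the expert outputs.
        self.gate = nn.Linear(dim, n_experts)
        # Per-token, per-expert window width: each expert attends over
        # a differently sized local span of the condition sequence.
        self.scope = nn.Linear(dim, n_experts)

    def forward(self, motion: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # motion: (B, T, D) motion subsequence tokens
        # cond:   (B, S, D) conditional tokens
        B, T, _ = motion.shape
        S = cond.size(1)
        # Assume a rough linear temporal correspondence: each motion
        # token gets a window center on [0, 1] over the condition axis.
        centers = torch.linspace(0, 1, T, device=motion.device)
        pos = torch.linspace(0, 1, S, device=motion.device)
        dist = (pos[None, None, :] - centers[None, :, None]).abs()  # (1, T, S)
        widths = torch.sigmoid(self.scope(motion))                  # (B, T, E)
        gates = F.softmax(self.gate(motion), dim=-1)                # (B, T, E)

        mixed = 0.0
        for e, expert in enumerate(self.experts):
            # Additive bias: condition tokens outside this expert's
            # adaptive window are softly suppressed.
            bias = -dist / (widths[..., e].unsqueeze(-1) + 1e-4)    # (B, T, S)
            bias = bias.repeat_interleave(self.n_heads, dim=0)      # (B*H, T, S)
            out, _ = expert(motion, cond, cond, attn_mask=bias)
            mixed = mixed + gates[..., e].unsqueeze(-1) * out
        return mixed

# Example usage (shapes only):
block = MoABlock(dim=256)
out = block(torch.randn(2, 120, 256), torch.randn(2, 16, 256))  # (2, 120, 256)
```

A soft additive bias, rather than a hard mask, keeps every expert differentiable end to end while still concentrating each expert on a local span of the condition sequence.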
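The abstract also states that multimodal-conditioned motion generation and intention prediction are jointly optimized. A hedged sketch of such a combined objective follows; treating intention prediction as classification and the weight `lambda_int` are assumptions for illustration, not details from the paper.

```python
# Hedged sketch of a joint objective for generation + intention prediction.
# The cross-entropy formulation and lambda_int are assumptions.
import torch
import torch.nn.functional as F

def joint_loss(motion_pred: torch.Tensor, motion_gt: torch.Tensor,
               intent_logits: torch.Tensor, intent_labels: torch.Tensor,
               lambda_int: float = 0.5) -> torch.Tensor:
    l_gen = F.mse_loss(motion_pred, motion_gt)              # generation term
    l_int = F.cross_entropy(intent_logits, intent_labels)   # intention term
    return l_gen + lambda_int * l_int
```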
