arXiv:2507.09144

I^{2}-World: Intra-Inter Tokenization for Efficient Dynamic 4D Scene Forecasting

Published on Jul 12, 2025

AI-generated summary

I^{2}-World, an encoder-decoder framework, efficiently tokenizes and forecasts 4D occupancy in 3D scenes, achieving state-of-the-art performance with high computational efficiency.

Abstract

Forecasting the evolution of 3D scenes and generating unseen scenarios via occupancy-based world models offers substantial potential for addressing corner cases in autonomous driving systems. While tokenization has revolutionized image and video generation, efficiently tokenizing complex 3D scenes remains a critical challenge for 3D world models. To address this, we propose I^{2}-World, an efficient framework for 4D occupancy forecasting. Our method decouples scene tokenization into intra-scene and inter-scene tokenizers. The intra-scene tokenizer employs a multi-scale residual quantization strategy to hierarchically compress 3D scenes while preserving spatial details. The inter-scene tokenizer residually aggregates temporal dependencies across timesteps. This dual design preserves the compactness of 3D tokenizers while retaining the dynamic expressiveness of 4D tokenizers. Unlike decoder-only GPT-style autoregressive models, I^{2}-World adopts an encoder-decoder architecture: the encoder aggregates spatial context from the current scene and predicts a transformation matrix to enable high-level control over scene generation, while the decoder, conditioned on this matrix and historical tokens, ensures temporal consistency during generation. Experiments demonstrate that I^{2}-World achieves state-of-the-art performance, outperforming existing methods by 25.1% in mIoU and 36.9% in IoU for 4D occupancy forecasting while exhibiting exceptional computational efficiency: it requires only 2.9 GB of training memory and achieves real-time inference at 37.0 FPS. Our code is available at https://github.com/lzzzzzm/II-World.
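
The multi-scale residual quantization of the intra-scene tokenizer can be pictured with a short sketch. The code below is a hypothetical illustration based only on the abstract, not the paper's implementation: the class and parameter names (MultiScaleResidualQuantizer, codebook_size, scales) are assumptions, gradient handling (e.g. a straight-through estimator) is omitted, and the real model tokenizes 3D occupancy features rather than this toy 2D feature map.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleResidualQuantizer(nn.Module):
    """Hypothetical sketch of multi-scale residual quantization:
    each scale encodes the residual left unexplained by the coarser
    scales, so a small number of tokens captures most of the scene
    while finer scales add spatial detail."""

    def __init__(self, dim=128, codebook_size=512, scales=(4, 8, 16)):
        super().__init__()
        self.scales = scales
        self.codebook = nn.Embedding(codebook_size, dim)

    def quantize(self, x):
        # x: (B, N, dim); pick the nearest codebook entry per token
        dists = torch.cdist(x, self.codebook.weight.unsqueeze(0))  # (B, N, K)
        idx = dists.argmin(dim=-1)                                 # (B, N)
        return self.codebook(idx), idx

    def forward(self, feat):
        # feat: (B, dim, H, W) feature map of the current scene
        B, C, H, W = feat.shape
        recon = torch.zeros_like(feat)
        tokens = []
        for s in self.scales:
            residual = feat - recon                     # still-unexplained detail
            coarse = F.adaptive_avg_pool2d(residual, s) # compress to s x s grid
            flat = coarse.flatten(2).transpose(1, 2)    # (B, s*s, C)
            q, idx = self.quantize(flat)
            q = q.transpose(1, 2).reshape(B, C, s, s)
            recon = recon + F.interpolate(q, size=(H, W), mode="nearest")
            tokens.append(idx)                          # s*s token ids per scale
        return recon, tokens

# Toy usage: a 64x64 map is summarized by 16 + 64 + 256 tokens
vq = MultiScaleResidualQuantizer()
recon, tokens = vq(torch.randn(2, 128, 64, 64))
```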
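The encoder-decoder forecasting loop can be sketched in the same spirit. Again hypothetical: the module names (IIWorldRollout, to_transform, embed_transform) and the exact way the predicted 4x4 ego transform conditions the decoder are assumptions drawn from the abstract's description, and the history tokens are assumed to share the layout of the current scene so they can be aggregated residually.

```python
import torch
import torch.nn as nn

class IIWorldRollout(nn.Module):
    """Hypothetical sketch of the encoder-decoder rollout: the encoder
    reads the current scene tokens and predicts a transformation
    matrix; the decoder, conditioned on that matrix and residually
    aggregated history tokens, emits the next scene's tokens."""

    def __init__(self, dim=128, nhead=8, layers=2):
        super().__init__()
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead, batch_first=True), layers)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(dim, nhead, batch_first=True), layers)
        self.to_transform = nn.Linear(dim, 16)     # flattened 4x4 ego transform
        self.embed_transform = nn.Linear(16, dim)  # re-embed as conditioning

    def forward(self, cur, history, steps=4):
        # cur, history: (B, N, dim) scene tokens with matching layouts
        preds = []
        for _ in range(steps):
            ctx = self.encoder(cur)                           # spatial context
            T = self.to_transform(ctx.mean(dim=1))            # (B, 16) control signal
            cond = history + self.embed_transform(T).unsqueeze(1)
            nxt = self.decoder(cur, cond)                     # next-scene tokens
            history = history + nxt                           # residual temporal aggregation
            preds.append(nxt)
            cur = nxt
        return torch.stack(preds, dim=1), history             # (B, steps, N, dim)
```

In this reading, the predicted transformation matrix is the "high-level control" knob the abstract mentions; a decoder-only GPT-style model would instead fold all of this into a single autoregressive token stream.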
