Abstract
BANG is a generative approach using latent diffusion models and temporal attention to enable intuitive, part-level decomposition of 3D objects with precise control and multimodal interaction.
3D creation has always been a unique human strength, driven by our ability to deconstruct and reassemble objects with our eyes, minds, and hands. Current 3D design tools, however, struggle to replicate this natural process, demanding considerable artistic expertise and manual labor. This paper introduces BANG, a novel generative approach that bridges 3D generation and reasoning, enabling intuitive and flexible part-level decomposition of 3D objects. At the heart of BANG is "Generative Exploded Dynamics", which produces a smooth sequence of exploded states for an input geometry, progressively separating parts while preserving their geometric and semantic coherence. BANG builds on a pre-trained large-scale latent diffusion model, fine-tuned for exploded dynamics with a lightweight exploded-view adapter that allows precise control over the decomposition process, and incorporates a temporal attention module to ensure smooth transitions and consistency across time. BANG further enhances control with spatial prompts, such as bounding boxes and surface regions, letting users specify which parts to decompose and how. This interaction can be extended with multimodal models such as GPT-4, enabling 2D-to-3D manipulations for more intuitive and creative workflows. The capabilities of BANG extend to generating detailed part-level geometry, associating parts with functional descriptions, and facilitating component-aware 3D creation and manufacturing workflows. BANG also supports 3D printing, generating separable parts for easy printing and reassembly. In essence, BANG enables a seamless transformation from imaginative concepts to detailed 3D assets, offering a new perspective on creation that resonates with human intuition.
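The adapter-plus-temporal-attention design described in the abstract follows a common pattern for extending pre-trained diffusion backbones. Below is a minimal, illustrative sketch of that pattern, not the authors' implementation: all module names (`ExplodedViewAdapter`, `TemporalAttention`), tensor shapes, the explosion-progress conditioning signal, and the zero-initialized residual wiring are assumptions made for illustration.

```python
# Hypothetical sketch of conditioning a frozen latent diffusion backbone on
# exploded-state dynamics; names, shapes, and wiring are illustrative only.
import torch
import torch.nn as nn


class TemporalAttention(nn.Module):
    """Self-attention across the time axis of a latent sequence, so each
    spatial token can exchange information across exploded states."""

    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, tokens, dim) -> attend over `time` for each token
        b, t, n, d = x.shape
        h = x.permute(0, 2, 1, 3).reshape(b * n, t, d)
        q = self.norm(h)
        h = h + self.attn(q, q, q, need_weights=False)[0]
        return h.reshape(b, n, t, d).permute(0, 2, 1, 3)


class ExplodedViewAdapter(nn.Module):
    """Lightweight adapter: injects an explosion-progress signal as a
    zero-initialized residual on frozen backbone features."""

    def __init__(self, dim: int):
        super().__init__()
        self.progress_mlp = nn.Sequential(
            nn.Linear(1, dim), nn.SiLU(), nn.Linear(dim, dim))
        self.out = nn.Linear(dim, dim)
        nn.init.zeros_(self.out.weight)  # fine-tuning starts as a no-op,
        nn.init.zeros_(self.out.bias)    # preserving the pre-trained prior

    def forward(self, feats: torch.Tensor, progress: torch.Tensor) -> torch.Tensor:
        # feats: (batch, time, tokens, dim); progress: (batch, time) in [0, 1]
        cond = self.progress_mlp(progress.unsqueeze(-1))    # (b, t, dim)
        return feats + self.out(feats + cond.unsqueeze(2))  # broadcast over tokens


if __name__ == "__main__":
    feats = torch.randn(2, 6, 128, 64)               # 6 exploded states, 128 tokens
    progress = torch.linspace(0, 1, 6).repeat(2, 1)  # explosion progress per state
    feats = ExplodedViewAdapter(64)(feats, progress)
    feats = TemporalAttention(64)(feats)
    print(feats.shape)  # torch.Size([2, 6, 128, 64])
```

Zero-initializing the adapter's output projection means training begins exactly at the pre-trained model's behavior, a standard way to add a lightweight control branch without disturbing a frozen backbone.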
Community
A fictional journey from Earth's surface to the far reaches of space, celebrating humanity's boundless ingenuity and spirit of discovery. Each object is generated from a concept image and illustrated in four assembly states, using parts produced by our Generative Exploded Dynamics.
A steampunk workshop, where blueprints transform into tangible reality, powered by our BANG framework. Each asset begins as a concept image generated by FLUX, is transformed into an integral 3D mesh by our base generative model, and is then exploded into parts, each of which is meticulously enhanced for maximum visual fidelity. The generated exploded structures are displayed against the backdrop, showcasing the detail achieved through our exploded-enhance pipeline.
This is an automated message from the Librarian Bot. The following papers, similar to this one, were recommended by the Semantic Scholar API:
- OmniPart: Part-Aware 3D Generation with Semantic Decoupling and Structural Cohesion (2025)
- From One to More: Contextual Part Latents for 3D Generation (2025)
- DreamArt: Generating Interactable Articulated Objects from a Single Image (2025)
- PartCrafter: Structured 3D Mesh Generation via Compositional Latent Diffusion Transformers (2025)
- Assembler: Scalable 3D Part Assembly via Anchor Point Diffusion (2025)
- Ultra3D: Efficient and High-Fidelity 3D Generation with Part Attention (2025)
- AnimateAnyMesh: A Feed-Forward 4D Foundation Model for Text-Driven Universal Mesh Animation (2025)