Comprehensive Relighting: Generalizable and Consistent Monocular Human Relighting and Harmonization
Abstract
This paper introduces Comprehensive Relighting, the first all-in-one approach that can both control and harmonize the lighting from an image or video of humans with arbitrary body parts from any scene. Building such a generalizable model is extremely challenging due to the lack of dataset, restricting existing image-based relighting models to a specific scenario (e.g., face or static human). To address this challenge, we repurpose a pre-trained diffusion model as a general image prior and jointly model the human relighting and background harmonization in the coarse-to-fine framework. To further enhance the temporal coherence of the relighting, we introduce an unsupervised temporal lighting model that learns the lighting cycle consistency from many real-world videos without any ground truth. In inference time, our temporal lighting module is combined with the diffusion models through the spatio-temporal feature blending algorithms without extra training; and we apply a new guided refinement as a post-processing to preserve the high-frequency details from the input image. In the experiments, Comprehensive Relighting shows a strong generalizability and lighting temporal coherence, outperforming existing image-based human relighting and harmonization methods.
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Lux Post Facto: Learning Portrait Performance Relighting with Conditional Video Diffusion and a Hybrid Dataset (2025)
- High-Fidelity Relightable Monocular Portrait Animation with Lighting-Controllable Video Diffusion Model (2025)
- Free4D: Tuning-free 4D Scene Generation with Spatial-Temporal Consistency (2025)
- LHM: Large Animatable Human Reconstruction Model from a Single Image in Seconds (2025)
- HumanGif: Single-View Human Diffusion with Generative Prior (2025)
- VidCRAFT3: Camera, Object, and Lighting Control for Image-to-Video Generation (2025)
- CHROME: Clothed Human Reconstruction with Occlusion-Resilience and Multiview-Consistency from a Single Image (2025)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any Paper on Hugging Face checkout this Space
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment:
@librarian-bot
recommend
Models citing this paper 0
No model linking this paper
Datasets citing this paper 0
No dataset linking this paper
Spaces citing this paper 0
No Space linking this paper