Pippo: High-Resolution Multi-View Humans from a Single Image
Abstract
We present Pippo, a generative model capable of producing 1K resolution dense turnaround videos of a person from a single casually clicked photo. Pippo is a multi-view diffusion transformer and does not require any additional inputs (e.g., a fitted parametric model or camera parameters of the input image). We pre-train Pippo on 3B human images without captions, and conduct multi-view mid-training and post-training on studio-captured humans. During mid-training, to quickly absorb the studio dataset, we denoise several (up to 48) views at low resolution and encode target cameras coarsely using a shallow MLP. During post-training, we denoise fewer views at high resolution and use pixel-aligned controls (e.g., spatial anchor and Plücker rays) to enable 3D-consistent generations. At inference, we propose an attention biasing technique that allows Pippo to simultaneously generate more than 5 times as many views as seen during training. Finally, we introduce an improved metric to evaluate the 3D consistency of multi-view generations, and show that Pippo outperforms existing works on multi-view human generation from a single image.
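The abstract mentions Plücker rays as one of the pixel-aligned camera controls used during post-training. Below is a minimal sketch of the standard Plücker-ray parameterization for a pinhole camera; the function name `plucker_rays`, the array shapes, and the world-to-camera extrinsics convention are assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

def plucker_rays(K, R, t, height, width):
    """Per-pixel Plücker ray map for one pinhole camera (illustrative sketch).

    K: (3, 3) intrinsics; [R | t]: world-to-camera extrinsics.
    Returns an (height, width, 6) array holding, for every pixel, the unit
    ray direction d and the moment m = o x d, where o is the camera center
    in world coordinates.
    """
    # Camera center in world coordinates: o = -R^T t
    o = -R.T @ t  # (3,)

    # Pixel centers in homogeneous image coordinates.
    u, v = np.meshgrid(np.arange(width) + 0.5, np.arange(height) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)  # (H, W, 3)

    # Back-project to world-space ray directions: d ∝ R^T K^{-1} [u, v, 1]^T
    dirs_cam = pix @ np.linalg.inv(K).T   # (H, W, 3), camera frame
    dirs = dirs_cam @ R                   # row-vector form of R^T d_cam
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)

    # Plücker coordinates: (direction, moment), with moment = o x d
    moments = np.cross(o, dirs)
    return np.concatenate([dirs, moments], axis=-1)  # (H, W, 6)

# Example: a 256x256 camera at the world origin looking down +z
K = np.array([[256.0, 0.0, 128.0], [0.0, 256.0, 128.0], [0.0, 0.0, 1.0]])
rays = plucker_rays(K, np.eye(3), np.zeros(3), 256, 256)
print(rays.shape)  # (256, 256, 6)
```

Because the moment m = o × d does not depend on where along the ray the origin point is chosen, the resulting 6-channel map gives every pixel a camera-pose-aware positional signal that can be concatenated with the image latents as a dense conditioning input.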
Community
Pippo generates 1K resolution, multi-view, studio-quality images from a single photo in a single forward pass. It takes a full-body or face-only photo as input and blends the input with novel generated content remarkably well!
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- FaceLift: Single Image to 3D Head with View Generation and GS-LRM (2024)
- 3DEnhancer: Consistent Multi-View Diffusion for 3D Enhancement (2024)
- MEt3R: Measuring Multi-View Consistency in Generated Images (2025)
- IDOL: Instant Photorealistic 3D Human Creation from a Single Image (2024)
- Fillerbuster: Multi-View Scene Completion for Casual Captures (2025)
- HuGDiffusion: Generalizable Single-Image Human Rendering via 3D Gaussian Diffusion (2025)
- Zero-Shot Novel View and Depth Synthesis with Multi-View Geometric Diffusion (2025)