File size: 4,285 Bytes
e8e536b
afa27c8
e8e536b
 
19fb693
2a534b4
19fb693
2a534b4
19fb693
2a534b4
19fb693
 
 
 
2a534b4
 
19fb693
2a534b4
142391d
19fb693
2a534b4
 
 
a0788d8
2a534b4
a0788d8
2a534b4
df2d336
 
 
 
 
 
 
 
a0788d8
 
afa27c8
 
2a534b4
afa27c8
a0788d8
afa27c8
 
 
 
 
 
 
 
 
 
 
ac631ae
afa27c8
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
19fb693
ac631ae
 
 
2a534b4
142391d
2a534b4
a0788d8
2a534b4
a0788d8
 
2a534b4
a0788d8
2a534b4
a0788d8
 
2a534b4
 
 
19fb693
2a534b4
26b85d3
2a534b4
19fb693
2a534b4
19fb693
2a534b4
 
19fb693
 
 
 
 
 
 
 
2a534b4
19fb693
2a534b4
19fb693
e8e536b
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
---
license: cc-by-nc-2.0
pipeline_tag: image-to-3d
---
# [ECCV 2024] VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models

[Porject page](https://junlinhan.github.io/projects/vfusion3d.html), [Paper link](https://arxiv.org/abs/2403.12034)

VFusion3D is a large, feed-forward 3D generative model trained with a small amount of 3D data and a large volume of synthetic multi-view data. It is the first work exploring scalable 3D generative/reconstruction models as a step towards a 3D foundation.

[VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models](https://junlinhan.github.io/projects/vfusion3d.html)<br>
[Junlin Han](https://junlinhan.github.io/), [Filippos Kokkinos](https://www.fkokkinos.com/), [Philip Torr](https://www.robots.ox.ac.uk/~phst/)<br>
GenAI, Meta and TVG, University of Oxford<br>
European Conference on Computer Vision (ECCV), 2024


## News

- [08.08.2024] [HF Demo](https://huggingface.co/spaces/facebook/VFusion3D) is available, big thanks to [Jade Choghari](https://github.com/jadechoghari)'s help for making it possible. 
- [25.07.2024] Release weights and inference code for VFusion3D.



## Quick Start

Getting started with VFusion3D is super easy! 🤗 Here’s how you can use the model with Hugging Face:

### Install Dependencies (Optional)

Depending on your needs, you may want to enable specific features like mesh generation or video rendering. We've got you covered with these additional packages:

```bash
!pip --quiet install imageio[ffmpeg] PyMCubes trimesh rembg[gpu,cli] kiui
```

### Load model directly
```python
import torch
from transformers import AutoModel, AutoProcessor

# load the model and processor
model = AutoModel.from_pretrained("jadechoghari/vfusion3d", trust_remote_code=True)
processor = AutoProcessor.from_pretrained("jadechoghari/vfusion3d")

# download and preprocess the image
import requests
from PIL import Image
from io import BytesIO

image_url = 'https://sm.ign.com/ign_nordic/cover/a/avatar-gen/avatar-generations_prsz.jpg'
response = requests.get(image_url)
image = Image.open(BytesIO(response.content))

# preprocess the image and get the source camera 
image, source_camera = processor(image)


# generate planes (default output)
output_planes = model(image, source_camera)
print("Planes shape:", output_planes.shape)

# generate a 3D mesh
output_planes, mesh_path = model(image, source_camera, export_mesh=True)
print("Planes shape:", output_planes.shape)
print("Mesh saved at:", mesh_path)

# Generate a video
output_planes, video_path = model(image, source_camera, export_video=True)
print("Planes shape:", output_planes.shape)
print("Video saved at:", video_path)

```
- **Default (Planes):** By default, VFusion3D outputs planes—ideal for further 3D operations.
- **Export Mesh:** Want a 3D mesh? Just set `export_mesh=True`, and you'll get a `.obj` file ready to roll. You can also customize the mesh resolution by adjusting the `mesh_size` parameter.
- **Export Video:** Fancy a 3D video? Set `export_video=True`, and you'll receive a beautifully rendered video from multiple angles. You can tweak `render_size` and `fps` to get the video just right.

Check out our [demo app](https://huggingface.co/spaces/facebook/VFusion3D) to see VFusion3D in action! 🤗

## Results and Comparisons

### 3D Generation Results
<img src='assets/gif1.gif' width=950>

<img src='assets/gif2.gif' width=950>

### User Study Results
<img src='assets/user.png' width=950>



## Acknowledgement

- This inference code of VFusion3D heavily borrows from [OpenLRM](https://github.com/3DTopia/OpenLRM).

## Citation

If you find this work useful, please cite us:


```
@article{han2024vfusion3d,
  title={VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models},
  author={Junlin Han and Filippos Kokkinos and Philip Torr},
  journal={European Conference on Computer Vision (ECCV)},
  year={2024}
}
```

## License

- The majority of VFusion3D is licensed under CC-BY-NC, however portions of the project are available under separate license terms: OpenLRM as a whole is licensed under the Apache License, Version 2.0, while certain components are covered by NVIDIA's proprietary license.
- The model weights of VFusion3D is also licensed under CC-BY-NC.