Upload ./README.md with huggingface_hub
README.md CHANGED
@@ -29,15 +29,9 @@ python utils/inference.py
 ## Training

-The model was trained with the [Human Segmentation Dataset](https://huggingface.co/datasets/schirrmacher/humans).
-
-- Training Time: 8 hours
-- Training Loss: 0.1179
-- Validation Loss: 0.1284
-- Maximum F1 Score: 0.9928
-- Mean Absolute Error: 0.005
+The model was trained with the [Human Segmentation Dataset](https://huggingface.co/datasets/schirrmacher/humans), which was created with [LayerDiffuse](https://github.com/layerdiffusion/LayerDiffuse) and [IC-Light](https://github.com/lllyasviel/IC-Light).
+
+The model was trained for 10,000 iterations on an NVIDIA GeForce RTX 4090.

 Output model: `/models/ormbg.pth`.
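For reference, the training dataset named above can be pulled locally with `huggingface_hub`, the same library used for this upload. A minimal sketch, assuming only that `schirrmacher/humans` remains a public dataset repo:

```python
from huggingface_hub import snapshot_download

# Fetch the Human Segmentation Dataset used for training.
# repo_type="dataset" is required because schirrmacher/humans is a dataset repo.
local_dir = snapshot_download(repo_id="schirrmacher/humans", repo_type="dataset")
print(local_dir)  # path to the locally cached dataset files
```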
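Loading the output checkpoint `/models/ormbg.pth` for inference could look like the sketch below. The `ORMBG` class name, the `models.ormbg` import path, and the 1024×1024 input size are illustrative assumptions, not confirmed by this diff; `utils/inference.py` in the repository is the authoritative entry point.

```python
import torch
import torch.nn.functional as F
from PIL import Image
from torchvision.transforms.functional import to_tensor

# Assumption: the network is exposed as a class named ORMBG in models/ormbg.py;
# adjust the import to the actual module layout of the repository.
from models.ormbg import ORMBG

model = ORMBG()
model.load_state_dict(torch.load("models/ormbg.pth", map_location="cpu"))
model.eval()

image = Image.open("example.jpg").convert("RGB")
x = to_tensor(image).unsqueeze(0)                         # (1, 3, H, W), values in [0, 1]
x = F.interpolate(x, size=(1024, 1024), mode="bilinear")  # assumed input resolution

with torch.no_grad():
    pred = model(x)

# Assumption: the first element is the final segmentation map, as is common for
# multi-output saliency architectures; check inference.py for the real indexing.
mask = pred[0] if isinstance(pred, (list, tuple)) else pred
```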
@@ -81,25 +75,9 @@ python utils/pth_to_onnx.py
 # Research

-Synthetic datasets have limitations for achieving great segmentation results. This is because artificial lighting, occlusion, scale or backgrounds create a gap between synthetic and real images. A "model trained solely on synthetic data generated with naïve domain randomization struggles to generalize on the real domain", see [PEOPLESANSPEOPLE: A Synthetic Data Generator for Human-Centric Computer Vision (2022)](https://arxiv.org/pdf/2112.09290).
-
-Currently I am doing research how to close this gap with the resources I have.
-
-- Apply ControlNet with LayerDiffuse for creating segmented humans in nearly realistic environments
-  - ✅ very promising
-  - ❌ ControlNet adds hairs and body parts occasionally and extends the segmented area which might confuse the model
-- Apply LayerDiffuse for foreground and background creation
-  - ✅ easy to perform
-  - ❌ quality not optimal, every 20th image usable (yet)
-- Consider pose estimations as additional training data, see [Cross-Domain Complementary Learning Using Pose for Multi-Person Part Segmentation (2019)](https://arxiv.org/pdf/1907.05193).
-- Create 3D models of humans
-  - ❌ hair styles, clothes and environment difficult to generate in a realistic way and also a lot of work
-- Segment photos with commercial tools and use them for training
-  - ❌ costs, model never better than competitors
-- Manually segment photos
-  - ❌ huge amount of work
+Synthetic datasets have limitations for achieving great segmentation results. This is because artificial lighting, occlusion, scale or backgrounds create a gap between synthetic and real images. A "model trained solely on synthetic data generated with naïve domain randomization struggles to generalize on the real domain", see [PEOPLESANSPEOPLE: A Synthetic Data Generator for Human-Centric Computer Vision (2022)](https://arxiv.org/pdf/2112.09290).
+
+Currently I am researching how to close this gap. The latest approach is to create segmented humans with [LayerDiffuse](https://github.com/layerdiffusion/LayerDiffuse) and then apply [IC-Light](https://github.com/lllyasviel/IC-Light) to create realistic lighting effects and shadows.

 ## Support
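The `python utils/pth_to_onnx.py` context in the hunk header above refers to the ONNX conversion step. A minimal sketch of such a conversion, reusing the same hypothetical `ORMBG` class and an assumed fixed input shape; the repository's script remains the authoritative version:

```python
import torch

# Assumption: same hypothetical ORMBG class and import path as above;
# utils/pth_to_onnx.py in the repository is the authoritative conversion script.
from models.ormbg import ORMBG

model = ORMBG()
model.load_state_dict(torch.load("models/ormbg.pth", map_location="cpu"))
model.eval()

dummy = torch.randn(1, 3, 1024, 1024)  # assumed input shape for tracing

torch.onnx.export(
    model,
    dummy,
    "models/ormbg.onnx",
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
    opset_version=17,
)
```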