Upload ./README.md with huggingface_hub
README.md CHANGED
@@ -29,15 +29,9 @@ python utils/inference.py
 ## Training

-The model was trained with the [Human Segmentation Dataset](https://huggingface.co/datasets/schirrmacher/humans).
-
-- Training Time: 8 hours
-- Training Loss: 0.1179
-- Validation Loss: 0.1284
-- Maximum F1 Score: 0.9928
-- Mean Absolute Error: 0.005
+The model was trained with the [Human Segmentation Dataset](https://huggingface.co/datasets/schirrmacher/humans), which was created with [LayerDiffuse](https://github.com/layerdiffusion/LayerDiffuse) and [IC-Light](https://github.com/lllyasviel/IC-Light).
+
+The model was trained for 10,000 iterations on an NVIDIA GeForce RTX 4090.

 Output model: `/models/ormbg.pth`.
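For reference, the training dataset named above can be pulled locally with `huggingface_hub`, the same library used for this upload. A minimal sketch, assuming only that `schirrmacher/humans` remains a public dataset repo:

```python
from huggingface_hub import snapshot_download

# Fetch the Human Segmentation Dataset used for training.
# repo_type="dataset" is required because schirrmacher/humans is a dataset repo.
local_dir = snapshot_download(repo_id="schirrmacher/humans", repo_type="dataset")
print(local_dir)  # path to the locally cached dataset files
```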
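Loading the output checkpoint `/models/ormbg.pth` for inference could look like the sketch below. The `ORMBG` class name, the `models.ormbg` import path, and the 1024×1024 input size are illustrative assumptions, not confirmed by this diff; `utils/inference.py` in the repository is the authoritative entry point.

```python
import torch
import torch.nn.functional as F
from PIL import Image
from torchvision.transforms.functional import to_tensor

# Assumption: the network is exposed as a class named ORMBG in models/ormbg.py;
# adjust the import to the actual module layout of the repository.
from models.ormbg import ORMBG

model = ORMBG()
model.load_state_dict(torch.load("models/ormbg.pth", map_location="cpu"))
model.eval()

image = Image.open("example.jpg").convert("RGB")
x = to_tensor(image).unsqueeze(0)                         # (1, 3, H, W), values in [0, 1]
x = F.interpolate(x, size=(1024, 1024), mode="bilinear")  # assumed input resolution

with torch.no_grad():
    pred = model(x)

# Assumption: the first element is the final segmentation map, as is common for
# multi-output saliency architectures; check inference.py for the real indexing.
mask = pred[0] if isinstance(pred, (list, tuple)) else pred
```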
@@ -81,25 +75,9 @@ python utils/pth_to_onnx.py
 # Research

-Synthetic datasets have limitations for achieving great segmentation results. This is because artificial lighting, occlusion, scale or backgrounds create a gap between synthetic and real images. A "model trained solely on synthetic data generated with naïve domain randomization struggles to generalize on the real domain", see [PEOPLESANSPEOPLE: A Synthetic Data Generator for Human-Centric Computer Vision (2022)](https://arxiv.org/pdf/2112.09290).
-
-Currently I am doing research how to close this gap with the resources I have.
-
-- Apply ControlNet with LayerDiffuse for creating segmented humans in nearly realistic environments
-  - ✅ very promising
-  - ❌ ControlNet adds hairs and body parts occasionally and extends the segmented area which might confuse the model
-- Apply LayerDiffuse for foreground and background creation
-  - ✅ easy to perform
-  - ❌ quality not optimal, every 20th image usable (yet)
-- Consider pose estimations as additional training data, see [Cross-Domain Complementary Learning Using Pose for Multi-Person Part Segmentation (2019)](https://arxiv.org/pdf/1907.05193).
-- Create 3D models of humans
-  - ❌ hair styles, clothes and environment difficult to generate in a realistic way and also a lot of work
-- Segment photos with commercial tools and use them for training
-  - ❌ costs, model never better than competitors
-- Manually segment photos
-  - ❌ huge amount of work
+Synthetic datasets have limitations for achieving great segmentation results. This is because artificial lighting, occlusion, scale or backgrounds create a gap between synthetic and real images. A "model trained solely on synthetic data generated with naïve domain randomization struggles to generalize on the real domain", see [PEOPLESANSPEOPLE: A Synthetic Data Generator for Human-Centric Computer Vision (2022)](https://arxiv.org/pdf/2112.09290).
+
+Currently I am researching how to close this gap. The latest approach is to create segmented humans with [LayerDiffuse](https://github.com/layerdiffusion/LayerDiffuse) and then apply [IC-Light](https://github.com/lllyasviel/IC-Light) to create realistic lighting effects and shadows.

 ## Support
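The `python utils/pth_to_onnx.py` context in the hunk header above refers to the ONNX conversion step. A minimal sketch of such a conversion, reusing the same hypothetical `ORMBG` class and an assumed fixed input shape; the repository's script remains the authoritative version:

```python
import torch

# Assumption: same hypothetical ORMBG class and import path as above;
# utils/pth_to_onnx.py in the repository is the authoritative conversion script.
from models.ormbg import ORMBG

model = ORMBG()
model.load_state_dict(torch.load("models/ormbg.pth", map_location="cpu"))
model.eval()

dummy = torch.randn(1, 3, 1024, 1024)  # assumed input shape for tracing

torch.onnx.export(
    model,
    dummy,
    "models/ormbg.onnx",
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
    opset_version=17,
)
```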