Update README.md

README.md CHANGED

@@ -75,25 +75,6 @@ This pipeline can be broken up into three key steps:

## Why did we choose DeepSpeed?

**DeepSpeed Training:**

The `main.py` script takes the DeepSpeed config via the argument `--deepspeed_config ./ds_config.json`.
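
With the `deepspeed` launcher this typically looks like `deepspeed main.py --deepspeed_config ./ds_config.json`, though the repository's own run scripts may differ.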

We read the DeepSpeed documentation and created a specific configuration based on their work. The JSON file `ds_config.json` here is set to use the [ZeRO-2](https://www.microsoft.com/en-us/research/blog/ZeRO-2-deepspeed-shattering-barriers-of-deep-learning-speed-scale/) stage and FP16, allowing much faster training and saving GPU memory. Note that ZeRO-2 is just one of the options DeepSpeed offers: you may also use ZeRO-1, ZeRO-3, ZeRO-Offload, and ZeRO-Infinity. For more information on the DeepSpeed ZeRO family, please see this [tutorial](https://www.deepspeed.ai/tutorials/zero/) for ZeRO-1/2/3 and this [tutorial](https://www.deepspeed.ai/tutorials/zero-offload/) for ZeRO-Offload.
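
A minimal sketch of what such a `ds_config.json` could look like (the values below are illustrative assumptions, not the repository's actual settings):

```json
{
  "train_batch_size": 32,
  "gradient_accumulation_steps": 1,
  "fp16": {
    "enabled": true
  },
  "zero_optimization": {
    "stage": 2
  }
}
```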

To enable DeepSpeed ZeRO training, we added a few lines of code, e.g.:

```python
# Wrap the model, optimizer, and LR scheduler into a DeepSpeed engine
model, optimizer, _, lr_scheduler = deepspeed.initialize(model=model,
                                                         optimizer=optimizer,
                                                         args=args,
                                                         lr_scheduler=lr_scheduler,
                                                         dist_init_required=True)
```
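
For context, a typical DeepSpeed training step then drives the returned engine directly (a hedged sketch; `data_loader` and the loss computation are assumptions, and the actual loop in `main.py` may differ):

```python
for batch in data_loader:
    loss = model(batch)   # forward pass through the DeepSpeed engine
    model.backward(loss)  # engine handles FP16 loss scaling and ZeRO gradient partitioning
    model.step()          # optimizer step, LR scheduler step, and gradient zeroing
```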

## **Acknowledgements**

We thank the following papers and open-source repositories, and we especially thank the DeepSpeed team for their framework.