Update README.md
using ~200M tokens (ie: ~100M positive and ~100M negative) from:

- [jukofyork/instruction-responses-500MB](https://huggingface.co/datasets/jukofyork/instruction-responses-500MB)
- [jukofyork/instruction-refusals-500MB](https://huggingface.co/datasets/jukofyork/instruction-refusals-500MB)

taking just under 4 days using 6x `RTX A6000` over 3 machines:
(hence the 30 batch size: `(num_gpus / pipeline_stages) * gradient_accumulation_steps = (6 / 2) * 10 = 30`)

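The batch-size arithmetic can be sanity-checked with a few lines (the variable names mirror the formula above and are illustrative, not the actual config keys):

```python
# With pipeline parallelism, each model replica spans `pipeline_stages` GPUs,
# so the number of data-parallel replicas is num_gpus / pipeline_stages.
# Each replica accumulates gradients over `gradient_accumulation_steps`
# micro-batches per optimiser step, giving the effective global batch size.
num_gpus = 6                      # 6x RTX A6000 over 3 machines
pipeline_stages = 2               # each replica split across 2 GPUs
gradient_accumulation_steps = 10

data_parallel_replicas = num_gpus // pipeline_stages
effective_batch_size = data_parallel_replicas * gradient_accumulation_steps
print(effective_batch_size)  # 30
```
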
---
The control adapter was then converted to a LoRA using [control_adapter_to_lora.py](https://github.com/jukofyork/qlora-pipe-lite/blob/main/control_adapter_to_lora.py):