Update README.md
README.md
CHANGED
@@ -30,6 +30,7 @@ A 31M model trained on 100M (10M unique words) able to do both causal and masked
 - [Testing Data & Metrics](#testing-data-factors--metrics)
 - [Testing Data](#testing-data)
 - [Metrics](#metrics)
 - [Results](#results)
 - [Technical Specifications](#technical-specifications-optional)
 - [Model Architecture and Objective](#model-architecture-and-objective)
@@ -192,6 +193,19 @@ The metrics used to evaluate the model are the following:

 The metrics were chosen based on the advice of the papers the tasks come from.

 ## Results

 *Validation (Loss)*
 - [Testing Data & Metrics](#testing-data-factors--metrics)
 - [Testing Data](#testing-data)
 - [Metrics](#metrics)
+- [Hyperparameters](#hyperparameters)
 - [Results](#results)
 - [Technical Specifications](#technical-specifications-optional)
 - [Model Architecture and Objective](#model-architecture-and-objective)

 The metrics were chosen based on the advice of the papers the tasks come from.

+### Hyperparameters
+
+| Hyperparameter | MNLI, RTE, QQP, MRPC | BoolQ, MultiRC | WSC |
+| --- | --- | --- | --- |
+| Learning Rate | 3\*10<sup>-5</sup> | 3\*10<sup>-5</sup> | 3\*10<sup>-5</sup> |
+| Batch Size | 32 | 16 | 32 |
+| Epochs | 10 | 10 | 30 |
+| Weight decay | 0.01 | 0.01 | 0.01 |
+| Optimizer | AdamW | AdamW | AdamW |
+| Scheduler | cosine | cosine | cosine |
+| Warmup percentage | 6% | 6% | 6% |
+| Dropout | 0.1 | 0.1 | 0.1 |
+
 ## Results

 *Validation (Loss)*
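
For readers who want to reuse the hyperparameter table added in this change, here is a minimal Python sketch of how it could be written down as a config. It is not taken from the repository: the names `FinetuneConfig`, `CONFIGS`, `config_for`, and `learning_rate_at` are hypothetical, and the schedule shape (linear warmup over the first 6% of steps, then cosine decay to zero) is an assumption, since the table only says "cosine" with a 6% warmup.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FinetuneConfig:
    """Values copied from the hyperparameter table; field names are hypothetical."""
    learning_rate: float = 3e-5
    batch_size: int = 32
    epochs: int = 10
    weight_decay: float = 0.01
    optimizer: str = "AdamW"
    scheduler: str = "cosine"
    warmup_fraction: float = 0.06
    dropout: float = 0.1


# One entry per column of the table; only the differing values are overridden.
CONFIGS = {
    ("MNLI", "RTE", "QQP", "MRPC"): FinetuneConfig(),
    ("BoolQ", "MultiRC"): FinetuneConfig(batch_size=16),
    ("WSC",): FinetuneConfig(epochs=30),
}


def config_for(task: str) -> FinetuneConfig:
    """Look up the config for a task name as it appears in the table header."""
    for tasks, config in CONFIGS.items():
        if task in tasks:
            return config
    raise KeyError(f"no hyperparameters listed for task {task!r}")


def learning_rate_at(step: int, total_steps: int, config: FinetuneConfig) -> float:
    """Assumed schedule: linear warmup, then cosine decay to zero."""
    warmup_steps = int(round(config.warmup_fraction * total_steps))
    if step < warmup_steps:
        # Ramp linearly up to the peak learning rate.
        return config.learning_rate * (step + 1) / warmup_steps
    # Cosine decay from the peak down to zero over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return config.learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))
```

For example, `config_for("BoolQ")` returns the config with batch size 16, and `learning_rate_at(step, total_steps, config)` gives the learning rate at a given optimizer step under the assumed schedule.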