lgcharpe committed
Commit 929e896 · verified · 1 Parent(s): 21a00fc

Update README.md

Files changed (1)
  1. README.md +14 -0
README.md CHANGED
@@ -30,6 +30,7 @@ A 31M model trained on 100M (10M unique words) able to do both causal and masked
  - [Testing Data & Metrics](#testing-data-factors--metrics)
  - [Testing Data](#testing-data)
  - [Metrics](#metrics)
+ - [Hyperparameters](#hyperparameters)
  - [Results](#results)
  - [Technical Specifications](#technical-specifications-optional)
  - [Model Architecture and Objective](#model-architecture-and-objective)
@@ -192,6 +193,19 @@ The metrics used to evaluate the model are the following:
 
  The metrics were chosen based on the advice of the papers the tasks come from.
 
+ ### Hyperparameters
+
+ | Hyperparameter | MNLI, RTE, QQP, MRPC | BoolQ, MultiRC | WSC |
+ | --- | --- | --- | --- |
+ | Learning Rate | 3\*10<sup>-5</sup> | 3\*10<sup>-5</sup> | 3\*10<sup>-5</sup> |
+ | Batch Size | 32 | 16 | 32 |
+ | Epochs | 10 | 10 | 30 |
+ | Weight decay | 0.01 | 0.01 | 0.01 |
+ | Optimizer | AdamW | AdamW | AdamW |
+ | Scheduler | cosine | cosine | cosine |
+ | Warmup percentage | 6% | 6% | 6% |
+ | Dropout | 0.1 | 0.1 | 0.1 |
+
  ## Results
 
  *Validation (Loss)*
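For reference, here is a minimal sketch of how the first column of the added table (MNLI, RTE, QQP, MRPC) could be wired up with PyTorch and the `transformers` scheduler helper. The model, step count, and training loop are placeholder assumptions for illustration, not this repository's actual fine-tuning script.

```python
# Sketch of the fine-tuning configuration from the table above (MNLI/RTE/QQP/MRPC column).
# The model and steps_per_epoch below are placeholders, not taken from this model card.
import torch
from transformers import get_cosine_schedule_with_warmup

model = torch.nn.Linear(768, 3)   # placeholder for a classification head on the encoder
steps_per_epoch = 1_000           # placeholder for len(train_dataloader) at batch size 32

num_epochs = 10                   # 30 for WSC
num_training_steps = num_epochs * steps_per_epoch
num_warmup_steps = int(0.06 * num_training_steps)   # 6% warmup into a cosine decay

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5, weight_decay=0.01)
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=num_warmup_steps,
    num_training_steps=num_training_steps,
)

# Inside the training loop, after each batch:
#   loss.backward(); optimizer.step(); scheduler.step(); optimizer.zero_grad()
# Dropout (0.1) is configured on the model itself rather than here.
```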