BabyLM-community/babylm-baseline-10m-gpt-bert-mixed

babylm-baseline


lgcharpe committed 5 days ago

Commit 929e896 · verified · 1 parent: 21a00fc

Update README.md

Files changed (1)

README.md +14 -0


@@ -30,6 +30,7 @@ A 31M model trained on 100M (10M unique words) able to do both causal and masked
   - [Testing Data & Metrics](#testing-data-factors--metrics)
     - [Testing Data](#testing-data)
     - [Metrics](#metrics)
+    - [Hyperparameters](#hyperparameters)
   - [Results](#results)
 - [Technical Specifications](#technical-specifications-optional)
   - [Model Architecture and Objective](#model-architecture-and-objective)
@@ -192,6 +193,19 @@ The metrics used to evaluate the model are the following:
 The metrics were chosen based on the advice of the papers the tasks come from.
+### Hyperparameters
+| Hyperparameter | MNLI, RTE, QQP, MRPC | BoolQ, MultiRC | WSC |
+| --- | --- | --- | --- |
+| Learning Rate | 3\*10<sup>-5</sup> | 3\*10<sup>-5</sup> | 3\*10<sup>-5</sup> |
+| Batch Size | 32 | 16 | 32 |
+| Epochs | 10 | 10 | 30 |
+| Weight decay | 0.01 | 0.01 | 0.01 |
+| Optimizer | AdamW | AdamW | AdamW |
+| Scheduler | cosine | cosine | cosine |
+| Warmup percentage | 6% | 6% | 6% |
+| Dropout | 0.1 | 0.1 | 0.1 |
 ## Results
 *Validation (Loss)*
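
For readers who want to reuse the fine-tuning settings added in this commit, the table can be transcribed into a small configuration lookup. The following is a minimal sketch, not the authors' training code: the names `SHARED`, `PER_GROUP`, `config_for`, and `lr_at_step` are hypothetical, and the schedule shape (linear warmup followed by cosine decay to zero) is an assumption, since the table only states "cosine" with a 6% warmup.

```python
import math

# Values shared by all task groups, transcribed from the Hyperparameters table.
SHARED = {
    "learning_rate": 3e-5,
    "weight_decay": 0.01,
    "optimizer": "AdamW",
    "scheduler": "cosine",
    "warmup_fraction": 0.06,
    "dropout": 0.1,
}

# Only batch size and epoch count differ between the three task groups.
PER_GROUP = {
    ("MNLI", "RTE", "QQP", "MRPC"): {"batch_size": 32, "epochs": 10},
    ("BoolQ", "MultiRC"): {"batch_size": 16, "epochs": 10},
    ("WSC",): {"batch_size": 32, "epochs": 30},
}


def config_for(task):
    """Return the merged hyperparameters for one task name (hypothetical helper)."""
    for tasks, overrides in PER_GROUP.items():
        if task in tasks:
            return {**SHARED, **overrides}
    raise KeyError(f"no hyperparameters listed for task {task!r}")


def lr_at_step(step, total_steps, peak_lr, warmup_fraction):
    """Learning rate at a given optimizer step.

    Assumption: linear warmup to peak_lr, then cosine decay to zero.
    The table does not specify the warmup shape or the final learning rate.
    """
    warmup_steps = round(total_steps * warmup_fraction)
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    cfg = config_for("WSC")
    total_steps = 1000  # hypothetical; depends on dataset size, batch size, epochs
    for step in (0, 59, 60, 500, 1000):
        lr = lr_at_step(step, total_steps, cfg["learning_rate"], cfg["warmup_fraction"])
        print(step, lr)
```

Keeping the shared values in one dictionary and the per-group overrides in another mirrors the table, where only the Batch Size and Epochs rows vary across columns.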