hgissbkh committed
Commit 5bc22de · verified · 1 parent: 5809bf1

Update README.md

Files changed (1):
  1. README.md +49 -0
README.md CHANGED
@@ -105,6 +105,55 @@ The EuroBERT family exhibits strong multilingual performance across domains and
 
  <img src="img/long_context.png" width="100%" alt="EuroBERT" />
  </div>

+ ### Suggested Fine-Tuning Hyperparameters
+
+ If you plan to fine-tune this model on a downstream task, you can start from the hyperparameters we found in our paper; example sketches of how to apply these settings are shown below, after the tables.
+
+ #### Base Hyperparameters (unchanged across tasks)
+
+ - Warmup Ratio: 0.1
+ - Learning Rate Scheduler: Linear
+ - Adam Beta 1: 0.9
+ - Adam Beta 2: 0.95
+ - Adam Epsilon: 1e-5
+ - Weight Decay: 0.1
+
+ #### Task-Specific Learning Rates
+
+ ##### Sequence Classification
+
+ | Dataset        | EuroBERT-210m | EuroBERT-610m | EuroBERT-2.1B |
+ |----------------|---------------|---------------|---------------|
+ | XNLI           | 3.6e-05       | 3.6e-05       | 2.8e-05       |
+ | PAWS-X         | 3.6e-05       | 4.6e-05       | 3.6e-05       |
+ | QAM            | 3.6e-05       | 2.8e-05       | 2.2e-05       |
+ | AmazonReviews  | 3.6e-05       | 2.8e-05       | 3.6e-05       |
+ | MassiveIntent  | 6.0e-05       | 4.6e-05       | 2.8e-05       |
+ | CodeDefect     | 3.6e-05       | 2.8e-05       | 1.3e-05       |
+ | CodeComplexity | 3.6e-05       | 3.6e-05       | 1.0e-05       |
+ | MathShepherd   | 7.7e-05       | 2.8e-05       | 1.7e-05       |
+
+ ##### Sequence Regression
+
+ | Dataset              | EuroBERT-210m | EuroBERT-610m | EuroBERT-2.1B |
+ |----------------------|---------------|---------------|---------------|
+ | SeaHorse             | 3.6e-05       | 3.6e-05       | 2.8e-05       |
+ | SummevalMultilingual | 3.6e-05       | 2.8e-05       | 3.6e-05       |
+ | WMT                  | 2.8e-05       | 2.8e-05       | 1.3e-05       |
+
+ ##### Retrieval
+
+ | Dataset         | EuroBERT-210m | EuroBERT-610m | EuroBERT-2.1B |
+ |-----------------|---------------|---------------|---------------|
+ | MIRACL          | 4.6e-05       | 3.6e-05       | 2.8e-05       |
+ | MLDR            | 2.8e-05       | 2.2e-05       | 4.6e-05       |
+ | CC-News         | 4.6e-05       | 4.6e-05       | 3.6e-05       |
+ | Wikipedia       | 2.8e-05       | 3.6e-05       | 2.8e-05       |
+ | CodeSearchNet   | 4.6e-05       | 2.8e-05       | 3.6e-05       |
+ | CqaDupStackMath | 4.6e-05       | 2.8e-05       | 3.6e-05       |
+ | MathFormula     | 1.7e-05       | 3.6e-05       | 3.6e-05       |
+
+ **Disclaimer**: These are suggested hyperparameters based on our experiments. We recommend conducting your own grid search for best results on your specific downstream task.

  ## License
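As a minimal sketch (not taken from the paper), the base hyperparameters and a task-specific learning rate from the tables above can be passed to `transformers.TrainingArguments`, here using EuroBERT-210m on an XNLI-style classification task as the example; the dataset objects, `num_labels`, and output directory are placeholders.

```python
# Minimal sketch: base hyperparameters + task-specific LR for sequence classification.
# Dataset loading, num_labels and output_dir are illustrative placeholders.
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_id = "EuroBERT/EuroBERT-210m"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    num_labels=3,              # e.g. 3 classes for an XNLI-style task (assumption)
    trust_remote_code=True,    # loads EuroBERT's custom modeling code if your transformers version needs it
)

args = TrainingArguments(
    output_dir="eurobert-210m-xnli",   # placeholder path
    learning_rate=3.6e-5,              # task-specific value from the table (XNLI, EuroBERT-210m)
    warmup_ratio=0.1,                  # base hyperparameters listed above
    lr_scheduler_type="linear",
    adam_beta1=0.9,
    adam_beta2=0.95,
    adam_epsilon=1e-5,
    weight_decay=0.1,
)

# trainer = Trainer(model=model, args=args,
#                   train_dataset=train_dataset, eval_dataset=eval_dataset)
# trainer.train()
```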
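For the sequence-regression tasks, the same training arguments apply with the learning rates from the regression table; the only model-side change in this sketch is a single regression output. Using `problem_type="regression"` is a standard `transformers` option and an assumption about the setup, not a detail restated in this section.

```python
# Regression head sketch: one scalar output trained with MSE loss.
from transformers import AutoModelForSequenceClassification

reg_model = AutoModelForSequenceClassification.from_pretrained(
    "EuroBERT/EuroBERT-210m",
    num_labels=1,                   # single score, e.g. a quality / similarity rating
    problem_type="regression",      # switches the loss to MSE
    trust_remote_code=True,
)
# e.g. learning_rate=2.8e-5 for WMT with EuroBERT-210m (see the regression table above)
```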
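The retrieval learning rates refer to fine-tuning the model as a dense retriever. The exact training objective and pooling are not restated in this section, so the sketch below only shows one common setup, mean pooling over the last hidden state to produce query/passage embeddings, which would then be trained with a contrastive (in-batch negatives) loss at the listed learning rates.

```python
# Dense-retrieval sketch: mean-pooled embeddings and cosine-similarity scoring.
# The pooling choice and scoring are assumptions, not taken from the paper.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "EuroBERT/EuroBERT-210m"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
encoder = AutoModel.from_pretrained(model_id, trust_remote_code=True)

def embed(texts):
    """Mean-pool the last hidden state over non-padding tokens."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state        # (batch, seq_len, dim)
    mask = batch["attention_mask"].unsqueeze(-1).float()   # (batch, seq_len, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)    # (batch, dim)

queries = embed(["capital of France"])
passages = embed(["Paris is the capital of France.", "Berlin is the capital of Germany."])
scores = torch.nn.functional.cosine_similarity(queries, passages)  # one score per passage
```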