Update README.md

README.md

<img src="img/long_context.png" width="100%" alt="EuroBERT" />
</div>

### Suggested Fine-Tuning Hyperparameters

If you plan to fine-tune this model on a downstream task, you can start from the hyperparameters we found to work well in our paper.

#### Base Hyperparameters (unchanged across tasks)

- Warmup Ratio: 0.1
- Learning Rate Scheduler: Linear
- Adam Beta 1: 0.9
- Adam Beta 2: 0.95
- Adam Epsilon: 1e-5
- Weight Decay: 0.1
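
As a rough illustration, these settings map onto Hugging Face `TrainingArguments` as in the minimal sketch below. This is not the paper's exact training setup: the learning rate is one task-specific value from the tables that follow, and the output path, epoch count, and batch size are placeholders chosen for the example.

```python
# Minimal sketch: the base hyperparameters above expressed as Hugging Face
# TrainingArguments. learning_rate is task-specific (see the tables below);
# output_dir, num_train_epochs, and per_device_train_batch_size are
# illustrative placeholders, not values from the paper.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="eurobert-finetune",    # placeholder
    learning_rate=3.6e-05,             # e.g. XNLI with EuroBERT-210m
    warmup_ratio=0.1,
    lr_scheduler_type="linear",
    adam_beta1=0.9,
    adam_beta2=0.95,
    adam_epsilon=1e-5,
    weight_decay=0.1,
    num_train_epochs=3,                # placeholder
    per_device_train_batch_size=32,    # placeholder
)
```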

#### Task-Specific Learning Rates

##### Sequence Classification

| Dataset        | EuroBERT-210m | EuroBERT-610m | EuroBERT-2.1B |
|----------------|---------------|---------------|---------------|
| XNLI           | 3.6e-05       | 3.6e-05       | 2.8e-05       |
| PAWS-X         | 3.6e-05       | 4.6e-05       | 3.6e-05       |
| QAM            | 3.6e-05       | 2.8e-05       | 2.2e-05       |
| AmazonReviews  | 3.6e-05       | 2.8e-05       | 3.6e-05       |
| MassiveIntent  | 6.0e-05       | 4.6e-05       | 2.8e-05       |
| CodeDefect     | 3.6e-05       | 2.8e-05       | 1.3e-05       |
| CodeComplexity | 3.6e-05       | 3.6e-05       | 1.0e-05       |
| MathShepherd   | 7.7e-05       | 2.8e-05       | 1.7e-05       |
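
For instance, a sequence-classification fine-tune on XNLI with EuroBERT-210m could be wired up as sketched below, reusing the `training_args` from the snippet above with the 3.6e-05 entry from this table. The dataset loading and preprocessing are illustrative stand-ins, `EuroBERT/EuroBERT-210m` is assumed to be the model's Hugging Face ID, and `trust_remote_code` may or may not be needed depending on your transformers version.

```python
# Sketch: fine-tuning EuroBERT-210m on XNLI (English subset shown for
# brevity) with the suggested learning rate for this model/task pair.
# Reuses the `training_args` defined in the previous snippet.
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer, Trainer

model_id = "EuroBERT/EuroBERT-210m"   # assumed Hugging Face model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    num_labels=3,               # XNLI: entailment / neutral / contradiction
    trust_remote_code=True,     # may be required depending on your transformers version
)

dataset = load_dataset("xnli", "en")  # illustrative subset choice

def preprocess(batch):
    # Encode premise/hypothesis pairs; padding is deferred to the collator.
    return tokenizer(batch["premise"], batch["hypothesis"], truncation=True)

tokenized = dataset.map(preprocess, batched=True)

trainer = Trainer(
    model=model,
    args=training_args,                    # base hyperparameters from above
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    tokenizer=tokenizer,                   # enables dynamic padding via the default collator
)
trainer.train()
```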

##### Sequence Regression

| Dataset              | EuroBERT-210m | EuroBERT-610m | EuroBERT-2.1B |
|----------------------|---------------|---------------|---------------|
| SeaHorse             | 3.6e-05       | 3.6e-05       | 2.8e-05       |
| SummevalMultilingual | 3.6e-05       | 2.8e-05       | 3.6e-05       |
| WMT                  | 2.8e-05       | 2.8e-05       | 1.3e-05       |

##### Retrieval

| Dataset         | EuroBERT-210m | EuroBERT-610m | EuroBERT-2.1B |
|-----------------|---------------|---------------|---------------|
| MIRACL          | 4.6e-05       | 3.6e-05       | 2.8e-05       |
| MLDR            | 2.8e-05       | 2.2e-05       | 4.6e-05       |
| CC-News         | 4.6e-05       | 4.6e-05       | 3.6e-05       |
| Wikipedia       | 2.8e-05       | 3.6e-05       | 2.8e-05       |
| CodeSearchNet   | 4.6e-05       | 2.8e-05       | 3.6e-05       |
| CqaDupStackMath | 4.6e-05       | 2.8e-05       | 3.6e-05       |
| MathFormula     | 1.7e-05       | 3.6e-05       | 3.6e-05       |

**Disclaimer**: These are suggested hyperparameters based on our experiments. We recommend conducting your own grid search for best results on your specific downstream task.

## License