Update README.md
README.md
@@ -25,7 +25,7 @@ This model was created by [`INSAIT`](https://insait.ai/), part of Sofia Universi
 # Model description
 
 The model was built on top of Google’s Gemma 2 9B open models.
-It was continuously pre-trained on a large pre-filtered dataset using the combination of data mixing and model merging,
+It was continuously pre-trained on a large pre-filtered dataset (75B tokens of Ukrainian and English data in total) using the combination of data mixing and model merging,
 allowing the model to gain outstanding Ukrainian cultural and linguistic capabilities while retaining its English performance.
 During the pre-training stage, we use various datasets, including Ukrainian web crawl data (FineWeb2), freely available datasets such as Wikipedia, a range of specialized Ukrainian datasets, and machine translations of popular English datasets.
 The model was then instruction-fine-tuned on a newly constructed Ukrainian instruction dataset created using machine translations of current best English datasets and specialized Ukrainian datasets, prepared by Ukrainian community.