hannayukhymenko committed on
Commit ca7c15c · verified · 1 Parent(s): a10ab51

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -25,7 +25,7 @@ This model was created by [`INSAIT`](https://insait.ai/), part of Sofia Universi
  # Model description
 
  The model was built on top of Google’s Gemma 2 9B open models.
- It was continuously pre-trained on a large pre-filtered dataset using the combination of data mixing and model merging,
+ It was continuously pre-trained on a large pre-filtered dataset (75B tokens of Ukrainian and English data in total) using the combination of data mixing and model merging,
  allowing the model to gain outstanding Ukrainian cultural and linguistic capabilities while retaining its English performance.
  During the pre-training stage, we use various datasets, including Ukrainian web crawl data (FineWeb2), freely available datasets such as Wikipedia, a range of specialized Ukrainian datasets, and machine translations of popular English datasets.
  The model was then instruction-fine-tuned on a newly constructed Ukrainian instruction dataset created using machine translations of current best English datasets and specialized Ukrainian datasets, prepared by Ukrainian community.