s-mizuki-nlp committed on
Commit
3d9fbc2
1 Parent(s): 6b9f8c1

Proofread English text.

Files changed (1)
  1. README.md +2 -1
README.md CHANGED
@@ -121,7 +121,8 @@ The following datasets were used for continual pre-training.
 
  ### Swallow Corpus Version 2
 
- We built the Swallow Corpus by extracting high-quality Japanese texts from Common Crawl. In Version 2, we expanded the scope of the Common Crawl collection and modified the pipeline sequence to enable more flexible quality filtering. For Llama 3.1 Swallow v0.2, we further enhanced quality filtering and sampling during training compared to v0.1, resulting in the use of even higher-quality Japanese texts.
+ We built the Swallow Corpus by extracting high-quality Japanese texts from Common Crawl. In Version 2, we expanded the scope of the Common Crawl collection and modified the pipeline sequence to enable more flexible quality filtering.
+ For Llama 3.1 Swallow v0.2, we further refined our quality filtering and data sampling strategies, resulting in an even higher-quality selection of Japanese texts for pre-training.
 
  Further details of the methodology and analysis will be provided in a forthcoming paper.
 