Update README.md
README.md
CHANGED
@@ -21,10 +21,10 @@ base_model: google/gemma-2-9b
 <img src="gemma_2_9b_sea-lion_v3_base_banner.png"/>
 </div>
 
-# 
+# Gemma-SEA-LION-v3-9B
 SEA-LION is a collection of Large Language Models (LLMs) which has been pretrained and instruct-tuned for the Southeast Asia (SEA) region.
 
-
+Gemma-SEA-LION-v3-9B is a multilingual model which has undergone continued pre-training on approximately **200B** tokens across the 11 official Southeast Asian languages: English, Chinese, Vietnamese, Indonesian, Thai, Tamil, Filipino, Malay, Khmer, Lao, Burmese.
 
 SEA-LION stands for <i>Southeast Asian Languages In One Network</i>.
 
@@ -36,12 +36,12 @@ SEA-LION stands for <i>Southeast Asian Languages In One Network</i>.
 
 ## Model Details
 ### Model Description
-We performed continued pre-training in English and ASEAN languages on [Gemma-2-9B](https://huggingface.co/google/gemma-2-9b), a decoder model using the Gemma 2 architecture, to create
+We performed continued pre-training in English and ASEAN languages on [Gemma-2-9B](https://huggingface.co/google/gemma-2-9b), a decoder model using the Gemma 2 architecture, to create Gemma-SEA-LION-v3-9B.
 
 For tokenisation, the model employs the default tokenizer used in Gemma 2 9B.
 
 ### Benchmark Performance
-We evaluated
+We evaluated Gemma-SEA-LION-v3-9B on general language capabilities.
 
 #### General Language Capabilities
 For the evaluation of general language capabilities, we employed the [SEA HELM (also known as BHASA) evaluation benchmark](https://arxiv.org/abs/2309.06085v2) across a variety of tasks.
@@ -51,20 +51,20 @@ Note: SEA HELM is implemented using prompts to elicit answers in a strict format
 
 The evaluation was done **five-shot** with native prompts on a sample of 100-1000 instances for each dataset.
 
-For more details on
+For more details on Gemma-SEA-LION-v3-9B benchmark performance, please refer to the SEA HELM leaderboard, https://leaderboard.sea-lion.ai/
 
 ## Technical Specifications
 ### Infrastructure
-
+Gemma-SEA-LION-v3-9B was trained using [MosaicML Composer](https://github.com/mosaicml/composer) on the following hardware:
 
-| Training Details     |
+| Training Details     | Gemma-SEA-LION-v3-9B     |
 |----------------------|:------------------------:|
 | SingTel HGX-100      | 8 instances              |
 | Nvidia H100 80GB GPU | 64                       |
 | Training Duration    | 10 days                  |
 
 ### Configuration
-| HyperParameter    |
+| HyperParameter    | Gemma-SEA-LION-v3-9B     |
 |-------------------|:------------------------:|
 | Precision         | bfloat16                 |
 | Optimizer         | decoupled_adamw          |
@@ -74,7 +74,7 @@ Gemma2 9B CPT SEA-LIONv3 was trained using [MosaicML Composer](https://github.co
 | Micro Batch Size  | 1                        |
 
 ## Data
-
+Gemma-SEA-LION-v3-9B was continued pre-trained on 200B tokens of the following data:
 
 | Language                 | Source           | Total Tokens (B) | Percentage (%) | Total percentage (%) |
 | ------------------------ | ---------------- | ---------------- | -------------- | -------------------- |