Text Generation · Transformers · Safetensors · gemma2 · text-generation-inference
chtxxxxx committed · verified
Commit ab7c604 · 1 Parent(s): 2eb07f9

Update README.md

Files changed (1)
  1. README.md +9 -9
README.md CHANGED
@@ -21,10 +21,10 @@ base_model: google/gemma-2-9b
 <img src="gemma_2_9b_sea-lion_v3_base_banner.png"/>
 </div>
 
-# Gemma2 9B CPT SEA-LIONv3
+# Gemma-SEA-LION-v3-9B
 SEA-LION is a collection of Large Language Models (LLMs) which has been pretrained and instruct-tuned for the Southeast Asia (SEA) region.
 
-Gemma2 9B CPT SEA-LIONv3 Base is a multilingual model which has undergone continued pre-training on approximately **200B** tokens across the 11 official Southeast Asian languages: English, Chinese, Vietnamese, Indonesian, Thai, Tamil, Filipino, Malay, Khmer, Lao, Burmese.
+Gemma-SEA-LION-v3-9B is a multilingual model which has undergone continued pre-training on approximately **200B** tokens across the 11 official Southeast Asian languages: English, Chinese, Vietnamese, Indonesian, Thai, Tamil, Filipino, Malay, Khmer, Lao, Burmese.
 
 SEA-LION stands for <i>Southeast Asian Languages In One Network</i>.
 
@@ -36,12 +36,12 @@ SEA-LION stands for <i>Southeast Asian Languages In One Network</i>.
 
 ## Model Details
 ### Model Description
-We performed continued pre-training in English and ASEAN languages on [Gemma-2-9B](https://huggingface.co/google/gemma-2-9b), a decoder model using the Gemma 2 architecture, to create Gemma2 9B CPT SEA-LIONv3 Base.
+We performed continued pre-training in English and ASEAN languages on [Gemma-2-9B](https://huggingface.co/google/gemma-2-9b), a decoder model using the Gemma 2 architecture, to create Gemma-SEA-LION-v3-9B.
 
 For tokenisation, the model employs the default tokenizer used in Gemma 2 9B.
 
 ### Benchmark Performance
-We evaluated Gemma2 9B CPT SEA-LIONv3 base model on general language capabilities.
+We evaluated Gemma-SEA-LION-v3-9B on general language capabilities.
 
 #### General Language Capabilities
 For the evaluation of general language capabilities, we employed the [SEA HELM (also known as BHASA) evaluation benchmark](https://arxiv.org/abs/2309.06085v2) across a variety of tasks.
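
Since the model description names the base checkpoint and notes that the stock Gemma 2 tokenizer is reused, loading follows the usual `transformers` pattern. Below is a minimal sketch; the repo id `aisingapore/Gemma-SEA-LION-v3-9B` is an assumption inferred from the renamed title, since the diff does not state the final repository path:

```python
# Minimal loading/generation sketch. The repo id is an assumption based on
# the renamed model card title; check the model page for the canonical path.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "aisingapore/Gemma-SEA-LION-v3-9B"  # assumed repo id

# The card states the model keeps the default Gemma 2 9B tokenizer.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the bfloat16 training precision listed below
    device_map="auto",
)

inputs = tokenizer("Singapore is", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```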
@@ -51,20 +51,20 @@ Note: SEA HELM is implemented using prompts to elicit answers in a strict format
 
 The evaluation was done **five-shot** with native prompts on a sample of 100-1000 instances for each dataset.
 
-For more details on Gemma2 9B CPT SEA-LIONv3 base benchmark performance, please refer to the SEA HELM leaderboard, https://leaderboard.sea-lion.ai/
+For more details on Gemma-SEA-LION-v3-9B benchmark performance, please refer to the SEA HELM leaderboard, https://leaderboard.sea-lion.ai/
 
 ## Technical Specifications
 ### Infrastructure
-Gemma2 9B CPT SEA-LIONv3 was trained using [MosaicML Composer](https://github.com/mosaicml/composer) on the following hardware:
+Gemma-SEA-LION-v3-9B was trained using [MosaicML Composer](https://github.com/mosaicml/composer) on the following hardware:
 
-| Training Details     | Gemma2 9B CPT SEA-LIONv3 |
+| Training Details     | Gemma-SEA-LION-v3-9B     |
 |----------------------|:------------------------:|
 | SingTel HGX-100      | 8 instances              |
 | Nvidia H100 80GB GPU | 64                       |
 | Training Duration    | 10 days                  |
 
 ### Configuration
-| HyperParameter    | Gemma2 9B CPT SEA-LIONv3 |
+| HyperParameter    | Gemma-SEA-LION-v3-9B     |
 |-------------------|:------------------------:|
 | Precision         | bfloat16                 |
 | Optimizer         | decoupled_adamw          |
@@ -74,7 +74,7 @@ Gemma2 9B CPT SEA-LIONv3 was trained using [MosaicML Composer](https://github.com/mosaicml/composer) on the following hardware:
 | Micro Batch Size  | 1                        |
 
 ## Data
-Gemma2 9B CPT SEA-LIONv3 base model was continued pre-trained on 200B tokens of the following data:
+Gemma-SEA-LION-v3-9B was continued pre-trained on 200B tokens of the following data:
 
 | Language                 | Source           | Total Tokens (B) | Percentage (%) | Total percentage (%) |
 | ------------------------ | ---------------- | ---------------- | -------------- | -------------------- |
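
For readers curious how the listed hyperparameters map onto [MosaicML Composer](https://github.com/mosaicml/composer), here is an illustrative sketch. The model and dataloader factories are hypothetical placeholders, and values this diff elides (learning rate, schedule, global batch size) are left at library defaults rather than guessed:

```python
# Illustrative only: maps the card's visible hyperparameters onto Composer's
# Trainer API. This is NOT the actual SEA-LION training recipe; the model and
# dataloader factories below are hypothetical placeholders.
from composer import Trainer
from composer.optim import DecoupledAdamW

model = build_composer_gemma2_9b()          # hypothetical ComposerModel factory
train_dataloader = build_cpt_dataloader()   # hypothetical pretraining token stream

# Optimizer: decoupled_adamw (LR and schedule are elided in this diff hunk)
optimizer = DecoupledAdamW(model.parameters())

trainer = Trainer(
    model=model,
    train_dataloader=train_dataloader,
    optimizers=optimizer,
    precision="amp_bf16",                  # Precision: bfloat16
    device_train_microbatch_size=1,        # Micro Batch Size: 1
    max_duration="200000000000tok",        # ~200B tokens of continued pre-training
)
trainer.fit()
```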
 
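
The evaluation section above notes that SEA HELM was run **five-shot** with native prompts. As a rough illustration of what five-shot prompting means mechanically, the sketch below assembles a prompt from five worked examples plus an unanswered test item; the helper name and the example data are invented for illustration and are not from the SEA HELM harness:

```python
# Rough illustration of five-shot prompt assembly (not the SEA HELM harness;
# the function name and example data are invented for this sketch).
def build_five_shot_prompt(shots, test_input):
    """Concatenate five solved examples, then the unanswered test item."""
    assert len(shots) == 5, "five-shot means exactly five worked examples"
    parts = [f"Question: {q}\nAnswer: {a}" for q, a in shots]
    parts.append(f"Question: {test_input}\nAnswer:")
    return "\n\n".join(parts)

# "Native prompts" means the examples are written in the evaluated language;
# these Indonesian translation pairs are invented illustrations.
shots = [
    ("Terjemahkan 'good morning' ke Bahasa Indonesia.", "selamat pagi"),
    ("Terjemahkan 'thank you' ke Bahasa Indonesia.", "terima kasih"),
    ("Terjemahkan 'goodbye' ke Bahasa Indonesia.", "selamat tinggal"),
    ("Terjemahkan 'please' ke Bahasa Indonesia.", "tolong"),
    ("Terjemahkan 'sorry' ke Bahasa Indonesia.", "maaf"),
]
print(build_five_shot_prompt(shots, "Terjemahkan 'welcome' ke Bahasa Indonesia."))
```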