Update README.md
README.md
CHANGED
@@ -21,10 +21,10 @@ base_model: google/gemma-2-9b
 <img src="gemma_2_9b_sea-lion_v3_base_banner.png"/>
 </div>
 
-# 
+# Gemma-SEA-LION-v3-9B
 SEA-LION is a collection of Large Language Models (LLMs) which has been pretrained and instruct-tuned for the Southeast Asia (SEA) region.
 
-
+Gemma-SEA-LION-v3-9B is a multilingual model which has undergone continued pre-training on approximately **200B** tokens across the 11 official Southeast Asian languages: English, Chinese, Vietnamese, Indonesian, Thai, Tamil, Filipino, Malay, Khmer, Lao, Burmese.
 
 SEA-LION stands for <i>Southeast Asian Languages In One Network</i>.
 
@@ -36,12 +36,12 @@ SEA-LION stands for <i>Southeast Asian Languages In One Network</i>.
 
 ## Model Details
 ### Model Description
-We performed continued pre-training in English and ASEAN languages on [Gemma-2-9B](https://huggingface.co/google/gemma-2-9b), a decoder model using the Gemma 2 architecture, to create
+We performed continued pre-training in English and ASEAN languages on [Gemma-2-9B](https://huggingface.co/google/gemma-2-9b), a decoder model using the Gemma 2 architecture, to create Gemma-SEA-LION-v3-9B.
 
 For tokenisation, the model employs the default tokenizer used in Gemma 2 9B.
 
 ### Benchmark Performance
-We evaluated
+We evaluated Gemma-SEA-LION-v3-9B on general language capabilities.
 
 #### General Language Capabilities
 For the evaluation of general language capabilities, we employed the [SEA HELM (also known as BHASA) evaluation benchmark](https://arxiv.org/abs/2309.06085v2) across a variety of tasks.
@@ -51,20 +51,20 @@ Note: SEA HELM is implemented using prompts to elicit answers in a strict format
 
 The evaluation was done **five-shot** with native prompts on a sample of 100-1000 instances for each dataset.
 
-For more details on
+For more details on Gemma-SEA-LION-v3-9B benchmark performance, please refer to the SEA HELM leaderboard, https://leaderboard.sea-lion.ai/
 
 ## Technical Specifications
 ### Infrastructure
-
+Gemma-SEA-LION-v3-9B was trained using [MosaicML Composer](https://github.com/mosaicml/composer) on the following hardware:
 
-| Training Details     |
+| Training Details     | Gemma-SEA-LION-v3-9B     |
 |----------------------|:------------------------:|
 | SingTel HGX-100      | 8 instances              |
 | Nvidia H100 80GB GPU | 64                       |
 | Training Duration    | 10 days                  |
 
 ### Configuration
-| HyperParameter    |
+| HyperParameter    | Gemma-SEA-LION-v3-9B     |
 |-------------------|:------------------------:|
 | Precision         | bfloat16                 |
 | Optimizer         | decoupled_adamw          |
@@ -74,7 +74,7 @@ Gemma2 9B CPT SEA-LIONv3 was trained using [MosaicML Composer](https://github.co
 | Micro Batch Size  | 1                        |
 
 ## Data
-
+Gemma-SEA-LION-v3-9B was continued pre-trained on 200B tokens of the following data:
 
 | Language                 | Source           | Total Tokens (B) | Percentage (%) | Total percentage (%) |
 | ------------------------ | ---------------- | ---------------- | -------------- | -------------------- |