Malaysian gemma-3-27b-it

Continued finetuning of https://huggingface.co/google/gemma-3-27b-it on a highly curated 1.5B-token Malaysian instruction dataset.

Improvements

  1. Supports responding in Mandarin, Tamil, Jawi, Manglish, and the local dialects of Johor, Kedah, Kelantan, Pahang, Perak, Sabah, Sarawak, Selangor, Negeri Sembilan and Terengganu (see the inference sketch after this list).
  2. Able to write code when prompted in Mandarin, Tamil, Jawi, Manglish, or any of the dialects above.
  3. Handles multi-turn conversations on Malaysian context, such as Malaysian legislation, politics, religion and languages.
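For example, the model can be asked to reply in a specific dialect. A minimal inference sketch with vLLM; the prompt and sampling settings are illustrative, not from this card:

```python
from vllm import LLM, SamplingParams

llm = LLM(model="mesolitica/Malaysian-gemma-3-27b-it", dtype="bfloat16")
params = SamplingParams(temperature=0.7, max_tokens=256)

# Illustrative prompt asking for a reply in Kelantan dialect.
messages = [{
    "role": "user",
    "content": "Tolong jawab dalam loghat Kelantan: apa itu nasi kerabu?",
}]
outputs = llm.chat(messages, params)
print(outputs[0].outputs[0].text)
```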

Training session

Finetuned on mesolitica/Malaysian-SFT so the model understands Malaysian context.
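The records can be inspected with the `datasets` library; a minimal sketch, assuming the repository loads with the default configuration (the actual file layout may require pointing at specific subsets):

```python
from datasets import load_dataset

ds = load_dataset("mesolitica/Malaysian-SFT", split="train")
print(ds[0])  # one instruction/response record
```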

How we train

  1. LoRA on ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj", "embed_tokens", "lm_head"] (items 1-3 are sketched after this list).
  2. Rank 128 with alpha 256, i.e. a LoRA scaling factor (alpha/rank) of 2.0.
  3. Multipacking at 8192 context length with proper SDPA causal masking to prevent cross-document contamination, and with position ids that restart for each packed document.
  4. Chunked CCE (Cut Cross-Entropy) loss for LoRA.
  5. WandB at https://wandb.ai/huseinzol05/lora-embedding-128-gemma3-27b-malaysian-8k?nw=nwuserhuseinzol05
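A minimal sketch of items 1-3, assuming PEFT and PyTorch. Only the rank, alpha and target modules come from the list above; everything else (function names, the SDPA call site) is illustrative, not the actual training code.

```python
import torch
from peft import LoraConfig

# Items 1-2: LoRA over attention, MLP and embedding/output projections.
lora_config = LoraConfig(
    r=128,            # rank 128
    lora_alpha=256,   # alpha 256, i.e. scaling alpha/rank = 2.0
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
        "embed_tokens", "lm_head",
    ],
    task_type="CAUSAL_LM",
)

# Item 3: block-diagonal causal mask and per-document position ids for one
# packed sequence, so packed documents cannot attend to each other.
def packed_mask_and_positions(doc_lengths, max_len=8192):
    total = sum(doc_lengths)
    assert total <= max_len
    mask = torch.zeros(total, total, dtype=torch.bool)
    position_ids = torch.empty(total, dtype=torch.long)
    start = 0
    for n in doc_lengths:
        mask[start:start + n, start:start + n] = torch.tril(
            torch.ones(n, n, dtype=torch.bool)
        )
        position_ids[start:start + n] = torch.arange(n)  # restart per document
        start += n
    # `mask` can be passed as `attn_mask` to torch SDPA (True = may attend).
    return mask, position_ids

mask, position_ids = packed_mask_and_positions([5, 3])
```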

Source code at https://github.com/mesolitica/malaya/tree/master/session/gemma3

Benchmark

MalayMMLU

Based on 0-shot exact first-token matching using vLLM:

Model: Malaysian-gemma-3-27b-it, 0-shot

  Category         Accuracy (%)
  STEM             72.70
  Language         76.78
  Social science   68.23
  Others           68.39
  Humanities       71.54
  Average          71.53
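For context, 0-shot exact first-token matching can be computed along the lines of the sketch below; the prompt format, answer letters and helper names are assumptions, not the actual MalayMMLU harness.

```python
from vllm import LLM, SamplingParams

llm = LLM(model="mesolitica/Malaysian-gemma-3-27b-it", dtype="bfloat16")
# Greedy decoding, stop after the first generated token.
params = SamplingParams(temperature=0.0, max_tokens=1)

def first_token_accuracy(examples):
    """examples: list of (prompt, gold) pairs, e.g. gold = 'A'."""
    outputs = llm.generate([prompt for prompt, _ in examples], params)
    correct = sum(
        out.outputs[0].text.strip().upper().startswith(gold)
        for out, (_, gold) in zip(outputs, examples)
    )
    return correct / len(examples)
```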

Currently the original model is not able to use guided decoding in vLLM.
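For reference, guided decoding through vLLM's OpenAI-compatible server looks like the sketch below; the endpoint, schema and prompt are illustrative, not from this card.

```python
from openai import OpenAI

# Assumes a local vLLM server: vllm serve mesolitica/Malaysian-gemma-3-27b-it
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

schema = {
    "type": "object",
    "properties": {"negeri": {"type": "string"}},
    "required": ["negeri"],
}
response = client.chat.completions.create(
    model="mesolitica/Malaysian-gemma-3-27b-it",
    messages=[{"role": "user", "content": "Di negeri manakah Gunung Kinabalu?"}],
    extra_body={"guided_json": schema},  # vLLM-specific extension
)
print(response.choices[0].message.content)
```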

Acknowledgement

Special thanks to https://www.sns.com.my for the 8x H100 node!
