Malaysian Finetuned Instruct LoRA Collection

Continued finetuning of Instruct models using LoRA, from 0.5B up to 72B parameters.
Continued finetuning of https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct on a highly curated 1.5B-token Malaysian instruction dataset.
Finetuned on mesolitica/Malaysian-SFT to make the model understand Malaysian context.
["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj", "embed_tokens", "lm_head"]
Source code at https://github.com/mesolitica/malaya/tree/master/session/llama3
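The target-module list above covers every attention and MLP projection in the transformer blocks, plus the embedding table and output head. A hedged sketch of how such a configuration might look with the Hugging Face peft library; the rank, alpha, and dropout values here are illustrative assumptions, not the values used for this model:

```python
from peft import LoraConfig

# Illustrative LoRA configuration: only target_modules mirrors the list
# in this card; r, lora_alpha, and lora_dropout are assumed values.
lora_config = LoraConfig(
    r=16,               # assumed rank
    lora_alpha=32,      # assumed scaling factor
    lora_dropout=0.05,  # assumed dropout
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
        "embed_tokens", "lm_head",               # embeddings and output head
    ],
    task_type="CAUSAL_LM",
)
```

This config would then be passed to `get_peft_model(model, lora_config)` before training; including `embed_tokens` and `lm_head` lets the adapter adjust vocabulary-level behaviour, which matters when shifting a model toward a new language register.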
Based on the official MalayMMLU 0-shot first-token accuracy:
| Model | Accuracy (%) | Shot | By letter | Category |
|---|---|---|---|---|
| Malaysian-Llama-3.2-1B-Instruct | 42.325010 | 0-shot | True | STEM |
| Malaysian-Llama-3.2-1B-Instruct | 38.438295 | 0-shot | True | Language |
| Malaysian-Llama-3.2-1B-Instruct | 41.037872 | 0-shot | True | Social science |
| Malaysian-Llama-3.2-1B-Instruct | 44.399136 | 0-shot | True | Others |
| Malaysian-Llama-3.2-1B-Instruct | 42.184300 | 0-shot | True | Humanities |

Questions per category: {'Social science': 6918, 'Language': 6288, 'Humanities': 4395, 'Others': 4169, 'STEM': 2443}
Model: Malaysian-Llama-3.2-1B-Instruct
Metric: first token
Shot: 0-shot
Average accuracy: 41.28
Accuracy for STEM: 42.33
Accuracy for Language: 38.44
Accuracy for Social science: 41.04
Accuracy for Others: 44.40
Accuracy for Humanities: 42.18
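The average accuracy reported here can be reproduced from the per-category accuracies and the question counts listed above: it is the question-count-weighted (micro) average over all 24,213 questions, not a plain mean of the five category scores. A minimal sketch, with the counts and accuracies copied from the numbers above:

```python
# Per-category question counts and first-token accuracies (%) for
# Malaysian-Llama-3.2-1B-Instruct, copied from the results above.
counts = {
    "STEM": 2443,
    "Language": 6288,
    "Social science": 6918,
    "Others": 4169,
    "Humanities": 4395,
}
accuracy = {
    "STEM": 42.32501023331969,
    "Language": 38.4382951653944,
    "Social science": 41.03787221740387,
    "Others": 44.3991364835692,
    "Humanities": 42.184300341296925,
}

total = sum(counts.values())  # 24213 questions in total
# Weight each category's accuracy by its share of the questions.
weighted_avg = sum(accuracy[c] * counts[c] for c in counts) / total
print(weighted_avg)  # ~41.2795, the reported average accuracy
```

A plain mean of the five categories would give roughly 41.68 instead, so the weighting visibly matters.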
While the original model:
| Model | Accuracy (%) | Shot | By letter | Category |
|---|---|---|---|---|
| Llama-3.2-1B-Instruct | 36.430618 | 0-shot | True | STEM |
| Llama-3.2-1B-Instruct | 37.420483 | 0-shot | True | Language |
| Llama-3.2-1B-Instruct | 36.773634 | 0-shot | True | Social science |
| Llama-3.2-1B-Instruct | 37.514992 | 0-shot | True | Others |
| Llama-3.2-1B-Instruct | 41.319681 | 0-shot | True | Humanities |

Model: Llama-3.2-1B-Instruct
Metric: first token
Shot: 0-shot
Average accuracy: 37.86
Accuracy for STEM: 36.43
Accuracy for Language: 37.42
Accuracy for Social science: 36.77
Accuracy for Others: 37.51
Accuracy for Humanities: 41.32
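The first-token ("by_letter") metric used in the tables above scores a question as correct when, among the answer-letter tokens, the model assigns the highest probability to the gold letter as its first generated token. A minimal illustrative sketch; the log-probability values are invented for illustration, not taken from the benchmark harness:

```python
def first_token_prediction(letter_logprobs: dict[str, float]) -> str:
    """Pick the answer letter with the highest first-token log-probability."""
    return max(letter_logprobs, key=letter_logprobs.get)

def first_token_accuracy(examples: list[tuple[dict[str, float], str]]) -> float:
    """Percentage of questions where the argmax letter equals the gold letter."""
    correct = sum(
        first_token_prediction(logprobs) == gold for logprobs, gold in examples
    )
    return 100.0 * correct / len(examples)

# Two hypothetical questions: (first-token logprobs per letter, gold answer).
examples = [
    ({"A": -1.2, "B": -0.3, "C": -2.5, "D": -3.0}, "B"),  # argmax B, gold B
    ({"A": -0.9, "B": -1.1, "C": -0.8, "D": -2.2}, "A"),  # argmax C, gold A
]
print(first_token_accuracy(examples))  # 50.0
```

Because only the first token is inspected, this metric never penalizes the model for rambling after the letter, which is why it is reported separately from the exact-match numbers below.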
Based on 0-shot exact first-token match using vLLM Guided Decoding:
| Model | Accuracy (%) | Shot | Category |
|---|---|---|---|
| Malaysian-Llama-3.2-1B-Instruct | 39.869014 | 0-shot | STEM |
| Malaysian-Llama-3.2-1B-Instruct | 39.662850 | 0-shot | Language |
| Malaysian-Llama-3.2-1B-Instruct | 41.211333 | 0-shot | Social science |
| Malaysian-Llama-3.2-1B-Instruct | 42.432238 | 0-shot | Others |
| Malaysian-Llama-3.2-1B-Instruct | 46.029579 | 0-shot | Humanities |

Model: Malaysian-Llama-3.2-1B-Instruct
Metric: full (exact match)
Shot: 0-shot
Average accuracy: 41.76
Accuracy for STEM: 39.87
Accuracy for Language: 39.66
Accuracy for Social science: 41.21
Accuracy for Others: 42.43
Accuracy for Humanities: 46.03
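The vLLM guided-decoding setup constrains generation so that the model can only emit one of the answer letters, making the output directly comparable to the gold answer. A hedged sketch of what the sampling configuration might look like, assuming a recent vLLM release that exposes `GuidedDecodingParams`; model loading and the actual evaluation loop are omitted:

```python
from vllm import SamplingParams
from vllm.sampling_params import GuidedDecodingParams

# Restrict decoding to the multiple-choice letters so the single generated
# token is always a valid answer. The four-letter choice set is an assumption
# for illustration; the real harness uses each question's own options.
guided = GuidedDecodingParams(choice=["A", "B", "C", "D"])
sampling_params = SamplingParams(
    temperature=0.0,   # greedy: take the single most likely allowed choice
    max_tokens=1,      # only the answer letter is needed
    guided_decoding=guided,
)
# llm.generate(prompts, sampling_params) would then return one letter per prompt.
```

Constraining the output this way removes parsing ambiguity: the model cannot answer with prose, so "exact first-token match" reduces to a direct string comparison against the gold letter.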
While the original model:
| Model | Accuracy (%) | Shot | Category |
|---|---|---|---|
| Llama-3.2-1B-Instruct | 36.553418 | 0-shot | STEM |
| Llama-3.2-1B-Instruct | 32.395038 | 0-shot | Language |
| Llama-3.2-1B-Instruct | 38.493784 | 0-shot | Social science |
| Llama-3.2-1B-Instruct | 39.002159 | 0-shot | Others |
| Llama-3.2-1B-Instruct | 38.748578 | 0-shot | Humanities |

Model: Llama-3.2-1B-Instruct
Metric: full (exact match)
Shot: 0-shot
Average accuracy: 36.85
Accuracy for STEM: 36.55
Accuracy for Language: 32.40
Accuracy for Social science: 38.49
Accuracy for Others: 39.00
Accuracy for Humanities: 38.75
Special thanks to https://www.sns.com.my for the 8x H100 node!