Description

AlemLLM is a large language model customized by Astana Hub to improve the helpfulness of LLM-generated responses in the Kazakh language.

Evaluation Metrics

Models were evaluated on six established benchmarks (MMLU, Winogrande, HellaSwag, ARC, GSM8K, and DROP) covering general knowledge, commonsense reasoning, grade-school mathematics, and reading comprehension. Scores are reported separately for Kazakh, Russian, and English.
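
As a rough reproduction guide, the sketch below scores a checkpoint on the English variants of these benchmarks with EleutherAI's lm-evaluation-harness. This is an assumption about tooling, not a statement of the actual evaluation pipeline: the harness, the placeholder repository id, and the choice of `arc_challenge` for ARC are illustrative, and the Kazakh and Russian leaderboards would need translated task configurations that the stock harness does not ship.

```python
# Illustrative only: evaluate a Hugging Face checkpoint on the English
# benchmark suite with lm-evaluation-harness (pip install lm-eval).
# "org/alemllm" is a placeholder repository id, not the real one.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                          # transformers backend
    model_args="pretrained=org/alemllm,dtype=bfloat16",  # placeholder repo id
    tasks=["mmlu", "winogrande", "hellaswag", "arc_challenge", "gsm8k", "drop"],
    batch_size=8,
)

for task, metrics in results["results"].items():
    print(task, metrics)
```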

Kazakh Leaderboard

| Model | Average | MMLU | Winogrande | HellaSwag | ARC | GSM8K | DROP |
|---|---|---|---|---|---|---|---|
| Yi-Lightning | 0.812 | 0.720 | 0.852 | 0.820 | 0.940 | 0.880 | 0.660 |
| DeepSeek V3 37A | 0.715 | 0.650 | 0.628 | 0.640 | 0.900 | 0.890 | 0.580 |
| DeepSeek R1 | 0.798 | 0.753 | 0.764 | 0.680 | 0.868 | 0.937 | 0.784 |
| Llama-3.1-70b-inst. | 0.639 | 0.610 | 0.585 | 0.520 | 0.820 | 0.780 | 0.520 |
| KazLLM-1.0-70B | 0.766 | 0.660 | 0.806 | 0.790 | 0.920 | 0.770 | 0.650 |
| GPT-4o | 0.776 | 0.730 | 0.704 | 0.830 | 0.940 | 0.900 | 0.550 |
| AlemLLM | 0.826 | 0.757 | 0.837 | 0.775 | 0.949 | 0.917 | 0.719 |
| QwQ 32B | 0.628 | 0.591 | 0.613 | 0.499 | 0.661 | 0.826 | 0.576 |
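
The Average column appears to be the unweighted arithmetic mean of the six benchmark scores; the aggregation is not stated explicitly, so treat this as an inference. A quick check against AlemLLM's Kazakh row:

```python
# Check that the reported Average matches the arithmetic mean of the six
# per-benchmark scores for AlemLLM on the Kazakh leaderboard.
scores = {"MMLU": 0.757, "Winogrande": 0.837, "HellaSwag": 0.775,
          "ARC": 0.949, "GSM8K": 0.917, "DROP": 0.719}

average = sum(scores.values()) / len(scores)
print(round(average, 3))  # 0.826, matching the table
```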

Russian Leaderboard

| Model | Average | MMLU | Winogrande | HellaSwag | ARC | GSM8K | DROP |
|---|---|---|---|---|---|---|---|
| Yi-Lightning | 0.834 | 0.750 | 0.854 | 0.870 | 0.960 | 0.890 | 0.680 |
| DeepSeek V3 37A | 0.818 | 0.784 | 0.756 | 0.840 | 0.960 | 0.910 | 0.660 |
| DeepSeek R1 | 0.845 | 0.838 | 0.811 | 0.827 | 0.972 | 0.928 | 0.694 |
| Llama-3.1-70b-inst. | 0.752 | 0.660 | 0.691 | 0.730 | 0.920 | 0.880 | 0.630 |
| KazLLM-1.0-70B | 0.748 | 0.650 | 0.806 | 0.860 | 0.790 | 0.810 | 0.570 |
| GPT-4o | 0.808 | 0.776 | 0.771 | 0.880 | 0.960 | 0.890 | 0.570 |
| AlemLLM | 0.848 | 0.801 | 0.858 | 0.843 | 0.959 | 0.896 | 0.729 |
| QwQ 32B | 0.840 | 0.810 | 0.807 | 0.823 | 0.964 | 0.926 | 0.709 |

English Leaderboard

| Model | Average | MMLU | Winogrande | HellaSwag | ARC | GSM8K | DROP |
|---|---|---|---|---|---|---|---|
| Yi-Lightning | 0.909 | 0.820 | 0.936 | 0.930 | 0.980 | 0.930 | 0.860 |
| DeepSeek V3 37A | 0.880 | 0.840 | 0.790 | 0.900 | 0.980 | 0.950 | 0.820 |
| DeepSeek R1 | 0.908 | 0.855 | 0.857 | 0.882 | 0.977 | 0.960 | 0.915 |
| Llama-3.1-70b-inst. | 0.841 | 0.770 | 0.718 | 0.880 | 0.960 | 0.900 | 0.820 |
| KazLLM-1.0-70B | 0.855 | 0.820 | 0.843 | 0.920 | 0.970 | 0.820 | 0.760 |
| GPT-4o | 0.862 | 0.830 | 0.793 | 0.940 | 0.980 | 0.910 | 0.720 |
| AlemLLM | 0.921 | 0.874 | 0.928 | 0.909 | 0.978 | 0.926 | 0.911 |
| QwQ 32B | 0.914 | 0.864 | 0.886 | 0.897 | 0.969 | 0.969 | 0.896 |

Model Specification

Architecture: Mixture of Experts
Total Parameters: 247B
Activated Parameters: 22B
Tokenizer: SentencePiece
Quantization: BF16
Vocabulary Size: 100352
Number of Layers: 56
Activation Function: SwiGLU
Positional Encoding Method: RoPE
Optimizer: AdamW
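
A minimal inference sketch with Hugging Face transformers follows. The repository id is a placeholder, bfloat16 loading mirrors the BF16 specification above, and the sketch assumes the tokenizer ships a chat template; none of this is taken from official usage instructions.

```python
# Minimal inference sketch. "astanahub/alemllm" is a placeholder repository id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "astanahub/alemllm"  # placeholder; replace with this repo's actual id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # BF16 weights, as listed in the specification
    device_map="auto",           # shard the 247B MoE across available devices
)

# "Which city is the capital of Kazakhstan?" in Kazakh.
messages = [{"role": "user", "content": "Қазақстанның астанасы қай қала?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```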
