Description:
AlemLLM is a large language model customized by Astana Hub to improve the helpfulness of LLM-generated responses in the Kazakh language.
Evaluation Metrics
Model evaluations were conducted on established benchmarks (MMLU, Winogrande, HellaSwag, ARC, GSM8K, and DROP) in Kazakh, Russian, and English, covering a range of knowledge, reasoning, and reading-comprehension tasks.
Kazakh Leaderboard
Model | Average | MMLU | Winogrande | Hellaswag | ARC | GSM8k | DROP |
---|---|---|---|---|---|---|---|
Yi-Lightning | 0.812 | 0.720 | 0.852 | 0.820 | 0.940 | 0.880 | 0.660 |
DeepSeek V3 37A | 0.715 | 0.650 | 0.628 | 0.640 | 0.900 | 0.890 | 0.580 |
DeepSeek R1 | 0.798 | 0.753 | 0.764 | 0.680 | 0.868 | 0.937 | 0.784 |
Llama-3.1-70b-inst. | 0.639 | 0.610 | 0.585 | 0.520 | 0.820 | 0.780 | 0.520 |
KazLLM-1.0-70B | 0.766 | 0.660 | 0.806 | 0.790 | 0.920 | 0.770 | 0.650 |
GPT-4o | 0.776 | 0.730 | 0.704 | 0.830 | 0.940 | 0.900 | 0.550 |
AlemLLM | 0.826 | 0.757 | 0.837 | 0.775 | 0.949 | 0.917 | 0.719 |
QwQ 32B | 0.628 | 0.591 | 0.613 | 0.499 | 0.661 | 0.826 | 0.576 |
Russian Leaderboard
Model | Average | MMLU | Winogrande | Hellaswag | ARC | GSM8k | DROP |
---|---|---|---|---|---|---|---|
Yi-Lightning | 0.834 | 0.750 | 0.854 | 0.870 | 0.960 | 0.890 | 0.680 |
DeepSeek V3 37A | 0.818 | 0.784 | 0.756 | 0.840 | 0.960 | 0.910 | 0.660 |
DeepSeek R1 | 0.845 | 0.838 | 0.811 | 0.827 | 0.972 | 0.928 | 0.694 |
Llama-3.1-70b-inst. | 0.752 | 0.660 | 0.691 | 0.730 | 0.920 | 0.880 | 0.630 |
KazLLM-1.0-70B | 0.748 | 0.650 | 0.806 | 0.860 | 0.790 | 0.810 | 0.570 |
GPT-4o | 0.808 | 0.776 | 0.771 | 0.880 | 0.960 | 0.890 | 0.570 |
AlemLLM | 0.848 | 0.801 | 0.858 | 0.843 | 0.959 | 0.896 | 0.729 |
QwQ 32B | 0.840 | 0.810 | 0.807 | 0.823 | 0.964 | 0.926 | 0.709 |
English Leaderboard
Model | Average | MMLU | Winogrande | Hellaswag | ARC | GSM8k | DROP |
---|---|---|---|---|---|---|---|
Yi-Lightning | 0.909 | 0.820 | 0.936 | 0.930 | 0.980 | 0.930 | 0.860 |
DeepSeek V3 37A | 0.880 | 0.840 | 0.790 | 0.900 | 0.980 | 0.950 | 0.820 |
DeepSeek R1 | 0.908 | 0.855 | 0.857 | 0.882 | 0.977 | 0.960 | 0.915 |
Llama-3.1-70b-inst. | 0.841 | 0.770 | 0.718 | 0.880 | 0.960 | 0.900 | 0.820 |
KazLLM-1.0-70B | 0.855 | 0.820 | 0.843 | 0.920 | 0.970 | 0.820 | 0.760 |
GPT-4o | 0.862 | 0.830 | 0.793 | 0.940 | 0.980 | 0.910 | 0.720 |
AlemLLM | 0.921 | 0.874 | 0.928 | 0.909 | 0.978 | 0.926 | 0.911 |
QwQ 32B | 0.914 | 0.864 | 0.886 | 0.897 | 0.969 | 0.969 | 0.896 |
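The leaderboards above come from the authors' own evaluation pipeline, which is not published in this card. As an illustration only, a public harness such as EleutherAI's lm-evaluation-harness can score a Hugging Face checkpoint on the English versions of these benchmarks; the repository id below is a placeholder assumption, and the Kazakh and Russian leaderboards would additionally require custom task configurations for the translated benchmarks.

```python
# Illustrative sketch, not the authors' pipeline. Assumes `pip install lm-eval`
# and a published checkpoint under the hypothetical repo id "astanahub/AlemLLM".
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=astanahub/AlemLLM,dtype=bfloat16",
    tasks=["mmlu", "winogrande", "hellaswag", "arc_challenge", "gsm8k", "drop"],
    batch_size=8,
)

# Per-task metrics (accuracy / exact match) live under results["results"].
for task, metrics in results["results"].items():
    print(task, metrics)
```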
Model Specification
- Architecture: Mixture of Experts (MoE)
- Total Parameters: 247B
- Activated Parameters: 22B
- Tokenizer: SentencePiece
- Quantization: BF16
- Vocabulary Size: 100,352
- Number of Layers: 56
- Activation Function: SwiGLU
- Positional Encoding Method: RoPE
- Optimizer: AdamW
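The card does not include an official usage example, so the following is a minimal inference sketch with Hugging Face Transformers. The repository id, the presence of a chat template, and the need for trust_remote_code are assumptions; adjust them to the published checkpoint.

```python
# Minimal inference sketch. "astanahub/AlemLLM" is a placeholder repo id, and
# trust_remote_code=True is assumed in case the MoE architecture ships custom
# modeling code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "astanahub/AlemLLM"  # placeholder; replace with the actual repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 precision listed above
    device_map="auto",
    trust_remote_code=True,
)

# Kazakh prompt: "Tell me about the capital of Kazakhstan."
messages = [{"role": "user", "content": "Қазақстанның астанасы туралы айтып бер."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```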