---
library_name: transformers
tags:
- gemma2
- instruct
- mamaylm
- insait
license: gemma
language:
- uk
- en
base_model:
- google/gemma-2-9b-it
- google/gemma-2-9b
pipeline_tag: text-generation
---

# INSAIT-Institute/MamayLM-Gemma-2-9B-IT-v0.1

![image/png](https://cdn-uploads.huggingface.co/production/uploads/637e1f8cf7e01589cc17bf7e/p6d0YFHjWCQ3S12jWqO1m.png)

INSAIT introduces **MamayLM-Gemma-2-9B-IT-v0.1**, the best-performing Ukrainian language model, based on **google/gemma-2-9b** and **google/gemma-2-9b-it**. MamayLM-Gemma-2-9B-IT-v0.1 is **free to use** and distributed under the [Gemma Terms of Use](https://ai.google.dev/gemma/terms). This model was created by [`INSAIT`](https://insait.ai/), part of Sofia University St. Kliment Ohridski, in Sofia, Bulgaria.

# Model description

The model was built on top of Google's Gemma 2 9B open models. It was continuously pre-trained on a large pre-filtered dataset (75B tokens of Ukrainian and English data in total) using a combination of data mixing and model merging, allowing the model to gain strong Ukrainian cultural and linguistic capabilities while retaining its English performance. During the pre-training stage, we used various datasets, including Ukrainian web crawl data (FineWeb2), freely available datasets such as Wikipedia, a range of specialized Ukrainian datasets, and machine translations of popular English datasets. The model was then instruction-fine-tuned on a newly constructed Ukrainian instruction dataset, created from machine translations of the best current English datasets and from specialized Ukrainian datasets prepared by the Ukrainian community.

For more information, see our blog post ([English](https://huggingface.co/blog/INSAIT-Institute/mamaylm), [Ukrainian](https://huggingface.co/blog/INSAIT-Institute/mamaylm-ukr)).

# Benchmarks and Results

![image/png](https://cdn-uploads.huggingface.co/production/uploads/650ed7adf141bc34f91a12ae/UTzlvHkFn3K0Lg717vBg6.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/650ed7adf141bc34f91a12ae/_pXGPAMx1a-mOnujQ03iW.png)

We evaluate our models on a set of standard English benchmarks, their Ukrainian translations, as well as Ukrainian-specific benchmarks we collected:

- **Winogrande challenge**: testing commonsense reasoning and world knowledge
- **Hellaswag**: testing sentence completion
- **ARC Easy/Challenge**: testing logical reasoning
- **TriviaQA**: testing trivia knowledge
- **GSM-8k**: solving grade-school math word problems
- **MMLU**: testing knowledge on a multitude of topics
- **IFEval**: testing instruction-following skills
- **ZNO**: testing knowledge of the Ukrainian high school curriculum in Ukrainian language, literature, mathematics and geography

These benchmarks test the logical reasoning, mathematics, knowledge, language understanding and other skills of the models, and are provided at https://github.com/insait-institute/lm-evaluation-harness-uk. The graphs above show the performance of MamayLM 9B compared to other large open models. The results show the excellent abilities of MamayLM in Ukrainian, which allow it to **outperform much larger models**, including Alibaba's Qwen 2.5 72B and Meta's Llama 3.1 70B. Finally, the model retains the **excellent English performance** inherited from the original Google Gemma 2 models upon which it is based.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/650ed7adf141bc34f91a12ae/6gCcqut_S-sIVgptjdmKr.png)
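To reproduce or extend these evaluations with the harness linked above, a sketch along the following lines should work, assuming the Ukrainian fork keeps the upstream `lm_eval` Python API; the task names shown are illustrative upstream identifiers, not the fork's Ukrainian task names, which are listed in the repository:

```python
# Minimal evaluation sketch using the lm-evaluation-harness Python API.
# Assumes the Ukrainian fork (lm-evaluation-harness-uk) preserves the
# upstream `lm_eval` entry points; the task names below are illustrative.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # Hugging Face transformers backend
    model_args="pretrained=INSAIT-Institute/MamayLM-Gemma-2-9B-IT-v0.1,dtype=bfloat16",
    tasks=["hellaswag", "arc_challenge"],  # see the repository for the Ukrainian task names
    batch_size=8,
)
print(results["results"])
```

The same evaluation can also be driven from the command line via the harness's `lm_eval` entry point.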
# Use in 🤗 Transformers

First install the latest version of the transformers library:

```
pip install -U 'transformers[torch]'
```

Then load the model in transformers:

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "INSAIT-Institute/MamayLM-Gemma-2-9B-IT-v0.1",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)
```

# Recommended Parameters

For optimal performance, we recommend the following text-generation parameters, which we have tested extensively with our model:

```python
from transformers import GenerationConfig

generation_params = GenerationConfig(
    max_new_tokens=2048,  # choose the maximum number of generated tokens
    temperature=0.1,
    top_k=25,
    top_p=1,
    repetition_penalty=1.1,
    eos_token_id=[1, 107],  # 1 = <eos>, 107 = <end_of_turn> in the Gemma 2 vocabulary
    do_sample=True,
)
```

In principle, increasing the temperature should also work adequately.

# Instruction format

To leverage instruction fine-tuning, your prompt should begin with the beginning-of-sequence token `<bos>` and be formatted in the Gemma 2 chat template. `<bos>` should only appear as the first token in a chat sequence. For example:

```
<bos><start_of_turn>user
Хто такий Козак Мамай?<end_of_turn>
<start_of_turn>model
```

This format is also available as a [chat template](https://huggingface.co/docs/transformers/main/chat_templating) via the `apply_chat_template()` method:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "INSAIT-Institute/MamayLM-Gemma-2-9B-IT-v0.1",
    use_default_system_prompt=False,
)

messages = [
    {"role": "user", "content": "Хто такий Козак Мамай?"},
]

input_ids = tokenizer.apply_chat_template(
    messages,
    return_tensors="pt",
    add_generation_prompt=True,
    return_dict=True,
)

outputs = model.generate(
    **input_ids,
    generation_config=generation_params,
)
print(tokenizer.decode(outputs[0]))
```

# Use with vLLM

Example usage with vLLM:

```python
from vllm import LLM, SamplingParams
from vllm.inputs import TokensPrompt
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "INSAIT-Institute/MamayLM-Gemma-2-9B-IT-v0.1",
    use_default_system_prompt=False,
)

sampling_params = SamplingParams(
    max_tokens=2048,
    temperature=0.1,
    top_k=25,
    top_p=1,
    repetition_penalty=1.1,
    stop_token_ids=[1, 107],  # 1 = <eos>, 107 = <end_of_turn>
)

llm = LLM(
    model="INSAIT-Institute/MamayLM-Gemma-2-9B-IT-v0.1",
    dtype="bfloat16",
    enforce_eager=True,
)

messages = [
    {"role": "user", "content": "Хто такий Козак Мамай?"},
]

formatted_prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

input_ids = tokenizer(
    formatted_prompt,
    add_special_tokens=False,  # the chat template already adds <bos>
).input_ids

prompt = TokensPrompt(prompt_token_ids=input_ids)

output = llm.generate(
    prompt,
    sampling_params,
)

generated_text = output[0].outputs[0].text
print(generated_text)
```

# Use with GGML / llama.cpp

The model and instructions for usage in GGUF format are available at [INSAIT-Institute/MamayLM-Gemma-2-9B-IT-v0.1-GGUF](https://huggingface.co/INSAIT-Institute/MamayLM-Gemma-2-9B-IT-v0.1-GGUF). A minimal Python sketch using the `llama-cpp-python` bindings is shown below, after the Community Feedback section.

# Community Feedback

We welcome feedback from the community to help improve MamayLM. If you have suggestions, encounter any issues, or have ideas for improvements, please:

- Share your experience using the model through Hugging Face's community discussion feature, or
- Contact us at [contact@insait.ai](mailto:contact@insait.ai)

Your real-world usage and insights are valuable in helping us optimize the model's performance and behaviour for various use cases.
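As a quick illustration of the GGUF route referenced above, here is a minimal sketch using the community `llama-cpp-python` bindings rather than the llama.cpp CLI. The quantization filename is an assumption; pick an actual file from the GGUF repository:

```python
# Minimal GGUF inference sketch using llama-cpp-python (pip install llama-cpp-python).
# The quantization filename below is an assumption; choose a file that actually
# exists in the INSAIT-Institute/MamayLM-Gemma-2-9B-IT-v0.1-GGUF repository.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="INSAIT-Institute/MamayLM-Gemma-2-9B-IT-v0.1-GGUF",
    filename="*Q4_K_M.gguf",  # assumed quantization file pattern
    n_ctx=4096,               # context window size
)

# The chat template embedded in the GGUF file is applied automatically.
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Хто такий Козак Мамай?"}],
    max_tokens=2048,
    temperature=0.1,
    top_k=25,
    repeat_penalty=1.1,
)
print(response["choices"][0]["message"]["content"])
```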
# Summary

- **Fine-tuned from:** [google/gemma-2-9b-it](https://huggingface.co/google/gemma-2-9b-it); [google/gemma-2-9b](https://huggingface.co/google/gemma-2-9b)
- **Model type:** Causal decoder-only transformer language model
- **Language:** Ukrainian and English
- **Contact:** [contact@insait.ai](mailto:contact@insait.ai)
- **License:** MamayLM is distributed under the [Gemma Terms of Use](https://huggingface.co/INSAIT-Institute/MamayLM-Gemma-2-9B-IT-v0.1/raw/main/LICENSE)