Meltemi 7B Instruct Quantized models

Description

This repository contains quantized GGUF variants of the Meltemi-7B-Instruct-v1 model, created using llama.cpp at the Institute for Language and Speech Processing of Athena Research & Innovation Center.

Provided files (Use case column taken from the llama.cpp documentation)

Based on the information

Name	Quant method	Bits	Size	Approx. RAM required	Use case
meltemi-instruct-v1_q3_K_M.bin	Q3_K_M	3	3.67 GB	6.45 GB	small, high quality loss
meltemi-instruct-v1_q5_K_M.bin	Q5_K_M	5	5.31 GB	8.1 GB	large, low quality loss - recommended
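
The table can also be read as a simple selection rule: take the largest file whose RAM estimate fits your machine. The following is a minimal sketch of that rule, assuming a hypothetical helper named `pick_quant_file` (not part of llama.cpp or this repository). The RAM figures are the approximate values from the table; actual usage also depends on context length and GPU offloading.

```python
from typing import Optional

# (file name, approximate RAM required in GB), copied from the table above
QUANT_FILES = [
    ("meltemi-instruct-v1_q3_K_M.bin", 6.45),
    ("meltemi-instruct-v1_q5_K_M.bin", 8.1),
]


def pick_quant_file(available_ram_gb: float) -> Optional[str]:
    """Return the file with the highest RAM estimate that still fits, or None."""
    fitting = [(ram, name) for name, ram in QUANT_FILES if ram <= available_ram_gb]
    if not fitting:
        return None
    # max() compares the RAM estimate first, so the largest fitting file wins
    return max(fitting)[1]


if __name__ == "__main__":
    print(pick_quant_file(16.0))
```

If nothing fits, the helper returns `None` rather than guessing, so the caller can decide whether to free memory or use a smaller context.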

Instruction format

The prompt format is the same as the Zephyr format:

<s><|system|>
Είσαι το Μελτέμι, ένα γλωσσικό μοντέλο για την ελληνική γλώσσα. Είσαι ιδιαίτερα βοηθητικό προς την χρήστρια ή τον χρήστη και δίνεις σύντομες αλλά επαρκώς περιεκτικές απαντήσεις. Απάντα με προσοχή, ευγένεια, αμεροληψία, ειλικρίνεια και σεβασμό προς την χρήστρια ή τον χρήστη.</s>
<|user|>
Πες μου αν έχεις συνείδηση.</s>
<|assistant|>
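
The format above can be assembled programmatically. The following is a minimal sketch for a single-turn prompt, assuming a hypothetical helper named `build_prompt` (not part of llama.cpp or this repository). It leaves out the leading `<s>` by default on the assumption that the runtime adds the BOS token during tokenization; pass `add_bos=True` if yours does not.

```python
def build_prompt(system: str, user_message: str, add_bos: bool = False) -> str:
    """Assemble a single-turn prompt in the Zephyr-style format shown above.

    The result ends with an open <|assistant|> tag so that the model
    continues with its own reply.
    """
    prompt = (
        f"<|system|>\n{system}</s>\n"
        f"<|user|>\n{user_message}</s>\n"
        "<|assistant|>\n"
    )
    # Prepend the BOS token only when the caller asks for it explicitly
    return "<s>" + prompt if add_bos else prompt


if __name__ == "__main__":
    print(build_prompt("Είσαι το Μελτέμι.", "Πες μου αν έχεις συνείδηση."))
```

Keeping each `</s>` on the same line as the message it closes mirrors the layout of the example prompt.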

Loading the model with llama_cpp

Install llama-cpp-python. To use your GPU for inference, set CMAKE_ARGS to -DLLAMA_CUBLAS=on before installing (the first command below uses PowerShell syntax):

$env:CMAKE_ARGS="-DLLAMA_CUBLAS=on"
pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama(
    model_path="./meltemi-instruct-v1_q5_K_M.bin",  # Download the model file first
    n_ctx=8192,  # The max sequence length to use - note that longer sequence lengths require much more resources
    n_threads=8,  # The number of CPU threads to use, tailor to your system and the resulting performance
    n_gpu_layers=35  # The number of layers to offload to GPU, if you have GPU acceleration available
)
system = "Είσαι το Μελτέμι, ένα γλωσσικό μοντέλο για την ελληνική γλώσσα. Είσαι ιδιαίτερα βοηθητικό προς την χρήστρια ή τον χρήστη και δίνεις σύντομες αλλά επαρκώς περιεκτικές απαντήσεις. Απάντα με προσοχή, ευγένεια, αμεροληψία, ειλικρίνεια και σεβασμό προς την χρήστρια ή τον χρήστη."
input_text = "Πες μου αν έχεις συνείδηση."

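# Build the prompt without extra indentation or blank lines,
# so that it matches the instruction format shown above
prompt = (
    f"<|system|>\n{system}</s>\n"
    f"<|user|>\n{input_text}</s>\n"
    "<|assistant|>\n"
)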

output = llm(
    prompt,
    max_tokens=1024,
    stop=["</s>"],
    echo=True
)

output_text = output['choices'][0]['text'][len(prompt):].strip()
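
The slice above relies on `echo=True`, which makes the returned text start with the prompt. The following is a minimal sketch of a more defensive variant, assuming a hypothetical helper named `extract_reply` and the `output['choices'][0]['text']` layout used in the snippet above. It strips the prompt only when it is actually echoed, so it gives the same result with `echo=False`.

```python
def extract_reply(output: dict, prompt: str) -> str:
    """Return the generated reply from a completion result.

    Works whether or not the prompt was echoed back in the text field.
    """
    text = output["choices"][0]["text"]
    # With echo=True the text begins with the prompt; remove it if present
    if text.startswith(prompt):
        text = text[len(prompt):]
    return text.strip()


if __name__ == "__main__":
    # Stand-in for a real completion result, for illustration only
    fake_output = {"choices": [{"text": "PROMPT reply text "}]}
    print(extract_reply(fake_output, "PROMPT"))
```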

Ethical Considerations

This model has not been aligned with human preferences, and therefore might generate misleading, harmful, or toxic content.

Acknowledgements

The ILSP team utilized Amazon’s cloud computing services, which were made available via GRNET under the OCRE Cloud framework, providing Amazon Web Services for the Greek Academic and Research Community.
