|
|
--- |
|
|
license: llama3.1 |
|
|
language: |
|
|
- el |
|
|
- en |
|
|
pipeline_tag: text-generation |
|
|
library_name: transformers |
|
|
tags: |
|
|
- text-generation-inference |
|
|
base_model: |
|
|
- ilsp/Llama-Krikri-8B-Base |
|
|
--- |
|
|
|
|
|
# Llama-Krikri-8B-Instruct: An Instruction-tuned Large Language Model for the Greek language |
|
|
|
|
|
Following the release of [Meltemi-7B](https://huggingface.co/ilsp/Meltemi-7B-v1) on March 26th, 2024, we are happy to welcome Krikri to the family of ILSP open Greek LLMs.
|
|
Krikri is built on top of [Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B), extending its capabilities for Greek through continual pretraining on a large corpus of high-quality and locally relevant Greek texts. We present **Llama-Krikri-8B-Instruct**, along with the base model, [Llama-Krikri-8B-Base](https://huggingface.co/ilsp/Llama-Krikri-8B-Base). |
|
|
|
|
|
 |
|
|
|
|
|
|
|
|
# Model Information |
|
|
|
|
|
- Vocabulary extension of the Llama-3.1 tokenizer with Greek tokens (see the short tokenization sketch after the corpus table below)
|
|
- 128k context length (approximately 80,000 Greek words) |
|
|
- We extended the pretraining of Llama-3.1-8B to add proficiency in the Greek language by utilizing a large training corpus.
|
|
* This corpus includes 56.7 billion monolingual Greek tokens, constructed from publicly available resources. |
|
|
* Additionally, to mitigate catastrophic forgetting and ensure that the model retains bilingual capabilities, we use additional sub-corpora with monolingual English texts (21 billion tokens) and Greek-English parallel data (5.5 billion tokens).
|
|
* The training corpus also contains 7.8 billion math and code tokens. |
|
|
* This corpus has been processed, filtered, and deduplicated to ensure data quality and is outlined below: |
|
|
|
|
|
|
|
|
| Sub-corpus | # Tokens | Percentage | |
|
|
|-----------|------------------|------------| |
|
|
| Greek | 56.7 B | 62.3 % | |
|
|
| English | 21.0 B | 23.1 % | |
|
|
| Parallel | 5.5 B | 6.0 % | |
|
|
| Math/Code | 7.8 B | 8.6 % | |
|
|
| **Total** | 91 B | **100%** | |
|
|
|
|
|
|
|
|
Chosen subsets of the 91 billion token corpus were upsampled, resulting in a total size of **110 billion tokens**.
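The effect of the vocabulary extension is easiest to see by tokenizing the same Greek sentence with both the original and the extended tokenizer. The snippet below is an illustrative sketch, not part of the official documentation; it uses an arbitrary example sentence and assumes you have been granted access to the gated `meta-llama/Llama-3.1-8B` repository.

```python
from transformers import AutoTokenizer

greek_text = "Η τεχνητή νοημοσύνη αλλάζει τον τρόπο που δουλεύουμε και επικοινωνούμε."

# Extended Krikri tokenizer (Llama-3.1 vocabulary plus added Greek tokens)
krikri_tok = AutoTokenizer.from_pretrained("ilsp/Llama-Krikri-8B-Base")
# Original Llama-3.1 tokenizer (gated repository; access must be granted)
llama_tok = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B")

# Fewer tokens per Greek word are expected with the extended vocabulary
print("Krikri tokens:   ", len(krikri_tok(greek_text)["input_ids"]))
print("Llama-3.1 tokens:", len(llama_tok(greek_text)["input_ids"]))
```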
|
|
|
|
|
🚨 **More information on the post-training corpus and methodology coming soon.** 🚨
|
|
|
|
|
|
|
|
# How to use |
|
|
|
|
|
|
|
|
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"

# Load the instruction-tuned model and its tokenizer
model = AutoModelForCausalLM.from_pretrained("ilsp/Llama-Krikri-8B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("ilsp/Llama-Krikri-8B-Instruct")

model.to(device)

# System prompt (in Greek): "You are Krikri, a highly advanced Artificial Intelligence model
# for Greek, and you were trained by ILSP of Athena Research Center."
system_prompt = "Είσαι το Κρικρί, ένα εξαιρετικά ανεπτυγμένο μοντέλο Τεχνητής Νοημοσύνης για τα ελληνικα και εκπαιδεύτηκες από το ΙΕΛ του Ε.Κ. \"Αθηνά\"."
# User prompt (in Greek): "How does a kri-kri differ from a llama?"
user_prompt = "Σε τι διαφέρει ένα κρικρί από ένα λάμα;"

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt},
]
# Format the conversation with the model's chat template, then tokenize and generate
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
input_prompt = tokenizer(prompt, return_tensors='pt').to(device)
outputs = model.generate(input_prompt['input_ids'], max_new_tokens=256, do_sample=True)

print(tokenizer.batch_decode(outputs)[0])
```
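Note that, depending on your `transformers` version, the snippet above may load the weights in full float32 precision. A common memory-saving variant (a sketch, assuming `accelerate` is installed, and not part of the original instructions) is to request bfloat16 explicitly and let the weights be placed automatically:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load in bfloat16 and let accelerate map the weights onto the available device(s)
model = AutoModelForCausalLM.from_pretrained(
    "ilsp/Llama-Krikri-8B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("ilsp/Llama-Krikri-8B-Instruct")
```

With `device_map="auto"` the explicit `model.to(device)` call is no longer needed; inputs can be moved to `model.device` instead of a hard-coded `"cuda"` string.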
|
|
|
|
|
# How to serve with an OpenAI-compatible server via vLLM
|
|
|
|
|
```bash
# --enforce-eager disables CUDA graph capture for a simpler startup;
# --dtype sets the model precision and --api-key the key clients must send
vllm serve ilsp/Llama-Krikri-8B-Instruct \
  --enforce-eager \
  --dtype 'bfloat16' \
  --api-key token-abc123
```
|
|
|
|
|
The model can then be queried from Python using the OpenAI client:
|
|
```python
from openai import OpenAI

api_key = "token-abc123"
base_url = "http://localhost:8000/v1"

# Point the OpenAI client at the local vLLM server
client = OpenAI(
    api_key=api_key,
    base_url=base_url,
)

# System prompt (in Greek): "You are an advanced translation system that answers with Python lists."
system_prompt = "Είσαι ένα ανεπτυγμένο μεταφραστικό σύστημα που απαντάει με λίστες Python."
# User prompt (in Greek): "Give me the following list with each of its strings translated into Greek: [...]"
user_prompt = "Δώσε μου την παρακάτω λίστα με μεταφρασμένο κάθε string της στα ελληνικά: ['Ethics of duty', 'Postmodern ethics', 'Consequentialist ethics', 'Utilitarian ethics', 'Deontological ethics', 'Virtue ethics', 'Relativist ethics']"

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt},
]

response = client.chat.completions.create(
    model="ilsp/Llama-Krikri-8B-Instruct",
    messages=messages,
)
print(response.choices[0].message.content)
```
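Because the vLLM endpoint is OpenAI-compatible, standard client features work as well; for instance, the response can be streamed token by token. The sketch below reuses the `client` and `messages` objects defined above:

```python
# Stream the completion chunk by chunk instead of waiting for the full response
stream = client.chat.completions.create(
    model="ilsp/Llama-Krikri-8B-Instruct",
    messages=messages,
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```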
|
|
|
|
|
# Evaluation |
|
|
|
|
|
🚨 **Instruction following and chat capability evaluation benchmarks coming soon.** 🚨 |
|
|
|
|
|
# Acknowledgements |
|
|
|
|
|
The ILSP team utilized Amazon's cloud computing services, which were made available via GRNET under the [OCRE Cloud framework](https://www.ocre-project.eu/), providing Amazon Web Services for the Greek Academic and Research Community. |