---
license: mit
datasets:
- Bretagne/ofis_publik_br-fr
- Bretagne/OpenSubtitles_br_fr
- Bretagne/Autogramm_Breton_translation
language:
- fr
- br
base_model:
- facebook/m2m100_418M
pipeline_tag: translation
library_name: transformers
---

# Kellag

* A Breton -> French translation model called **Kellag**.
* Kellag is the temporary "brother" model of [Gallek](https://huggingface.co/amurienne/gallek-m2m100), as a bidirectional fr <-> br model is not ready yet (WIP).
* The current model version reached a **BLEU score of 50** after 10 epochs, evaluated on a 20% split of the training set (see the evaluation sketch after this list).
* Fine-tuned in the br -> fr direction only for now.
* Training details are available on the [GweLLM Github repository](https://github.com/blackccpie/GweLLM).
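To give a concrete idea of how the BLEU figure above could be reproduced, here is a minimal, hypothetical evaluation sketch using `sacrebleu` and one of the training datasets listed in the metadata; the split (`train[:100]`), the column names (`br`, `fr`) and the task prefix are assumptions, and the exact evaluation setup is documented in the GweLLM repository.

```python
# hypothetical evaluation sketch -- not the exact setup used for the reported score
import sacrebleu
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

modelcard = "amurienne/kellag-m2m100"

model = AutoModelForSeq2SeqLM.from_pretrained(modelcard)
tokenizer = AutoTokenizer.from_pretrained(modelcard)
translator = pipeline("translation", model=model, tokenizer=tokenizer, src_lang='br', tgt_lang='fr', max_length=512, device="cpu")

# small sample of one training dataset; the 'br' / 'fr' column names are assumptions,
# check the dataset card for the actual field names
eval_data = load_dataset("Bretagne/ofis_publik_br-fr", split="train[:100]")

# task prefix matching the sample test code below
prefix = "treiñ eus ar brezhoneg d'ar galleg: "

hypotheses = [translator(prefix + row["br"])[0]["translation_text"] for row in eval_data]
references = [[row["fr"] for row in eval_data]]

print(f"BLEU: {sacrebleu.corpus_bleu(hypotheses, references).score:.1f}")
```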
Sample test code:
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

modelcard = "amurienne/kellag-m2m100"

# load the fine-tuned br -> fr model and its tokenizer
model = AutoModelForSeq2SeqLM.from_pretrained(modelcard)
tokenizer = AutoTokenizer.from_pretrained(modelcard)

# translation pipeline from Breton ('br') to French ('fr')
translation_pipeline = pipeline("translation", model=model, tokenizer=tokenizer, src_lang='br', tgt_lang='fr', max_length=512, device="cpu")

# example Breton input (the leading text is the br -> fr translation instruction)
breton_text = "treiñ eus ar brezhoneg d'ar galleg: deskiñ a ran brezhoneg er skol."

# translate and print the French output
result = translation_pipeline(breton_text)
print(result[0]['translation_text'])
```
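The pipeline above handles the M2M100 target-language mechanics internally. For reference, here is a minimal sketch of the equivalent call without the pipeline wrapper, assuming the checkpoint keeps the standard M2M100 tokenizer: the French output is selected by forcing its language token at the start of generation.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

modelcard = "amurienne/kellag-m2m100"

model = AutoModelForSeq2SeqLM.from_pretrained(modelcard)
tokenizer = AutoTokenizer.from_pretrained(modelcard)

# M2M100 needs the source language set on the tokenizer
tokenizer.src_lang = "br"

breton_text = "treiñ eus ar brezhoneg d'ar galleg: deskiñ a ran brezhoneg er skol."
inputs = tokenizer(breton_text, return_tensors="pt")

# force French as the target language via its language token id
generated = model.generate(**inputs, forced_bos_token_id=tokenizer.get_lang_id("fr"), max_length=512)

print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```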
A demo is available on the [Gallek Space](https://huggingface.co/spaces/amurienne/Gallek).
|