Model details
This machine translation model translates single sentences between any pair of the following languages, in either direction:
ISO 639-3 | Language name |
---|---|
eng | English |
ach | Acholi |
lgg | Lugbara |
lug | Luganda |
nyn | Runyankole |
teo | Ateso |
It was trained on the SALT dataset and a variety of additional external data resources, including back-translated news articles, FLORES-200, MT560 and LAFAND-MT. The base model was facebook/nllb-200-1.3B, with tokens adapted to support languages not originally included.
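To make the "any pair" claim concrete, the short sketch below enumerates the translation directions implied by the language table. The names `LANGUAGES`, `DIRECTIONS` and `is_supported` are illustrative and not part of the model's API.

```python
from itertools import permutations

# ISO 639-3 codes and names, copied from the table above.
LANGUAGES = {
    "eng": "English",
    "ach": "Acholi",
    "lgg": "Lugbara",
    "lug": "Luganda",
    "nyn": "Runyankole",
    "teo": "Ateso",
}

# Every ordered (source, target) pair of two distinct languages.
DIRECTIONS = list(permutations(LANGUAGES, 2))


def is_supported(source, target):
    """Return True if (source, target) is one of the listed directions."""
    return (source, target) in DIRECTIONS


print(len(DIRECTIONS))  # 30 directions: 6 sources x 5 targets
```

Only the ten directions that involve English are reported in the evaluation results, so quality for the remaining pairs should be checked separately.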
Usage example
import torch
import transformers

tokenizer = transformers.NllbTokenizer.from_pretrained(
    'Sunbird/translate-nllb-1.3b-salt')
model = transformers.M2M100ForConditionalGeneration.from_pretrained(
    'Sunbird/translate-nllb-1.3b-salt')

text = 'Where is the hospital?'
source_language = 'eng'
target_language = 'lug'

# Token IDs of the language tags for each supported language.
language_tokens = {
    'eng': 256047,
    'ach': 256111,
    'lgg': 256008,
    'lug': 256110,
    'nyn': 256002,
    'teo': 256006,
}

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
inputs = tokenizer(text, return_tensors="pt").to(device)
# Overwrite the first input token with the source-language token.
inputs['input_ids'][0][0] = language_tokens[source_language]

# Force the first generated token to be the target-language token.
translated_tokens = model.to(device).generate(
    **inputs,
    forced_bos_token_id=language_tokens[target_language],
    max_length=100,
    num_beams=5,
)
result = tokenizer.batch_decode(
    translated_tokens, skip_special_tokens=True)[0]
# Eddwaliro liri ludda wa?
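For repeated use, the steps above could be wrapped in a small helper that also rejects unsupported language codes. This is a minimal sketch, not part of the model's API: the function name `translate` and its parameters are hypothetical, and it assumes `tokenizer` and `model` were loaded as in the usage example.

```python
# Language token IDs, copied from the usage example.
LANGUAGE_TOKENS = {
    "eng": 256047,
    "ach": 256111,
    "lgg": 256008,
    "lug": 256110,
    "nyn": 256002,
    "teo": 256006,
}


def translate(text, source_language, target_language, tokenizer, model,
              max_length=100, num_beams=5):
    """Translate one sentence (hypothetical helper around the steps above)."""
    for code in (source_language, target_language):
        if code not in LANGUAGE_TOKENS:
            raise ValueError(f"Unsupported language code: {code!r}")

    # Keep the inputs on the same device as the model.
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    # Overwrite the first input token with the source-language token.
    inputs["input_ids"][0][0] = LANGUAGE_TOKENS[source_language]

    translated_tokens = model.generate(
        **inputs,
        forced_bos_token_id=LANGUAGE_TOKENS[target_language],
        max_length=max_length,
        num_beams=num_beams,
    )
    return tokenizer.batch_decode(
        translated_tokens, skip_special_tokens=True)[0]
```

A call would then look like `translate('Where is the hospital?', 'eng', 'lug', tokenizer, model)`. Failing early on an unknown code avoids a less informative `KeyError` from the dictionary lookup.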
Evaluation metrics
Results on salt-dev:
Source language | Target language | BLEU |
---|---|---|
ach | eng | 28.371 |
lgg | eng | 30.45 |
lug | eng | 41.978 |
nyn | eng | 32.296 |
teo | eng | 30.422 |
eng | ach | 20.972 |
eng | lgg | 22.362 |
eng | lug | 30.359 |
eng | nyn | 15.305 |
eng | teo | 21.391 |
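As a rough summary of the table, the sketch below averages the reported BLEU scores by direction. The scores are copied from the table; the unweighted mean is an illustrative choice and not a metric reported by the authors.

```python
from statistics import mean

# (source, target) -> BLEU on salt-dev, copied from the table above.
BLEU = {
    ("ach", "eng"): 28.371,
    ("lgg", "eng"): 30.45,
    ("lug", "eng"): 41.978,
    ("nyn", "eng"): 32.296,
    ("teo", "eng"): 30.422,
    ("eng", "ach"): 20.972,
    ("eng", "lgg"): 22.362,
    ("eng", "lug"): 30.359,
    ("eng", "nyn"): 15.305,
    ("eng", "teo"): 21.391,
}

# Unweighted mean over the five directions on each side.
into_english = mean(s for (src, tgt), s in BLEU.items() if tgt == "eng")
from_english = mean(s for (src, tgt), s in BLEU.items() if src == "eng")

print(f"into English: {into_english:.2f}")  # into English: 32.70
print(f"from English: {from_english:.2f}")  # from English: 22.08
```

On these figures, translation into English scores about ten BLEU points higher on average than translation out of English.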