Smugri-tuned NLLB-1.3b, v0.01
This is a fine-tune of NLLB-1.3b on parallel data for 29 Finno-Ugric languages. For some of the languages it can generate specific dialects or varieties; see the lists below.
Details on the training data and other information will be added soon. Training of this model is still in progress, and its quality has not been tested yet. So far only parallel data has been used for training; more dialects will be supported once monolingual and synthetic data are added.
Usage in Python, to translate from English to Veps (New written Veps dialect/variety):
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
model = AutoModelForSeq2SeqLM.from_pretrained("tartuNLP/nllb1.3-smugri4-v0.01")
tokenizer = AutoTokenizer.from_pretrained("tartuNLP/nllb1.3-smugri4-v0.01")
input_text = "<New written Veps> This is a short example sentence."
source_lang = "eng_Latn"
target_lang = "vep_Latn"
tokenizer.src_lang = source_lang
input_tokenized = tokenizer(input_text, return_tensors="pt")
output_raw = model.generate(**input_tokenized, forced_bos_token_id=tokenizer.convert_tokens_to_ids(target_lang))
output = tokenizer.decode(output_raw[0], skip_special_tokens=True)
print(output) # should be 'Nece om lühüd ozutezsana.'
# for '<Central Eastern Veps>' the output becomes 'Nece om lühüd naverz’ sanond.'
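The steps above can be wrapped in a small helper so that the language codes and the optional variety tag are passed as arguments. This is only a minimal sketch: the function name `translate` and its parameters are hypothetical and not part of the model or of `transformers`. It assumes the variety tag is always written in angle brackets at the start of the input, as in the example above.

```python
def translate(text, model, tokenizer, source_lang, target_lang, variety=None):
    """Hypothetical convenience wrapper around the usage example above."""
    # Assumption: the variety tag is prepended as "<tag> ", mirroring the example.
    if variety is not None:
        text = f"<{variety}> {text}"
    # Tell the tokenizer which language the input is written in.
    tokenizer.src_lang = source_lang
    inputs = tokenizer(text, return_tensors="pt")
    # Force the first generated token to be the target language code.
    output_raw = model.generate(
        **inputs,
        forced_bos_token_id=tokenizer.convert_tokens_to_ids(target_lang),
    )
    return tokenizer.decode(output_raw[0], skip_special_tokens=True)
```

With `model` and `tokenizer` loaded as above, `translate("This is a short example sentence.", model, tokenizer, "eng_Latn", "vep_Latn", variety="New written Veps")` would issue the same request as the original snippet.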
Supported languages
- est_Latn (Estonian), fin_Latn (Finnish), fkv_Latn (Kven), izh_Latn (Izhorian*), krl_Latn (Proper Karelian*), liv_Latn (Livonian), lud_Latn (Ludian*), olo_Latn (Livvi-Karelian*), vep_Latn (Veps*), vot_Latn (Votic*), vro_Latn (Võro)
- sje_Latn (Pite Sami), sju_Latn (Ume Sami), sma_Latn (Southern Sami), sme_Latn (Northern Sami), smj_Latn (Lule Sami), smn_Latn (Inari Sami), sms_Latn (Skolt Sami), sjd_Cyrl (Kildin Sami*)
- kpv_Cyrl (Komi-Zyrian), koi_Cyrl (Komi-Permyak), udm_Cyrl (Udmurt)
- mdf_Cyrl (Moksha), myv_Cyrl (Erzya)
- mhr_Cyrl (Meadow Mari), mrj_Cyrl (Hill Mari)
- hun_Latn (Hungarian), kca_Cyrl (Khanty*), mns_Cyrl (Mansi)
- eng_Latn (English), lvs_Latn (Latvian), rus_Cyrl (Russian), nor_Latn (Norwegian)
Supported dialects
- for Izhorian: alal (Lower Luga), soik (Soikkola)
- for Votic: I, J, Ja, K, Kõ, Ke, Ko, L, Li, Lu, M, P, Po, R, Ra, S, U, V (explanation: https://arhiiv.eki.ee/dict/vadja/lisad/v_lyhendid.pdf)
- for Karelian Proper: Dyorzha, Ilomantsi, Keret, Kestenga, Kontokki, Korbiselga, Maslozero, Myandyselga, New written Tver, New written karelian, Oulanga, Padany, Panozero, Poduzhemye, Porosozero, Reboly, Rugozero, Suistamo, Suoyarvi, Tikhtozero, Tikhvin, Tolmachi, Tunguda, Uhta, Valdai, Vesyegonsk, Voknavolok, Vychetaibola, Yushkozero
- for Ludian: Central Ludian (Munozero), Mikhailovskoye, New written Ludian, Northern Ludian (Kondopoga), Southern Ludian (Svjatozero), Miikul (Central Ludian)
- for Livvi-Karelian: Impilahti, Kondushi, Kotkozero, Nekkula, New written Livvic, Rypushkalitsa, Salmi, Suoyarvi, Syamozero, Tulmozero, Vedlozero, Vidlitsa
- for Veps: Central Eastern Veps, Central Western Veps, New written Veps, Northern Veps, Southern Veps
- for Kildin Sami: orth1
- for Khanty: kazym (Kazym), shuryshkary (Shuryshkar)
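A variety name can be checked against the list before it is placed in the input text. The sketch below is illustrative only: `SUPPORTED_VARIETIES` covers just four of the languages listed above, and `variety_tag` is a hypothetical helper. It also assumes that every variety uses the same angle-bracket tag format as `<New written Veps>` in the usage example, which this card confirms only for Veps.

```python
# Partial, hand-copied subset of the varieties listed above (illustrative only).
SUPPORTED_VARIETIES = {
    "izh_Latn": ["alal", "soik"],
    "vep_Latn": [
        "Central Eastern Veps",
        "Central Western Veps",
        "New written Veps",
        "Northern Veps",
        "Southern Veps",
    ],
    "sjd_Cyrl": ["orth1"],
    "kca_Cyrl": ["kazym", "shuryshkary"],
}


def variety_tag(target_lang, variety):
    """Return the input tag for a variety, or raise if it is not listed.

    Hypothetical helper; the "<name>" format is assumed from the Veps example.
    """
    known = SUPPORTED_VARIETIES.get(target_lang, [])
    if variety not in known:
        raise ValueError(f"{variety!r} is not listed for {target_lang}")
    return f"<{variety}>"
```

For example, `variety_tag("vep_Latn", "Northern Veps") + " This is a short example sentence."` builds an input string in the same shape as the one in the usage example.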