Tashkeel-350M

Arabic Diacritization Model

A 350-million-parameter model dedicated to diacritizing Arabic text. It was trained by fine-tuning LiquidAI/LFM2-350M on the arbml/tashkeela dataset.

  • Base model: LiquidAI/LFM2-350M
  • Dataset: arbml/tashkeela

How to Use

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer
model_id = "Etherll/Tashkeel-350M"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="bfloat16",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Add diacritics: the undiacritized text is sent as a single user message
prompt = "السلام عليكم"
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
    tokenize=True,
).to(model.device)

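output = model.generate(
    input_ids,
    do_sample=False,  # greedy decoding, so the output is deterministic
    max_new_tokens=128,  # assumed cap so the result is not cut short; raise it for longer inputs
)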

print(tokenizer.decode(output[0, input_ids.shape[-1]:], skip_special_tokens=True))
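The snippet above diacritizes one short prompt. For longer text, one possible approach (an assumption here, not something the model card prescribes) is to split the input into sentences and diacritize each one separately. The sketch below shows only the splitting and rejoining; `split_sentences` and `diacritize_long` are hypothetical helper names, and `diacritize_one` stands in for any function that wraps the generation call shown above.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Split after '.', '!', '?' or the Arabic question mark, keeping the mark."""
    parts = re.split(r"(?<=[.!?؟])\s+", text.strip())
    return [p for p in parts if p]


def diacritize_long(text: str, diacritize_one: Callable[[str], str]) -> str:
    """Apply diacritize_one to each sentence and rejoin with single spaces.

    diacritize_one is a placeholder for a function that sends one sentence
    through the model and returns the diacritized string.
    """
    return " ".join(diacritize_one(s) for s in split_sentences(text))
```

Splitting on punctuation is a simplification: text without sentence marks stays in one piece, and original line breaks are not preserved.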

Example

  • Input text: السلام عليكم
  • Output: اَلسَلَامُ عَلَيْكُمْ
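Diacritization should only add marks to the text, so removing them from the output should give back the input. A minimal sanity check along those lines is sketched below. It assumes the relevant marks are the tanwin, short vowels, shadda and sukun (U+064B to U+0652) plus the superscript alef (U+0670); `strip_diacritics` and `same_base_text` are hypothetical helper names, not part of the model or of transformers.

```python
import re

# Assumed mark set: U+064B-U+0652 (tanwin, fatha, damma, kasra, shadda, sukun)
# plus U+0670 (superscript alef). Extend it if your text uses other marks.
_DIACRITICS = re.compile("[\u064B-\u0652\u0670]")


def strip_diacritics(text: str) -> str:
    """Remove the Arabic diacritics listed above, leaving base letters intact."""
    return _DIACRITICS.sub("", text)


def same_base_text(plain: str, diacritized: str) -> bool:
    """True if both strings have the same letters once diacritics are removed."""
    return strip_diacritics(diacritized) == strip_diacritics(plain)


# Check the example pair from this model card.
print(same_base_text("السلام عليكم", "اَلسَلَامُ عَلَيْكُمْ"))
```

A `False` result would indicate that the model changed, dropped, or added base letters rather than only inserting diacritics.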


Tashkeel-350M (English)

A 350M parameter model for Arabic diacritization (Tashkeel). This model is a fine-tune of LiquidAI/LFM2-350M on the arbml/tashkeela dataset.

How to Use

The usage code is identical to the Python snippet in the first section above.

Example

  • Input: السلام عليكم
  • Output: اَلسَلَامُ عَلَيْكُمْ

This LFM2 model was trained 2x faster with Unsloth and Hugging Face's TRL library.

Safetensors · Model size: 0.4B params · Tensor type: BF16