Tashkeel-350M
Arabic Diacritization Model
A 350M-parameter model dedicated to diacritizing Arabic text. It was trained by fine-tuning LiquidAI/LFM2-350M on the arbml/tashkeela dataset.
- Base Model: LiquidAI/LFM2-350M
- Dataset: arbml/tashkeela
How to Use
from transformers import AutoModelForCausalLM, AutoTokenizer
# Load the model
model_id = "Etherll/Tashkeel-350M"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="bfloat16",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Add diacritics
prompt = "السلام عليكم"
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
    tokenize=True,
).to(model.device)
output = model.generate(
    input_ids,
    do_sample=False,  
)
print(tokenizer.decode(output[0, input_ids.shape[-1]:], skip_special_tokens=True))
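For inputs longer than a short greeting, it may help to wrap the steps above in a function and pass an explicit generation budget. The following is a minimal sketch, not part of the model card: the name `diacritize` and the default `max_new_tokens=256` are assumptions chosen for illustration, and `model` and `tokenizer` are expected to be the objects loaded in the usage code above.

```python
def diacritize(text, model, tokenizer, max_new_tokens=256):
    """Return `text` with diacritics added by the loaded model.

    Hypothetical helper: the function name and the 256-token default
    are illustrative assumptions, not values from the model card.
    """
    # Format the input as a single-turn chat, as in the usage code above.
    input_ids = tokenizer.apply_chat_template(
        [{"role": "user", "content": text}],
        add_generation_prompt=True,
        return_tensors="pt",
        tokenize=True,
    ).to(model.device)

    output = model.generate(
        input_ids,
        do_sample=False,                # greedy decoding, as in the card
        max_new_tokens=max_new_tokens,  # explicit output budget (assumed value)
    )

    # Skip the prompt tokens and decode only the newly generated ones.
    new_tokens = output[0, input_ids.shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Since diacritized text is longer in tokens than its input, the budget likely needs to grow with the input length; the right value for a given text is something to determine empirically.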
Example
- Input: السلام عليكم
- Output: اَلسَلَامُ عَلَيْكُمْ
Tashkeel-350M (English)
A 350M parameter model for Arabic diacritization (Tashkeel). This model is a fine-tune of LiquidAI/LFM2-350M on the arbml/tashkeela dataset.
- Base Model: LiquidAI/LFM2-350M
- Dataset: arbml/tashkeela
How to Use
The Python code for usage is the same as in the first section above.
Example
- Input: السلام عليكم
- Output: اَلسَلَامُ عَلَيْكُمْ
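Because the model generates text freely, one simple sanity check is to confirm that the output differs from the input only by diacritics. The sketch below is an illustration rather than anything the model card provides: `strip_diacritics` and `preserves_letters` are hypothetical helper names, and the check relies on the Arabic diacritics being Unicode combining marks.

```python
import unicodedata


def strip_diacritics(text):
    """Remove combining marks (fatha, damma, kasra, shadda, sukun, ...)."""
    return "".join(ch for ch in text if not unicodedata.combining(ch))


def preserves_letters(original, diacritized):
    """True if the two strings match once diacritics are removed."""
    return strip_diacritics(original) == strip_diacritics(diacritized)


# Example with one word and its fully diacritized form.
print(preserves_letters("عليكم", "عَلَيْكُمْ"))
```

A failed check suggests the model changed, dropped, or added base letters, in which case the output should probably be rejected or regenerated.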
This LFM2 model was trained 2x faster with Unsloth and Hugging Face's TRL library.