Tashkeel-350M
Arabic Diacritization Model
A 350M parameter model for Arabic diacritization (Tashkeel). This model is a fine-tune of LiquidAI/LFM2-350M on the arbml/tashkeela dataset.
- Base Model: LiquidAI/LFM2-350M
- Dataset: arbml/tashkeela (a quick way to inspect it is sketched below)
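To take a quick look at the training data, the standard datasets loader is enough. A minimal sketch, assuming the corpus exposes a train split; check the arbml/tashkeela dataset card for the actual split and column names.

from datasets import load_dataset

# Peek at the Tashkeela corpus; the "train" split name is an assumption.
ds = load_dataset("arbml/tashkeela", split="train")
print(ds.column_names)  # inspect the schema
print(ds[0])            # one fully diacritized example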
How to Use
from transformers import AutoModelForCausalLM, AutoTokenizer
# Load the model and tokenizer
model_id = "Etherll/Tashkeel-350M"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="bfloat16",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Add diacritics (tashkeel) to the input text
prompt = "ุงูุณูุงู
ุนูููู
"
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
    tokenize=True,
).to(model.device)
output = model.generate(
    input_ids,
    do_sample=False,  # greedy decoding for deterministic diacritization
    max_new_tokens=512,  # assumed cap; generation defaults can truncate longer texts
)
print(tokenizer.decode(output[0, input_ids.shape[-1]:], skip_special_tokens=True))
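For repeated use, the snippet above folds naturally into a small helper. The diacritize name below is illustrative, not part of the model's API; it just reuses the exact calls shown above.

def diacritize(text: str) -> str:
    # Format the plain text as a single-turn chat, as above.
    input_ids = tokenizer.apply_chat_template(
        [{"role": "user", "content": text}],
        add_generation_prompt=True,
        return_tensors="pt",
        tokenize=True,
    ).to(model.device)
    output = model.generate(input_ids, do_sample=False, max_new_tokens=512)
    # Return only the newly generated tokens, i.e. the diacritized text.
    return tokenizer.decode(output[0, input_ids.shape[-1]:], skip_special_tokens=True)

print(diacritize("السلام عليكم"))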
Example
- Input: السلام عليكم
- Output: السَّلَامُ عَلَيْكُمْ
This LFM2 model was trained 2x faster with Unsloth and Hugging Face's TRL library.
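The training script itself is not part of this card. Below is a minimal sketch of what supervised fine-tuning with Unsloth and TRL typically looks like for this task, assuming pairs are built by stripping the diacritics from the Tashkeela text to form the input side. The "text" column name, the LoRA setup, and every hyperparameter are illustrative assumptions rather than the author's actual configuration, and TRL's SFTTrainer keyword names vary across versions.

import re

from datasets import load_dataset
from trl import SFTConfig, SFTTrainer
from unsloth import FastLanguageModel

# Arabic diacritic combining marks (fathatan through sukun).
TASHKEEL = re.compile(r"[\u064B-\u0652]")

def strip_tashkeel(s: str) -> str:
    return TASHKEEL.sub("", s)

# Load the base model through Unsloth's patched loader.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="LiquidAI/LFM2-350M",
    max_seq_length=2048,
)
# LoRA adapter with default target modules (assumed; whether the release
# used LoRA or a full fine-tune is not stated in the card).
model = FastLanguageModel.get_peft_model(model, r=16, lora_alpha=16)

def to_chat(example):
    diacritized = example["text"]  # column name is an assumption
    messages = [
        {"role": "user", "content": strip_tashkeel(diacritized)},
        {"role": "assistant", "content": diacritized},
    ]
    return {"text": tokenizer.apply_chat_template(messages, tokenize=False)}

dataset = load_dataset("arbml/tashkeela", split="train").map(to_chat)

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    processing_class=tokenizer,  # older TRL versions take tokenizer= instead
    args=SFTConfig(
        dataset_text_field="text",
        per_device_train_batch_size=8,
        num_train_epochs=1,
        output_dir="outputs",
    ),
)
trainer.train()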