Tashkeel-700M
Arabic Diacritization Model

A 700M parameter model for Arabic text diacritization (Tashkeel). This model is a fine-tune of LiquidAI/LFM2-700M on the arbml/tashkeela dataset.

- Base Model: LiquidAI/LFM2-700M
- Dataset: arbml/tashkeela
How to Use
from transformers import AutoModelForCausalLM, AutoTokenizer
# Load the model
model_id = "Etherll/Tashkeel-700M"
model = AutoModelForCausalLM.from_pretrained(
model_id,
device_map="auto",
torch_dtype="bfloat16",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Add diacritics
prompt = "السلام عليكم"
input_ids = tokenizer.apply_chat_template(
[{"role": "user", "content": prompt}],
add_generation_prompt=True,
return_tensors="pt",
tokenize=True,
).to(model.device)
output = model.generate(
input_ids,
do_sample=False,
)
print(tokenizer.decode(output[0, input_ids.shape[-1]:], skip_special_tokens=True))
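For repeated use, the generation steps above can be wrapped in a small convenience function. This is a minimal sketch, not part of the original card; the helper name tashkeel and the max_new_tokens cap are assumptions:

def tashkeel(text: str, max_new_tokens: int = 512) -> str:
    # Build the chat-formatted input and move it to the model's device.
    input_ids = tokenizer.apply_chat_template(
        [{"role": "user", "content": text}],
        add_generation_prompt=True,
        return_tensors="pt",
        tokenize=True,
    ).to(model.device)
    # Greedy decoding; the token cap is an assumed safety limit.
    output = model.generate(input_ids, do_sample=False, max_new_tokens=max_new_tokens)
    # Return only the newly generated tokens (the diacritized text).
    return tokenizer.decode(output[0, input_ids.shape[-1]:], skip_special_tokens=True)

print(tashkeel("السلام عليكم"))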
Example
- Input:
السلام عليكم
- Output:
السَّلَامُ عَلَيْكُمْ
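To sanity-check a round trip, the diacritics can be stripped from the model output and compared with the original input. A minimal sketch, assuming the standard tashkeel marks fall in the Unicode range U+064B through U+0652:

import re

# Combining marks for fathatan through sukun (U+064B to U+0652).
TASHKEEL = re.compile(r"[\u064B-\u0652]")

def strip_tashkeel(text: str) -> str:
    # Remove all diacritic marks, leaving the bare letters.
    return TASHKEEL.sub("", text)

# The example output strips back to the undiacritized input.
assert strip_tashkeel("السَّلَامُ عَلَيْكُمْ") == "السلام عليكم"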
This LFM2 model was trained 2x faster with Unsloth and Hugging Face's TRL library.