Commit b00cd99 by Etherll · verified · 1 Parent(s): 8660715

Update README.md

Files changed (1)
  1. README.md +81 -16
README.md CHANGED
@@ -1,23 +1,88 @@
  ---
- base_model: LiquidAI/LFM2-350M
- tags:
- - text-generation-inference
- - transformers
- - unsloth
- - lfm2
- - trl
- - sft
- license: apache-2.0
- language:
- - en
  ---
 
- # Uploaded model
 
- - **Developed by:** Etherll
- - **License:** apache-2.0
- - **Finetuned from model :** LiquidAI/LFM2-350M
 
  This lfm2 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
 
- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
 
+ ---
+ base_model: LiquidAI/LFM2-350M
+ tags:
+ - text-generation-inference
+ - transformers
+ - unsloth
+ - lfm2
+ - trl
+ - sft
+ - arabic
+ license: apache-2.0
+ language:
+ - ar
+ datasets:
+ - arbml/tashkeela
+ ---
+
+ # Tashkeel-350M
+
+ **Arabic Diacritization Model**
+
+ A 350-million-parameter model dedicated to diacritizing Arabic text. This model was trained by fine-tuning
+
+ `LiquidAI/LFM2-350M`
+
+ on the dataset
+
+ `arbml/tashkeela`.
+
+ - **Base model:** [LiquidAI/LFM2-350M](https://huggingface.co/LiquidAI/LFM2-350M)
+ - **Dataset:** [arbml/tashkeela](https://huggingface.co/datasets/arbml/tashkeela)
+
+ ### How to use
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ # Load the model
+ model_id = "Etherll/Tashkeel-350M"
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     device_map="auto",
+     torch_dtype="bfloat16",
+     # attn_implementation="flash_attention_2"  # <- uncomment on a compatible GPU
+ )
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+
+ # Add diacritics
+ prompt = "السلام عليكم"
+ input_ids = tokenizer.apply_chat_template(
+     [{"role": "user", "content": prompt}],
+     add_generation_prompt=True,
+     return_tensors="pt",
+     tokenize=True,
+ ).to(model.device)
+
+ output = model.generate(
+     input_ids,
+     do_sample=False,
+ )
+
+ print(tokenizer.decode(output[0, input_ids.shape[-1]:], skip_special_tokens=True))
+ ```
+
+ ### Example
+ * **Input text:** `السلام عليكم`
+ * **Output:** `اَلسَلَامُ عَلَيْكُمْ`
+
  ---
  ---
 
+ # Tashkeel-350M (English)
+
+ A 350M-parameter model for Arabic diacritization (Tashkeel), fine-tuned from `LiquidAI/LFM2-350M` on the `arbml/tashkeela` dataset.
+
+ - **Base Model:** [LiquidAI/LFM2-350M](https://huggingface.co/LiquidAI/LFM2-350M)
+ - **Dataset:** [arbml/tashkeela](https://huggingface.co/datasets/arbml/tashkeela)
+
+ ### How to Use
+ The Python code is the same as in the usage section above.
 
+ ### Example
+ * **Input:** `السلام عليكم`
+ * **Output:** `اَلسَلَامُ عَلَيْكُمْ`
 
  This lfm2 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
 
+ [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
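
One way to sanity-check the README example, sketched here under assumptions: diacritization should only add marks, so stripping the marks from the model's output should give back the input text. The helper names `strip_diacritics` and `same_base_text` are hypothetical and are not part of this commit or of any library. The sketch uses only Python's standard `unicodedata` module and treats every Unicode combining mark as a diacritic, which is a simplification.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Drop Unicode combining marks (this covers the Arabic harakat), keeping base letters."""
    return "".join(ch for ch in text if not unicodedata.combining(ch))


def same_base_text(plain: str, diacritized: str) -> bool:
    """Return True if both strings match once combining marks are removed."""
    return strip_diacritics(plain) == strip_diacritics(diacritized)


# Input and output strings taken from the README example
plain = "السلام عليكم"
diacritized = "اَلسَلَامُ عَلَيْكُمْ"
print(same_base_text(plain, diacritized))  # True
```

A check like this only confirms that the base letters were preserved; it says nothing about whether the added diacritics are linguistically correct.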