KillerShoaib committed · Commit 6d619d0 · verified · 1 Parent(s): e2cba88

Update README.md

Files changed (1): README.md (+100, -4)
README.md CHANGED

---
language:
- bn
license: apache-2.0
tags:
- text-generation-inference
- llama
- trl
base_model: unsloth/llama-3-8b-bnb-4bit
datasets:
- iamshnoo/alpaca-cleaned-bengali
pipeline_tag: text-generation
---

# Llama-3 Bangla

<div align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/65ca6f0098a46a56261ac3ac/O1ATwhQt_9j59CSIylrVS.png" width="300"/>
</div>
 
- **Developed by:** KillerShoaib
- **License:** apache-2.0
- **Finetuned from model:** unsloth/llama-3-8b-bnb-4bit
- **Dataset used for finetuning:** iamshnoo/alpaca-cleaned-bengali

# Model Details

The Llama 3 8B model was finetuned with the **unsloth** package in **4-bit quantization** on a **cleaned Bangla Alpaca** dataset. This repository contains only the **LoRA adapters**, not the full model weights. Finetuning ran for **2 epochs** on a single T4 GPU.
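
Since only the adapters are published, it can be worth checking what the repo contains and which base model it points at before loading anything. A minimal sketch (not part of the original card), assuming `peft` is installed and using the adapter repo id from the first snippet below:

```python
# Sketch: inspect the adapter config; this fetches only adapter_config.json
from peft import PeftConfig

config = PeftConfig.from_pretrained("KillerShoaib/llama-3-8b-bangla-4bit")
print(config.peft_type)                # LORA
print(config.base_model_name_or_path)  # unsloth/llama-3-8b-bnb-4bit
```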

# Run The Model

## FastLanguageModel from unsloth for 2x faster inference

```python
from unsloth import FastLanguageModel

# Load the LoRA adapters together with the 4-bit base model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "KillerShoaib/llama-3-8b-bangla-4bit",
    max_seq_length = 2048,
    dtype = None,  # None = auto-detect (float16 on a T4)
    load_in_4bit = True,
)
FastLanguageModel.for_inference(model)  # enable unsloth's faster inference path

# alpaca_prompt for the model
alpaca_prompt = """Below is an instruction in bangla that describes a task, paired with an input also in bangla that provides further context. Write a response in bangla that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

# Fill the template with the instruction and (optional) input
inputs = tokenizer(
    [
        alpaca_prompt.format(
            "সুস্থ থাকার তিনটি উপায় বলুন",  # instruction ("Name three ways to stay healthy")
            "",  # input
            "",  # output - leave this blank for generation!
        )
    ],
    return_tensors = "pt",
).to("cuda")

# Generate the output and decode it
outputs = model.generate(**inputs, max_new_tokens = 2048, use_cache = True)
tokenizer.batch_decode(outputs)
```
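
To stream tokens as they are generated instead of waiting for the full output, `transformers`' `TextStreamer` can be passed to the same `generate` call. A small sketch, assuming `model`, `tokenizer`, and `inputs` from the snippet above:

```python
# Sketch: print the generation token by token as it is produced
from transformers import TextStreamer

streamer = TextStreamer(tokenizer, skip_prompt = True)  # don't re-print the prompt
_ = model.generate(**inputs, streamer = streamer, max_new_tokens = 2048, use_cache = True)
```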

## AutoPeftModelForCausalLM from Hugging Face

```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

load_in_4bit = True
model = AutoPeftModelForCausalLM.from_pretrained(
    "KillerShoaib/llama3-8b-4bit-bangla",
    load_in_4bit = load_in_4bit,
)
tokenizer = AutoTokenizer.from_pretrained("KillerShoaib/llama3-8b-4bit-bangla")

alpaca_prompt = """Below is an instruction in bangla that describes a task, paired with an input also in bangla that provides further context. Write a response in bangla that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

inputs = tokenizer(
    [
        alpaca_prompt.format(
            "সুস্থ থাকার তিনটি উপায় বলুন",  # instruction
            "",  # input
            "",  # output - leave this blank for generation!
        )
    ],
    return_tensors = "pt",
).to("cuda")

outputs = model.generate(**inputs, max_new_tokens = 1024, use_cache = True)
tokenizer.batch_decode(outputs)
```

**AutoPeftModelForCausalLM can be hopelessly slow, since downloading the model in `4bit` is not supported. Use it only if you don't have unsloth installed.**
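
In both snippets, `batch_decode` returns the full prompt together with the completion. A small sketch (not from the original card) for keeping only the generated answer, assuming `outputs` comes from one of the snippets above:

```python
# Sketch: strip the echoed prompt and keep only the model's Bangla answer
decoded = tokenizer.batch_decode(outputs, skip_special_tokens = True)[0]
response = decoded.split("### Response:")[-1].strip()
print(response)
```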
118