---
base_model: FreedomIntelligence/AceGPT-13B-chat
inference: false
license: llama2
model_creator: FreedomIntelligence
model_name: AceGPT 13B chat
model_type: llama2
quantized_by: MohamedRashad
datasets:
- FreedomIntelligence/Arabic-Vicuna-80
- FreedomIntelligence/Arabic-AlpacaEval
- FreedomIntelligence/MMLU_Arabic
- FreedomIntelligence/EXAMs
- FreedomIntelligence/ACVA-Arabic-Cultural-Value-Alignment
language:
- en
- ar
library_name: transformers
---


# AceGPT 13B Chat - AWQ
- Model creator: [FreedomIntelligence](https://huggingface.co/FreedomIntelligence)
- Original model: [AceGPT 13B Chat](https://huggingface.co/FreedomIntelligence/AceGPT-13B-chat)

<!-- description start -->
## Description

This repo contains AWQ model files for [FreedomIntelligence's AceGPT 13B Chat](https://huggingface.co/FreedomIntelligence/AceGPT-13B-chat).

### About AWQ

AWQ is an efficient, accurate and blazing-fast low-bit weight quantization method, currently supporting 4-bit quantization. Compared to GPTQ, it offers faster Transformers-based inference with quality equivalent to or better than the most commonly used GPTQ settings.

It is supported by:

- [Text Generation Webui](https://github.com/oobabooga/text-generation-webui) - using Loader: AutoAWQ
- [vLLM](https://github.com/vllm-project/vllm) - Llama and Mistral models only
- [Hugging Face Text Generation Inference (TGI)](https://github.com/huggingface/text-generation-inference)
- [Transformers](https://huggingface.co/docs/transformers) version 4.35.0 and later, from any code or client that supports Transformers
- [AutoAWQ](https://github.com/casper-hansen/AutoAWQ) - for use from Python code

<!-- description end -->

<!-- prompt-template start -->
## Prompt template: Llama-2 chat

```
[INST] <<SYS>>\nأنت مساعد مفيد ومحترم وصادق. أجب دائما بأكبر قدر ممكن من المساعدة بينما تكون آمنا. يجب ألا تتضمن إجاباتك أي محتوى ضار أو غير أخلاقي أو عنصري أو جنسي أو سام أو خطير أو غير قانوني. يرجى التأكد من أن ردودك غير متحيزة اجتماعيا وإيجابية بطبيعتها.\n\nإذا كان السؤال لا معنى له أو لم يكن متماسكا من الناحية الواقعية، اشرح السبب بدلا من الإجابة على شيء غير صحيح. إذا كنت لا تعرف إجابة سؤال ما، فيرجى عدم مشاركة معلومات خاطئة.\n<</SYS>>\n\n
[INST] {prompt} [/INST]
```

(In English, the Arabic system prompt reads roughly: "You are a helpful, respectful and honest assistant. Always answer as helpfully as possible while being safe. Your answers must not include harmful, unethical, racist, sexist, toxic, dangerous or illegal content. Please make sure your responses are socially unbiased and positive in nature. If a question makes no sense or is not factually coherent, explain why instead of answering something incorrect. If you don't know the answer to a question, please don't share false information.")
<!-- prompt-template end -->
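The template can also be filled in programmatically. A minimal sketch, in which the `build_prompt` helper is illustrative and not part of this repo (substitute the full Arabic system prompt shown above for the shortened one used here):

```python
# Shortened for the sketch; use the full Arabic system prompt from the template above.
SYSTEM_PROMPT = "أنت مساعد مفيد ومحترم وصادق."

def build_prompt(user_message: str) -> str:
    """Fill the [INST] <<SYS>> template with the system prompt and a user message."""
    return (
        f"[INST] <<SYS>>\n{SYSTEM_PROMPT}\n<</SYS>>\n\n\n"
        f"[INST] {user_message} [/INST]\n"
    )

prompt = build_prompt("ما أجمل بيت شعر فى اللغة العربية ؟")
```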

<!-- README_AWQ.md-use-from-python start -->
## Inference from Python code using Transformers

### Install the necessary packages

- Requires: [Transformers](https://huggingface.co/docs/transformers) 4.35.0 or later.
- Requires: [AutoAWQ](https://github.com/casper-hansen/AutoAWQ) 0.1.6 or later.

```shell
pip3 install --upgrade "autoawq>=0.1.6" "transformers>=4.35.0"
```
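To fail fast before loading a 13B model, the stated minimums can be checked programmatically. A minimal sketch (handles plain numeric versions like `4.35.0` only; pre-release suffixes would need a proper parser such as `packaging.version`):

```python
from importlib.metadata import version  # stdlib, Python 3.8+

def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare plain dotted version strings numerically, e.g. '4.9.0' < '4.35.0'."""
    as_tuple = lambda v: tuple(int(part) for part in v.split("."))
    return as_tuple(installed) >= as_tuple(minimum)

# Example usage (uncomment once the packages are installed):
# assert meets_minimum(version("transformers"), "4.35.0")
# assert meets_minimum(version("autoawq"), "0.1.6")
```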

Note that if you are using PyTorch 2.0.1, the above AutoAWQ command will automatically upgrade you to PyTorch 2.1.0.

If you are using CUDA 11.8 and wish to continue using PyTorch 2.0.1, instead run this command:

```shell
pip3 install https://github.com/casper-hansen/AutoAWQ/releases/download/v0.1.6/autoawq-0.1.6+cu118-cp310-cp310-linux_x86_64.whl
```

If you have problems installing [AutoAWQ](https://github.com/casper-hansen/AutoAWQ) using the pre-built wheels, install it from source instead:

```shell
pip3 uninstall -y autoawq
git clone https://github.com/casper-hansen/AutoAWQ
cd AutoAWQ
pip3 install .
```

### Transformers example code (requires Transformers 4.35.0 and later)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

model_name_or_path = "MohamedRashad/AceGPT-13B-chat-AWQ"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    low_cpu_mem_usage=True,
    device_map="auto"
)

# Use a text streamer to print output one token at a time
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

# "What is the most beautiful verse of poetry in the Arabic language?"
prompt = "ما أجمل بيت شعر فى اللغة العربية ؟"
prompt_template = f'''[INST] <<SYS>>\nأنت مساعد مفيد ومحترم وصادق. أجب دائما بأكبر قدر ممكن من المساعدة بينما تكون آمنا. يجب ألا تتضمن إجاباتك أي محتوى ضار أو غير أخلاقي أو عنصري أو جنسي أو سام أو خطير أو غير قانوني. يرجى التأكد من أن ردودك غير متحيزة اجتماعيا وإيجابية بطبيعتها.\n\nإذا كان السؤال لا معنى له أو لم يكن متماسكا من الناحية الواقعية، اشرح السبب بدلا من الإجابة على شيء غير صحيح. إذا كنت لا تعرف إجابة سؤال ما، فيرجى عدم مشاركة معلومات خاطئة.\n<</SYS>>\n\n
[INST] {prompt} [/INST]
'''

# Convert the prompt to input token ids on the GPU
tokens = tokenizer(
    prompt_template,
    return_tensors='pt'
).input_ids.cuda()

generation_params = {
    "do_sample": True,
    "temperature": 0.7,
    "top_p": 0.95,
    "top_k": 40,
    "max_new_tokens": 512,
    "repetition_penalty": 1.1
}

# Generate streamed output, visible one token at a time
generation_output = model.generate(
    tokens,
    streamer=streamer,
    **generation_params
)

# Generation without a streamer; the output will include the prompt tokens
generation_output = model.generate(
    tokens,
    **generation_params
)

# Decode the output tokens and print them
token_output = generation_output[0]
text_output = tokenizer.decode(token_output)
print("model.generate output: ", text_output)

# Inference is also possible via Transformers' pipeline
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    **generation_params
)

pipe_output = pipe(prompt_template)[0]['generated_text']
print("pipeline output: ", pipe_output)
```
<!-- README_AWQ.md-use-from-python end -->
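As noted in the comments above, without the streamer `model.generate` returns the prompt token ids followed by the newly generated ones. To print only the continuation, slice off the prompt length before decoding. A minimal sketch with stand-in token ids (no model or GPU needed; the ids are illustrative):

```python
# generate() output = prompt token ids followed by newly generated ids
prompt_ids = [1, 29961, 25580, 29962]         # stand-in prompt token ids
output_ids = prompt_ids + [3186, 29991, 2]    # stand-in generate() output
new_token_ids = output_ids[len(prompt_ids):]  # keep only the generated part
print(new_token_ids)  # [3186, 29991, 2]
```

With the real tensors above, this corresponds to `generation_output[0][tokens.shape[1]:]` passed to `tokenizer.decode(...)`.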

<!-- README_AWQ.md-provided-files start -->
## How the AWQ quantization was done
```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = "FreedomIntelligence/AceGPT-13B-chat"
quant_path = "AceGPT-13B-chat-AWQ"
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}
load_config = {
    "low_cpu_mem_usage": True,
    "device_map": "auto",
    "trust_remote_code": True,
}

# Load the full-precision model
model = AutoAWQForCausalLM.from_pretrained(model_path, **load_config)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Quantize to 4-bit AWQ
model.quantize(tokenizer, quant_config=quant_config)

# Save the quantized model
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)

# Reload the quantized model to verify it loads
model = AutoModelForCausalLM.from_pretrained(quant_path)
tokenizer = AutoTokenizer.from_pretrained(quant_path)

# Push to the Hugging Face Hub
model.push_to_hub(quant_path)
tokenizer.push_to_hub(quant_path)
```
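In `quant_config`, `w_bit: 4` stores each weight in 4 bits and `q_group_size: 128` keeps one scale/zero-point per group of 128 weights. A back-of-envelope estimate of the weight-memory saving for a 13B-parameter model (weights only; the per-group scales and activation memory are ignored, so real usage is somewhat higher):

```python
params = 13e9                           # approximate parameter count
fp16_gib = params * 2 / 1024**3         # FP16: 2 bytes per weight
awq4_gib = params * 0.5 / 1024**3       # 4-bit: 0.5 bytes per weight
print(f"{fp16_gib:.1f} GiB -> {awq4_gib:.1f} GiB")  # roughly 24.2 GiB -> 6.1 GiB
```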

<!-- README_AWQ.md-provided-files end -->