pooja-ganesh committed on
Commit 4641b8f · verified · 1 Parent(s): 7da07e9

Update README.md

Files changed (1): README.md (+45 -3)
README.md CHANGED
---
license: mit
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
---

# DeepSeek-R1-Distill-Qwen-7B-awq-asym-uint4-g128-lmhead-onnx-hybrid
## Introduction
This model was created by applying [Quark](https://quark.docs.amd.com/latest/index.html) to [deepseek-ai/DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B), using calibration samples from the Pile dataset.
## Quantization Strategy
- ***Quantized Layers***: All linear layers
- ***Weight***: uint4 asymmetric per-group, group_size=128 (illustrated in the sketch after this list)
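
For reference, here is a minimal Python sketch of what "uint4 asymmetric per-group" quantization does to a weight matrix. This is not Quark's implementation; the function name, shapes, and use of NumPy are illustrative only.

```python
import numpy as np

def fake_quantize_uint4_per_group(w: np.ndarray, group_size: int = 128) -> np.ndarray:
    """Quantize-dequantize a 2-D weight with per-group asymmetric uint4 scaling."""
    rows, cols = w.shape
    assert cols % group_size == 0, "columns must be divisible by group_size"
    g = w.reshape(rows, cols // group_size, group_size)    # split each row into groups
    w_min = g.min(axis=-1, keepdims=True)                  # per-group minimum
    w_max = g.max(axis=-1, keepdims=True)                  # per-group maximum
    scale = (w_max - w_min) / 15.0 + 1e-12                 # uint4 codes span [0, 15]
    zero_point = np.round(-w_min / scale)                  # asymmetric zero point
    q = np.clip(np.round(g / scale + zero_point), 0, 15)   # integer uint4 codes
    return ((q - zero_point) * scale).reshape(rows, cols)  # dequantized values

w = np.random.randn(8, 256).astype(np.float32)
print("max abs error:", np.abs(w - fake_quantize_uint4_per_group(w)).max())
```

Because the scale and zero point are stored per group of 128 weights rather than per tensor, an outlier in one group does not inflate the quantization error of the others.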
## Quick Start
1. [Download and install Quark](https://quark.docs.amd.com/latest/install.html)
2. Run the quantization script in the example folder using the following command line:
```sh
export MODEL_DIR="[local model checkpoint folder]" # or deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
# single GPU
python quantize_quark.py --model_dir $MODEL_DIR \
                         --output_dir DeepSeek-R1-Distill-Qwen-7B-awq-asym-uint4-g128-lmhead \
                         --quant_scheme w_uint4_per_group_asym \
                         --num_calib_data 128 \
                         --quant_algo awq \
                         --dataset pileval_for_awq_benchmark \
                         --seq_len 512 \
                         --model_export hf_format \
                         --data_type bfloat16 \
                         --exclude_layers
# cpu
python quantize_quark.py --model_dir $MODEL_DIR \
                         --output_dir DeepSeek-R1-Distill-Qwen-7B-awq-asym-uint4-g128-lmhead \
                         --quant_scheme w_uint4_per_group_asym \
                         --num_calib_data 128 \
                         --quant_algo awq \
                         --dataset pileval_for_awq_benchmark \
                         --seq_len 512 \
                         --model_export hf_format \
                         --data_type bfloat16 \
                         --exclude_layers \
                         --device cpu
```
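
For intuition on the calibration flags: `--num_calib_data 128` with `--seq_len 512` means AWQ observes activation statistics over 128 Pile samples of up to 512 tokens each. The hypothetical sketch below builds that kind of calibration set by hand; Quark's script constructs its own dataloader, and the `mit-han-lab/pile-val-backup` dataset id is an assumption about what `pileval_for_awq_benchmark` resolves to.

```python
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-7B")
# Assumed source for the "pileval" calibration split used in AWQ benchmarks.
dataset = load_dataset("mit-han-lab/pile-val-backup", split="validation")

calib_samples = [
    tokenizer(row["text"], truncation=True, max_length=512, return_tensors="pt")
    for row in dataset.select(range(128))  # matches --num_calib_data 128
]
print(len(calib_samples), calib_samples[0]["input_ids"].shape)
```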

#### License
Modifications copyright (c) 2024 Advanced Micro Devices, Inc. All rights reserved.