export MODEL_DIR=[local model checkpoint folder or google/gemma-2-2b]
# single GPU
python quantize_quark.py --model_dir $MODEL_DIR \
    --output_dir output_dir/$MODEL_NAME-awq-uint4-asym-g128-lmhead-g32-fp16 \
    --quant_scheme w_uint4_per_group_asym \
    --num_calib_data 128 \
    --quant_algo awq \
    --dataset pileval_for_awq_benchmark \
    --model_export hf_format \
    --group_size 128 \
    --group_size_per_layer lm_head 32 \
    --data_type float32 \
    --exclude_layers
# CPU
python quantize_quark.py --model_dir $MODEL_DIR \
    --output_dir output_dir/$MODEL_NAME-awq-uint4-asym-g128-lmhead-g32-fp16 \
    --quant_scheme w_uint4_per_group_asym \
    --num_calib_data 128 \
    --quant_algo awq \
    --dataset pileval_for_awq_benchmark \
    --model_export hf_format \
    --group_size 128 \
    --group_size_per_layer lm_head 32 \
    --data_type float32 \
    --exclude_layers \
    --device cpu
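For intuition, the w_uint4_per_group_asym scheme splits each weight tensor into groups (128 weights per group for most layers, 32 for lm_head via --group_size_per_layer) and gives each group its own scale and zero-point so values map onto the asymmetric uint4 range [0, 15]. The following is a minimal NumPy sketch of that idea, for illustration only; it is not Quark's actual implementation.

# Illustrative per-group asymmetric uint4 quantization (not Quark's code).
import numpy as np

def quantize_group(w, n_bits=4):
    qmax = 2**n_bits - 1                      # 15 for uint4
    scale = (w.max() - w.min()) / qmax        # per-group scale
    zero = np.round(-w.min() / scale)         # per-group zero-point
    q = np.clip(np.round(w / scale) + zero, 0, qmax)
    return q, scale, zero

def dequantize_group(q, scale, zero):
    return (q - zero) * scale                 # reconstruct approximate weights

w = np.random.randn(128).astype(np.float32)   # one group of 128 weights
q, s, z = quantize_group(w)
w_hat = dequantize_group(q, s, z)
print(np.abs(w - w_hat).max())                # worst-case rounding error

A smaller group size (as used for lm_head here) means more scale/zero-point pairs and therefore lower rounding error, at the cost of slightly more metadata to store.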
Quark has its own export format, quark_safetensors, which is compatible with AutoAWQ exports.
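As a rough sketch, an AutoAWQ-compatible checkpoint can typically be loaded through Hugging Face transformers. The checkpoint path below is hypothetical (adjust it to your actual --output_dir result), and the autoawq package must be installed.

# Hypothetical path; transformers detects the AWQ quantization_config in the
# checkpoint and dispatches to the AutoAWQ kernels (a CUDA GPU is required
# for the int4 kernels).
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "output_dir/gemma-2-2b-awq-uint4-asym-g128-lmhead-g32-fp16"
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(ckpt, device_map="auto")

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))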
Modifications copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved.
Base model: google/gemma-2-2b