gemma-2-2b-awq-uint4-asym-g128-lmhead-g32-fp16-onnx

  • Introduction

    This model was created by applying Quark with calibration samples from the Pile dataset.
  • Quantization Strategy

    • Quantized Layers: All linear layers
    • Weights: uint4, asymmetric, per-group; group_size=32 for lm_head and group_size=128 for all other quantized layers (a numeric sketch follows the Quick Start commands below).
  • Quick Start

  1. Download and install Quark
  2. Run the quantization script in the example folder using the following command line:
    export MODEL_DIR=[local model checkpoint folder or google/gemma-2-2b]
    export MODEL_NAME=gemma-2-2b
    # single GPU
    python quantize_quark.py --model_dir $MODEL_DIR \
                            --output_dir $MODEL_NAME-awq-uint4-asym-g128-lmhead-g32-fp16 \
                            --quant_scheme w_uint4_per_group_asym \
                            --num_calib_data 128 \
                            --quant_algo awq \
                            --dataset pileval_for_awq_benchmark \
                            --model_export hf_format \
                            --group_size 128 \
                            --group_size_per_layer lm_head 32 \
                            --data_type float32 \
                            --exclude_layers
    # cpu
    python quantize_quark.py --model_dir $MODEL_DIR \
                            --output_dir $MODEL_NAME-awq-uint4-asym-g128-lmhead-g32-fp16 \
                            --quant_scheme w_uint4_per_group_asym \
                            --num_calib_data 128 \
                            --quant_algo awq \
                            --dataset pileval_for_awq_benchmark \
                            --model_export hf_format \
                            --group_size 128 \
                            --group_size_per_layer lm_head 32 \
                            --data_type float32 \
                            --exclude_layers \
                            --device cpu
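
A numeric sketch of the weight scheme described under Quantization Strategy: uint4 asymmetric per-group quantization of one weight row. This is not Quark's implementation; the function name and the use of NumPy are illustrative assumptions, and only the group sizes (128, and 32 for lm_head) come from this card.

    import numpy as np

    def quantize_group_uint4_asym(w_group):
        # Asymmetric uint4: map the group's [min, max] range onto the integers 0..15.
        qmin, qmax = 0, 15
        w_min, w_max = float(w_group.min()), float(w_group.max())
        scale = (w_max - w_min) / (qmax - qmin)
        zero_point = round(-w_min / scale)
        q = np.clip(np.round(w_group / scale) + zero_point, qmin, qmax)
        return q.astype(np.uint8), scale, zero_point

    weights = np.random.randn(4096).astype(np.float32)   # one example weight row (illustrative)
    group_size = 128                                      # 32 would be used for lm_head
    for start in range(0, weights.size, group_size):
        q, scale, zp = quantize_group_uint4_asym(weights[start:start + group_size])
        dequantized = (q.astype(np.float32) - zp) * scale  # reconstruction used at inference time

Each group carries its own scale and zero point, which is what "per-group" means; the smaller group size used for lm_head tracks that layer's weight distribution more closely at the cost of storing more scales.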
    

Deployment

Quark has its own export format, quark_safetensors, which is compatible with AutoAWQ exports.
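
Given that compatibility, a checkpoint exported with --model_export hf_format can typically be loaded through AutoAWQ. The following is a minimal sketch, not an official instruction from this card; the path "output_dir" is an assumption standing in for whatever directory quantize_quark.py wrote.

    from awq import AutoAWQForCausalLM
    from transformers import AutoTokenizer

    quant_path = "output_dir"  # assumption: the directory produced by quantize_quark.py above
    model = AutoAWQForCausalLM.from_quantized(quant_path)
    tokenizer = AutoTokenizer.from_pretrained(quant_path)

    inputs = tokenizer("The key benefit of 4-bit weights is", return_tensors="pt")
    inputs = {k: v.to(next(model.parameters()).device) for k, v in inputs.items()}
    output = model.generate(**inputs, max_new_tokens=32)
    print(tokenizer.decode(output[0], skip_special_tokens=True))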

License

Modifications copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved.
