Quantization for Ryzen AI IPU

See the guide "How to apply quantization" to learn how to use the following classes to quantize models targeting the Ryzen AI IPU.

Using Vitis AI Quantizer

RyzenAIOnnxQuantizer

class optimum.amd.ryzenai.RyzenAIOnnxQuantizer

( onnx_model_path: Path, config: Optional = None )

Handles the RyzenAI quantization process for models shared on huggingface.co/models.

from_pretrained

( model_or_path: Union, file_name: Optional = None )

Parameters

model_or_path (Union[str, Path]) — Can be either:
- A path to a saved exported ONNX Intermediate Representation (IR) model, e.g., `./my_model_directory/`.
file_name (Optional[str], defaults to None) — Overrides the default model file name, "model.onnx", with file_name. This allows you to load different model files from the same repository or directory.

Instantiates a RyzenAIOnnxQuantizer from an ONNX model file.
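
As a minimal sketch of the call described above, the helper below wraps from_pretrained. The helper name load_quantizer is hypothetical, and the example assumes the optimum-amd package is installed and that the directory already contains an exported ONNX model.

```python
from pathlib import Path


def load_quantizer(model_dir, file_name=None):
    """Create a RyzenAIOnnxQuantizer for an exported ONNX model (hypothetical helper)."""
    # Imported lazily so the helper can be defined without optimum-amd installed.
    from optimum.amd.ryzenai import RyzenAIOnnxQuantizer

    # file_name=None keeps the default model file name; pass a value to pick
    # another ONNX file from the same directory.
    return RyzenAIOnnxQuantizer.from_pretrained(Path(model_dir), file_name=file_name)
```

Calling load_quantizer("./my_model_directory/") would then return a quantizer for the model stored in that directory.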

get_calibration_dataset

( dataset_name: str, num_samples: int = 100, dataset_config_name: Optional = None, dataset_split: Optional = None, preprocess_function: Optional = None, preprocess_batch: bool = True, seed: Optional = 2016, token: bool = None, streaming: bool = False )

Parameters

dataset_name (str) — The dataset repository name on the Hugging Face Hub or path to a local directory containing data files to load to use for the calibration step.
num_samples (int, defaults to 100) — The maximum number of samples composing the calibration dataset.
dataset_config_name (Optional[str], defaults to None) — The name of the dataset configuration.
dataset_split (Optional[str], defaults to None) — Which split of the dataset to use to perform the calibration step.
preprocess_function (Optional[Callable], defaults to None) — Processing function to apply to each example after loading dataset.
preprocess_batch (bool, defaults to True) — Whether the preprocess_function should be batched.
seed (int, defaults to 2016) — The random seed to use when shuffling the calibration dataset.
token (bool, defaults to None) — Whether to use the token generated when running transformers-cli login (necessary for some datasets like ImageNet).

Creates the calibration datasets.Dataset to use for the post-training static quantization calibration step.
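
The sketch below shows one way a batched preprocess_function could be passed to get_calibration_dataset. It assumes that, with preprocess_batch=True, the function receives a dictionary of columns holding one entry per example, as in datasets.Dataset.map with batched=True. The column names "pixels" and "pixel_values" and both helper names are hypothetical.

```python
def scale_pixels(batch):
    """Batched preprocess function: map 0-255 pixel lists to floats in [0, 1]."""
    # "pixels" is a hypothetical column name; each entry is one example's pixels.
    return {
        "pixel_values": [[p / 255.0 for p in pixels] for pixels in batch["pixels"]]
    }


def build_calibration_dataset(quantizer, dataset_name, split="train", num_samples=100):
    """Request a calibration dataset from an existing RyzenAIOnnxQuantizer."""
    return quantizer.get_calibration_dataset(
        dataset_name,
        num_samples=num_samples,           # upper bound on calibration samples
        dataset_split=split,
        preprocess_function=scale_pixels,
        preprocess_batch=True,             # scale_pixels expects a batch of examples
        seed=2016,                         # documented default shuffling seed
    )
```

If preprocess_batch were set to False, the function would instead need to handle a single example at a time.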

quantize

( quantization_config: QuantizationConfig, dataset: Dataset, save_dir: Union, batch_size: int = 1, file_suffix: Optional = 'quantized' )

Parameters

quantization_config (QuantizationConfig) — The configuration containing the parameters related to quantization.
save_dir (Union[str, Path]) — The directory where the quantized model should be saved.
file_suffix (Optional[str], defaults to "quantized") — The suffix used in the file name of the saved quantized model.
calibration_tensors_range (Optional[Dict[str, Tuple[float, float]]], defaults to None) — A dictionary mapping node names to their quantization ranges, used and required only when applying static quantization.

Quantizes a model given the optimization specifications defined in quantization_config.
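
A minimal sketch of the quantize call, using only the arguments listed in the signature above. The helper name quantize_for_ryzenai is hypothetical; the example assumes optimum-amd is installed and that the quantizer and calibration dataset were created beforehand with from_pretrained and get_calibration_dataset.

```python
def quantize_for_ryzenai(quantizer, calibration_dataset, save_dir):
    """Quantize the quantizer's model and save the result into save_dir."""
    # Imported lazily so the helper can be defined without optimum-amd installed.
    from optimum.amd.ryzenai import QuantizationConfig

    # All fields keep the defaults shown in the QuantizationConfig signature;
    # enable_dpu is spelled out only for readability.
    quantization_config = QuantizationConfig(enable_dpu=True)

    quantizer.quantize(
        quantization_config=quantization_config,
        dataset=calibration_dataset,
        save_dir=save_dir,
        batch_size=1,
        file_suffix="quantized",
    )
    return quantization_config
```

Returning the configuration is a choice made for this sketch so the caller can inspect or log the settings that were used.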

QuantizationConfig

class optimum.amd.ryzenai.QuantizationConfig

( format: QuantFormat = <QuantFormat.QDQ: 1>, calibration_method: CalibrationMethod = <PowerOfTwoMethod.MinMSE: 1>, activations_dtype: QuantType = <QuantType.QUInt8: 1>, activations_symmetric: bool = True, weights_dtype: QuantType = <QuantType.QInt8: 0>, weights_symmetric: bool = True, enable_dpu: bool = True )

Parameters

is_static (bool) — Whether to apply static quantization or dynamic quantization.
format (QuantFormat) — Targeted RyzenAI quantization representation format. For the Operator Oriented (QOperator) format, all the quantized operators have their own ONNX definitions. For the Tensor Oriented (QDQ) format, the model is quantized by inserting QuantizeLinear / DequantizeLinear operators.
calibration_method (CalibrationMethod) — The method chosen to calculate the activations quantization parameters using the calibration dataset.
activations_dtype (QuantType, defaults to QuantType.QUInt8) — The quantization data type to use for the activations.
activations_symmetric (bool, defaults to True) — Whether to apply symmetric quantization on the activations.
weights_dtype (QuantType, defaults to QuantType.QInt8) — The quantization data type to use for the weights.
weights_symmetric (bool, defaults to True) — Whether to apply symmetric quantization on the weights.
enable_dpu (bool, defaults to True) — Determines whether to generate a quantized model that is suitable for the DPU. If set to True, the quantization process will create a model that is optimized for DPU computations.

QuantizationConfig is the configuration class handling all the RyzenAI quantization parameters.
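
To make the symmetric options and the QDQ format more concrete, the sketch below works through textbook linear quantization in plain Python. It follows the arithmetic of the ONNX QuantizeLinear and DequantizeLinear operators and is not the quantizer's internal implementation. In particular, the calibration method selected in the configuration (PowerOfTwoMethod.MinMSE by default) may choose scales differently. All function names are hypothetical.

```python
def symmetric_params(rmin, rmax, qmax=127):
    """Scale and zero point for a signed, symmetric integer range [-qmax, qmax]."""
    bound = max(abs(rmin), abs(rmax))
    scale = bound / qmax if bound > 0 else 1.0
    return scale, 0  # real 0.0 always maps to integer 0


def asymmetric_params(rmin, rmax, qmin=0, qmax=255):
    """Scale and zero point that map [rmin, rmax] onto [qmin, qmax]."""
    # Widen the range to include 0.0 so that zero has an exact integer code.
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)
    scale = (rmax - rmin) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # degenerate all-zero range
    zero_point = round(qmin - rmin / scale)
    return scale, zero_point


def quantize_linear(x, scale, zero_point, qmin, qmax):
    """Round to the nearest integer code, then saturate to [qmin, qmax]."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))


def dequantize_linear(q, scale, zero_point):
    """Map an integer code back to an approximate real value."""
    return (q - zero_point) * scale
```

For example, asymmetric_params(-64.0, 191.0) yields a scale of 1.0 and a zero point of 64, so the real value -64.0 is stored as the integer 0. In the symmetric case the zero point stays at 0 and only the scale is derived from the data.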
