Quantization
ORTQuantizer
class optimum.onnxruntime.ORTQuantizer
( onnx_model_path: Path, config: typing.Optional[ForwardRef('PretrainedConfig')] = None )
Handles the ONNX Runtime quantization process for models shared on huggingface.co/models.
Computes the quantization ranges.
fit
( dataset: Dataset, calibration_config: CalibrationConfig, onnx_augmented_model_name: typing.Union[str, pathlib.Path] = 'augmented_model.onnx', operators_to_quantize: typing.Optional[typing.List[str]] = None, batch_size: int = 1, use_external_data_format: bool = False, use_gpu: bool = False, force_symmetric_range: bool = False )
Parameters
- dataset (Dataset) — The dataset to use when performing the calibration step.
- calibration_config (CalibrationConfig) — The configuration containing the parameters related to the calibration step.
- onnx_augmented_model_name (Union[str, Path], defaults to "augmented_model.onnx") — The path used to save the augmented model used to collect the quantization ranges.
- operators_to_quantize (Optional[List[str]], defaults to None) — List of the operator types to quantize.
- batch_size (int, defaults to 1) — The batch size to use when collecting the quantization range values.
- use_external_data_format (bool, defaults to False) — Whether to use the external data format to store a model whose size is >= 2 GB.
- use_gpu (bool, defaults to False) — Whether to use the GPU when collecting the quantization range values.
- force_symmetric_range (bool, defaults to False) — Whether to make the quantization ranges symmetric.
Performs the calibration step and computes the quantization ranges.
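To make "quantization ranges" concrete, here is a minimal pure-Python sketch of min-max calibration. It is an illustration under stated assumptions, not the library's implementation: the function `minmax_ranges`, the helper `batched`, and the sample layout (a dict mapping a tensor name to its observed values) are all hypothetical, and the real calibration methods are selected through the CalibrationConfig.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical layout: one sample maps a tensor name to the values observed for it.
Sample = Dict[str, List[float]]
Ranges = Dict[str, Tuple[float, float]]


def batched(samples: List[Sample], batch_size: int) -> Iterator[List[Sample]]:
    """Yield consecutive slices of at most batch_size samples."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


def minmax_ranges(
    samples: List[Sample],
    batch_size: int = 1,
    force_symmetric_range: bool = False,
) -> Ranges:
    """Track the running (min, max) of every tensor over the calibration samples."""
    ranges: Ranges = {}
    for batch in batched(samples, batch_size):
        for sample in batch:
            for name, values in sample.items():
                low, high = min(values), max(values)
                if name in ranges:
                    low = min(low, ranges[name][0])
                    high = max(high, ranges[name][1])
                ranges[name] = (low, high)
    if force_symmetric_range:
        # Widen each range so that it is centered on zero.
        symmetric: Ranges = {}
        for name, (low, high) in ranges.items():
            bound = max(abs(low), abs(high))
            symmetric[name] = (-bound, bound)
        ranges = symmetric
    return ranges
```

With the samples `[{"act": [-0.5, 2.0]}, {"act": [0.1, 3.0]}]`, this sketch returns `{"act": (-0.5, 3.0)}`, and `{"act": (-3.0, 3.0)}` when `force_symmetric_range=True`.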
from_pretrained
( model_or_path: typing.Union[ForwardRef('ORTModel'), str, pathlib.Path], file_name: typing.Optional[str] = None )
Parameters
- model_or_path (Union[ORTModel, str, Path]) — Can be either:
  - A path to a saved exported ONNX Intermediate Representation (IR) model, e.g., `./my_model_directory/`.
  - Or an ORTModelForXX class, e.g., ORTModelForQuestionAnswering.
- file_name (Optional[str], defaults to None) — Overwrites the default model file name from "model.onnx" to file_name. This allows you to load different model files from the same repository or directory.
Instantiates an ORTQuantizer from an ONNX model file or an ORTModel.
get_calibration_dataset
( dataset_name: str, num_samples: int = 100, dataset_config_name: typing.Optional[str] = None, dataset_split: typing.Optional[str] = None, preprocess_function: typing.Optional[typing.Callable] = None, preprocess_batch: bool = True, seed: int = 2016, use_auth_token: typing.Union[bool, str, NoneType] = None, token: typing.Union[bool, str, NoneType] = None )
Parameters
- dataset_name (str) — The dataset repository name on the Hugging Face Hub, or the path to a local directory containing the data files to load for the calibration step.
- num_samples (int, defaults to 100) — The maximum number of samples composing the calibration dataset.
- dataset_config_name (Optional[str], defaults to None) — The name of the dataset configuration.
- dataset_split (Optional[str], defaults to None) — Which split of the dataset to use to perform the calibration step.
- preprocess_function (Optional[Callable], defaults to None) — Processing function to apply to each example after loading the dataset.
- preprocess_batch (bool, defaults to True) — Whether the preprocess_function should be batched.
- seed (int, defaults to 2016) — The random seed to use when shuffling the calibration dataset.
- use_auth_token (Optional[Union[bool, str]], defaults to None) — Deprecated. Please use the token argument instead.
- token (Optional[Union[bool, str]], defaults to None) — The token to use as HTTP bearer authorization for remote files. If True, will use the token generated when running huggingface-cli login (stored in huggingface_hub.constants.HF_TOKEN_PATH).
Creates the calibration datasets.Dataset to use for the post-training static quantization calibration step.
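The interplay of num_samples, seed, and preprocess_function can be pictured with a small pure-Python sketch. It assumes the calibration subset is obtained by a seeded shuffle followed by truncation, with preprocessing applied afterwards. The function `sample_calibration_examples` is hypothetical, works on plain lists rather than a datasets.Dataset, and applies the preprocessing one example at a time (the non-batched case).

```python
import random
from typing import Callable, Dict, List, Optional

# Hypothetical layout: one example is a plain dict of fields.
Example = Dict[str, object]


def sample_calibration_examples(
    examples: List[Example],
    num_samples: int = 100,
    preprocess_function: Optional[Callable[[Example], Example]] = None,
    seed: int = 2016,
) -> List[Example]:
    """Shuffle with a fixed seed, keep at most num_samples, then preprocess."""
    shuffled = list(examples)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    subset = shuffled[:num_samples]  # fewer examples are kept if the input is smaller
    if preprocess_function is not None:
        subset = [preprocess_function(example) for example in subset]
    return subset
```

Because the shuffle is seeded, two calls with the same inputs select the same subset, which keeps calibration runs reproducible.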
partial_fit
( dataset: Dataset, calibration_config: CalibrationConfig, onnx_augmented_model_name: typing.Union[str, pathlib.Path] = 'augmented_model.onnx', operators_to_quantize: typing.Optional[typing.List[str]] = None, batch_size: int = 1, use_external_data_format: bool = False, use_gpu: bool = False, force_symmetric_range: bool = False )
Parameters
- dataset (Dataset) — The dataset to use when performing the calibration step.
- calibration_config (CalibrationConfig) — The configuration containing the parameters related to the calibration step.
- onnx_augmented_model_name (Union[str, Path], defaults to "augmented_model.onnx") — The path used to save the augmented model used to collect the quantization ranges.
- operators_to_quantize (Optional[List[str]], defaults to None) — List of the operator types to quantize.
- batch_size (int, defaults to 1) — The batch size to use when collecting the quantization range values.
- use_external_data_format (bool, defaults to False) — Whether to use the external data format to store a model whose size is >= 2 GB.
- use_gpu (bool, defaults to False) — Whether to use the GPU when collecting the quantization range values.
- force_symmetric_range (bool, defaults to False) — Whether to make the quantization ranges symmetric.
Performs the calibration step and collects the quantization ranges without computing them.
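The difference between collecting and computing can be illustrated with a toy collector that accumulates observations shard by shard and derives the ranges only on request. This is a conceptual sketch: the `RangeCollector` class, its method names, and the dict-of-lists sample layout are chosen for illustration and do not describe the internals of ORTQuantizer.

```python
from typing import Dict, List, Tuple


class RangeCollector:
    """Toy collector: gathers observed values first, computes ranges later."""

    def __init__(self) -> None:
        self._observed: Dict[str, List[float]] = {}

    def partial_fit(self, shard: List[Dict[str, List[float]]]) -> None:
        """Record the values seen in one shard without computing any range."""
        for sample in shard:
            for name, values in sample.items():
                self._observed.setdefault(name, []).extend(values)

    def compute_ranges(self) -> Dict[str, Tuple[float, float]]:
        """Reduce everything collected so far to one (min, max) pair per tensor."""
        return {
            name: (min(values), max(values))
            for name, values in self._observed.items()
        }


collector = RangeCollector()
# Feed the calibration data in two shards, e.g. to bound memory use per pass.
collector.partial_fit([{"act": [-1.0, 0.5]}])
collector.partial_fit([{"act": [0.2, 4.0]}, {"out": [1.0, 2.0]}])
ranges = collector.compute_ranges()
```

Splitting the work this way lets a large calibration set be processed in pieces, with the final ranges reflecting every shard seen so far.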
quantize
( quantization_config: QuantizationConfig, save_dir: typing.Union[str, pathlib.Path], file_suffix: typing.Optional[str] = 'quantized', calibration_tensors_range: typing.Optional[typing.Dict[str, typing.Tuple[float, float]]] = None, use_external_data_format: bool = False, preprocessor: typing.Optional[optimum.onnxruntime.preprocessors.quantization.QuantizationPreprocessor] = None )
Parameters
- quantization_config (QuantizationConfig) — The configuration containing the parameters related to quantization.
- save_dir (Union[str, Path]) — The directory where the quantized model should be saved.
- file_suffix (Optional[str], defaults to "quantized") — The suffix used in the file name of the saved quantized model.
- calibration_tensors_range (Optional[Dict[str, Tuple[float, float]]], defaults to None) — The dictionary mapping node names to their quantization ranges; used and required only when applying static quantization.
- use_external_data_format (bool, defaults to False) — Whether to use the external data format to store a model whose size is >= 2 GB.
- preprocessor (Optional[QuantizationPreprocessor], defaults to None) — The preprocessor to use to collect the nodes to include or exclude from quantization.
Quantizes a model given the quantization specifications defined in quantization_config.
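As a usage sketch, dynamic quantization (which needs no calibration ranges) might be wired together as follows. The wrapper `quantize_dynamic` and its argument names are hypothetical. `AutoQuantizationConfig.avx512_vnni` is not described in this section and is assumed here as one way to build a QuantizationConfig; the right preset depends on the target hardware.

```python
from pathlib import Path
from typing import Union


def quantize_dynamic(
    model_dir: Union[str, Path],
    save_dir: Union[str, Path],
    file_suffix: str = "quantized",
) -> Path:
    """Apply dynamic quantization to the ONNX model found in model_dir."""
    # Imported lazily so this module can be loaded without optimum installed.
    from optimum.onnxruntime import ORTQuantizer
    from optimum.onnxruntime.configuration import AutoQuantizationConfig

    # Load the exported ONNX model from the directory.
    quantizer = ORTQuantizer.from_pretrained(model_dir)

    # Assumed preset: a dynamic (is_static=False) configuration for AVX512-VNNI CPUs.
    qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)

    # calibration_tensors_range is left unset because it is only required
    # for static quantization.
    quantizer.quantize(
        quantization_config=qconfig,
        save_dir=save_dir,
        file_suffix=file_suffix,
    )
    return Path(save_dir)
```

For static quantization, the ranges produced by the calibration step would be passed as calibration_tensors_range instead.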