Quantization for Ryzen AI IPU
Please refer to the guide How to apply quantization to understand how to use the following classes to quantize models targeting the Ryzen AI IPU.
Using Vitis AI Quantizer
RyzenAIOnnxQuantizer
class optimum.amd.ryzenai.RyzenAIOnnxQuantizer( onnx_model_path: Path, config: Optional = None )

Handles the RyzenAI quantization process for models shared on huggingface.co/models.
from_pretrained( model_or_path: Union, file_name: Optional = None )

Parameters

- model_or_path (`Union[str, Path]`) — Can be either: a path to a saved exported ONNX Intermediate Representation (IR) model, e.g., `./my_model_directory/`.
- file_name (`Optional[str]`, defaults to `None`) — Overwrites the default model file name from `"model.onnx"` to `file_name`. This allows you to load different model files from the same repository or directory.

Instantiates a RyzenAIOnnxQuantizer from an ONNX model file.
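As a minimal sketch of the call above (the model directory is a placeholder and must contain an exported ONNX model; running this also requires the Ryzen AI software stack to be installed):

```python
from optimum.amd.ryzenai import RyzenAIOnnxQuantizer

# "./my_model_directory" is a placeholder path containing an exported ONNX model.
quantizer = RyzenAIOnnxQuantizer.from_pretrained(
    "./my_model_directory",
    file_name="model.onnx",  # the default; override to load a different file
)
```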
get_calibration_dataset( dataset_name: str, num_samples: int = 100, dataset_config_name: Optional = None, dataset_split: Optional = None, preprocess_function: Optional = None, preprocess_batch: bool = True, seed: Optional = 2016, token: bool = None, streaming: bool = False )

Parameters

- dataset_name (`str`) — The dataset repository name on the Hugging Face Hub, or the path to a local directory containing the data files to load for the calibration step.
- num_samples (`int`, defaults to 100) — The maximum number of samples composing the calibration dataset.
- dataset_config_name (`Optional[str]`, defaults to `None`) — The name of the dataset configuration.
- dataset_split (`Optional[str]`, defaults to `None`) — Which split of the dataset to use to perform the calibration step.
- preprocess_function (`Optional[Callable]`, defaults to `None`) — Processing function to apply to each example after loading the dataset.
- preprocess_batch (`bool`, defaults to `True`) — Whether the `preprocess_function` should be batched.
- seed (`int`, defaults to 2016) — The random seed to use when shuffling the calibration dataset.
- token (`bool`, defaults to `False`) — Whether to use the token generated when running `transformers-cli login` (necessary for some datasets like ImageNet).

Creates the calibration `datasets.Dataset` to use for the post-training static quantization calibration step.
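A sketch of building a calibration set with this method (the dataset name and the preprocessing function are illustrative placeholders; the preprocessing logic depends on your model's inputs):

```python
from optimum.amd.ryzenai import RyzenAIOnnxQuantizer

quantizer = RyzenAIOnnxQuantizer.from_pretrained("./my_model_directory")

def preprocess_fn(examples):
    # Placeholder: transform raw examples into the model's input format here.
    return examples

calibration_dataset = quantizer.get_calibration_dataset(
    "imagenet-1k",             # illustrative dataset name
    num_samples=100,
    dataset_split="train",
    preprocess_function=preprocess_fn,
    token=True,                # gated datasets such as ImageNet need authentication
)
```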
quantize( quantization_config: QuantizationConfig, dataset: Dataset, save_dir: Union, batch_size: int = 1, file_suffix: Optional = 'quantized' )

Parameters

- quantization_config (`QuantizationConfig`) — The configuration containing the parameters related to quantization.
- save_dir (`Union[str, Path]`) — The directory where the quantized model should be saved.
- file_suffix (`Optional[str]`, defaults to `"quantized"`) — The file suffix used to save the quantized model.
- calibration_tensors_range (`Optional[Dict[str, Tuple[float, float]]]`, defaults to `None`) — The dictionary mapping the node names to their quantization ranges, used and required only when applying static quantization.

Quantizes a model given the optimization specifications defined in `quantization_config`.
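Putting the pieces together, an end-to-end sketch of quantizing a model (paths and the dataset name are placeholders; `QuantizationConfig()` relies on the documented defaults described below):

```python
from optimum.amd.ryzenai import QuantizationConfig, RyzenAIOnnxQuantizer

quantizer = RyzenAIOnnxQuantizer.from_pretrained("./my_model_directory")

# Calibration data for static quantization ("some/dataset" is a placeholder).
calibration_dataset = quantizer.get_calibration_dataset(
    "some/dataset",
    num_samples=100,
)

quantizer.quantize(
    quantization_config=QuantizationConfig(),  # documented defaults target the DPU
    dataset=calibration_dataset,
    save_dir="./quantized_model",
    file_suffix="quantized",
)
```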
QuantizationConfig
class optimum.amd.ryzenai.QuantizationConfig( format: QuantFormat = QuantFormat.QDQ, calibration_method: CalibrationMethod = PowerOfTwoMethod.MinMSE, activations_dtype: QuantType = QuantType.QUInt8, activations_symmetric: bool = True, weights_dtype: QuantType = QuantType.QInt8, weights_symmetric: bool = True, enable_dpu: bool = True )
Parameters

- is_static (`bool`) — Whether to apply static quantization or dynamic quantization.
- format (`QuantFormat`) — Targeted RyzenAI quantization representation format. For the Operator Oriented (QOperator) format, all the quantized operators have their own ONNX definitions. For the Tensor Oriented (QDQ) format, the model is quantized by inserting QuantizeLinear / DeQuantizeLinear operators.
- calibration_method (`CalibrationMethod`) — The method chosen to calculate the activation quantization parameters using the calibration dataset.
- activations_dtype (`QuantType`, defaults to `QuantType.QUInt8`) — The quantization data type to use for the activations.
- activations_symmetric (`bool`, defaults to `True`) — Whether to apply symmetric quantization on the activations.
- weights_dtype (`QuantType`, defaults to `QuantType.QInt8`) — The quantization data type to use for the weights.
- weights_symmetric (`bool`, defaults to `True`) — Whether to apply symmetric quantization on the weights.
- enable_dpu (`bool`, defaults to `True`) — Determines whether to generate a quantized model suitable for the DPU. If set to `True`, the quantization process creates a model optimized for DPU computations.
QuantizationConfig is the configuration class handling all the RyzenAI quantization parameters.
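As a sketch, a configuration that spells out the documented defaults explicitly (assuming `QuantFormat` and `QuantType` come from `onnxruntime.quantization`; the calibration method is left at its power-of-two MinMSE default shown in the signature above):

```python
from onnxruntime.quantization import QuantFormat, QuantType

from optimum.amd.ryzenai import QuantizationConfig

quantization_config = QuantizationConfig(
    format=QuantFormat.QDQ,                 # QuantizeLinear / DeQuantizeLinear pairs
    activations_dtype=QuantType.QUInt8,
    activations_symmetric=True,
    weights_dtype=QuantType.QInt8,
    weights_symmetric=True,
    enable_dpu=True,                        # produce a DPU-friendly model
)
```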