mlx.nn.QuantizedLinear
class QuantizedLinear(input_dims: int, output_dims: int, bias: bool = True, group_size: int = 64, bits: int = 4)

Applies an affine transformation to the input using a quantized weight matrix.

It is the quantized equivalent of mlx.nn.Linear. For now its parameters are frozen and are not included in any gradient computation, but this will probably change in the future. QuantizedLinear also provides the classmethod from_linear() to convert Linear layers to QuantizedLinear layers.

Parameters:
- input_dims (int) – The dimensionality of the input features.
- output_dims (int) – The dimensionality of the output features.
- bias (bool, optional) – If set to False, the layer will not use a bias. Default: True.
- group_size (int, optional) – The group size to use for the quantized weight. See quantize(). Default: 64.
- bits (int, optional) – The bit width to use for the quantized weight. See quantize(). Default: 4.
Methods

from_linear(linear_layer[, group_size, bits]) – Create a QuantizedLinear layer from a Linear layer.