mlx.nn.QuantizedLinear

class mlx.nn.QuantizedLinear(input_dims: int, output_dims: int, bias: bool = True, group_size: int = 64, bits: int = 4)

Applies an affine transformation to the input using a quantized weight matrix.

It is the quantized equivalent of mlx.nn.Linear. For now, its parameters are frozen and are not included in any gradient computation, although this will probably change in the future.

QuantizedLinear also provides two classmethods for converting linear layers to QuantizedLinear layers:

from_linear() returns a QuantizedLinear layer that applies the same linear transformation up to the quantization error.
quantize_module() swaps all the linear layers of the passed module with QuantizedLinear ones.

Parameters:

input_dims (int) – The dimensionality of the input features.
output_dims (int) – The dimensionality of the output features.
bias (bool, optional) – If set to False, the layer will not use a bias. (default: True)
group_size (int, optional) – The group size to use for the quantized weight. See quantize(). (default: 64)
bits (int, optional) – The bit width to use for the quantized weight. See quantize(). (default: 4)
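
To make group_size and bits more concrete, the following plain-Python sketch quantizes a single group of weights. It assumes an affine scheme in which each group stores a scale and an offset derived from the group's minimum and maximum, which is how quantize() is understood to work. The helper names quantize_group and dequantize_group are hypothetical and are not part of MLX, and the real implementation packs the integer codes and runs on the device rather than in Python lists.

```python
def quantize_group(weights, bits=4):
    """Map one group of floats to integer codes in [0, 2**bits - 1]."""
    levels = 2**bits - 1
    lo, hi = min(weights), max(weights)
    # Guard against a constant group, where hi == lo would give a zero scale.
    scale = (hi - lo) / levels or 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize_group(codes, scale, offset):
    """Reconstruct approximate floats from the integer codes."""
    return [c * scale + offset for c in codes]


# One group of group_size = 64 weights, evenly spread over [-0.5, 0.5].
group = [i / 63 - 0.5 for i in range(64)]

codes, scale, offset = quantize_group(group, bits=4)
restored = dequantize_group(codes, scale, offset)

# Rounding to the nearest level bounds the error by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(group, restored))
print(f"scale={scale:.4f} offset={offset:.4f} max_error={max_error:.4f}")
```

Under this scheme, a smaller group_size means more scale and offset pairs per row and therefore a tighter fit, at the cost of extra storage. A larger bits value adds more levels per group and shrinks the step size.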