mlx.nn.QuantizedEmbedding
- class QuantizedEmbedding(num_embeddings: int, dims: int, group_size: int = 64, bits: int = 4, mode: str = 'affine')
The same as Embedding but with a quantized weight matrix. QuantizedEmbedding also provides a from_embedding() classmethod to convert embedding layers to QuantizedEmbedding layers.
- Parameters:
num_embeddings (int) – The number of discrete tokens that can be embedded. Usually called the vocabulary size.
dims (int) – The dimensionality of the embeddings.
group_size (int, optional) – The group size to use for the quantized weight. See quantize(). Default: 64.
bits (int, optional) – The bit width to use for the quantized weight. See quantize(). Default: 4.
mode (str) – The quantization method to use (see mlx.core.quantize()). Default: "affine".
Methods
as_linear(x) – Call the quantized embedding layer as a quantized linear layer.
from_embedding(embedding_layer[, ...]) – Create a QuantizedEmbedding layer from an Embedding layer.