Mirror of https://github.com/ml-explore/mlx-examples.git, synced 2025-10-24 06:28:07 +08:00
Quantize example (#162)
* testing quantization
* conversion + quantization working
* one config processor
* quantization in mistral / nits in llama
* args for quantization
* llama / mistral conversion in good shape
* phi2 quantized
* mixtral
* qwen conversion
@@ -23,10 +23,17 @@ tar -xf mistral-7B-v0.1.tar
Then, convert the weights with:
```
python convert.py
python convert.py --torch-path <path_to_torch>
```
The conversion script will save the converted weights in the same location.
To generate a 4-bit quantized model, use ``-q``. For a full list of options:
```
python convert.py --help
```
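
For example, to convert the downloaded checkpoint and quantize it in one step, both flags can be combined (the directory name below is an assumption based on the tarball extracted above; run `--help` to confirm the flags in your checkout):

```
python convert.py --torch-path mistral-7B-v0.1 -q
```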
By default, the conversion script will make the directory `mlx_model` and save
the converted `weights.npz`, `tokenizer.model`, and `config.json` there.
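
To sanity-check the converted output, here is a minimal sketch using the public `mlx.core` API (the `mlx_model` path assumes the default output directory mentioned above):

```
# Sketch: inspect the converted weights in the default output directory.
import mlx.core as mx

weights = mx.load("mlx_model/weights.npz")  # dict of parameter name -> mx.array
for name, value in list(weights.items())[:5]:
    print(name, value.shape, value.dtype)
```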
> [!TIP]
> Alternatively, you can also download a few converted checkpoints from the
@@ -40,7 +47,7 @@ Once you've converted the weights to MLX format, you can generate text with
the Mistral model:
```
python mistral.py --prompt "It is a truth universally acknowledged," --temp 0
python mistral.py --prompt "It is a truth universally acknowledged,"
```
Run `python mistral.py --help` for more details.
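
The `--temp` flag seen above controls the sampling temperature. As a rough sketch of how temperature sampling is typically done in MLX (an illustration, not the exact code in `mistral.py`):

```
# Illustrative temperature sampling; not copied from mistral.py.
import mlx.core as mx

def sample(logits, temp):
    # temp == 0 means greedy decoding: always pick the most likely token.
    if temp == 0:
        return mx.argmax(logits, axis=-1)
    # Otherwise scale the logits; higher temperatures give more random output.
    return mx.random.categorical(logits * (1 / temp))
```

Passing `--temp 0`, as in the first command above, makes generation deterministic.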