Mixtral 8x7B

Run the Mixtral[^mixtral] 8x7B mixture-of-experts (MoE) model in MLX on Apple silicon.

This example also supports the instruction fine-tuned Mixtral model.[^instruct]

Note: for 16-bit precision this model needs a machine with a substantial amount of RAM (~100GB) to run.

Setup

Install Git Large File Storage. For example with Homebrew:

brew install git-lfs
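If Git LFS has not been set up on your machine before, you may also need to initialize it once. This is a standard Git LFS step, not specific to this example:

git lfs install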

Download the models from Hugging Face:

For the base model use:

export MIXTRAL_MODEL=Mixtral-8x7B-v0.1

For the instruction fine-tuned model use:

export MIXTRAL_MODEL=Mixtral-8x7B-Instruct-v0.1

Then run:

GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/mistralai/${MIXTRAL_MODEL}/
cd $MIXTRAL_MODEL/ && \
  git lfs pull --include "consolidated.*.pt" && \
  git lfs pull --include "tokenizer.model"

Now, from mlx-examples/llms/mixtral, convert and save the weights as NumPy arrays so MLX can read them:

python convert.py --torch-path $MIXTRAL_MODEL/

To generate a 4-bit quantized model, use the -q flag. For a full list of options, run:

python convert.py --help
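For example, combining the --torch-path option shown above with -q converts directly to a 4-bit quantized model:

python convert.py --torch-path $MIXTRAL_MODEL/ -q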

By default, the conversion script will make the directory mlx_model and save the converted weights.npz, tokenizer.model, and config.json there.
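As a quick sanity check, you can open the converted files directly from Python. This is a minimal sketch, not part of the example scripts; it assumes the mlx package is installed and relies on mx.load reading .npz archives into a dictionary of arrays:

import json

import mlx.core as mx

# Load the converted weights; for .npz archives mx.load returns a
# dict mapping parameter names to MLX arrays.
weights = mx.load("mlx_model/weights.npz")
print(f"Loaded {len(weights)} weight arrays")

# The model configuration written by the conversion script.
with open("mlx_model/config.json") as f:
    config = json.load(f)
print(config)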

Generate

As easy as:

python mixtral.py --model-path mlx_model

For more options including how to prompt the model, run:

python mixtral.py --help

For the Instruction model, make sure to follow the prompt format:

[INST] Instruction prompt [/INST]
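For example, assuming mixtral.py exposes a --prompt option (check python mixtral.py --help for the exact flag name), a call for the instruction model might look like:

python mixtral.py --model-path mlx_model --prompt "[INST] Write a haiku about Apple silicon. [/INST]"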

[^mixtral]: Refer to Mistral's blog post and the Hugging Face blog post for more details.