# Mixtral 8x7B
Run the Mixtral[^mixtral] 8x7B mixture-of-experts (MoE) model in MLX on Apple silicon.
Note: this model needs a machine with substantial RAM (>= 128GB) to run in 16-bit precision.
### Setup
Install Git Large File Storage. For example with Homebrew:

```bash
brew install git-lfs
```
Download the models from Hugging Face:

```bash
git clone https://huggingface.co/someone13574/mixtral-8x7b-32kseqlen
```
After that's done, combine the files:

```bash
cd mixtral-8x7b-32kseqlen/
cat consolidated.00.pth-split0 consolidated.00.pth-split1 consolidated.00.pth-split2 \
    consolidated.00.pth-split3 consolidated.00.pth-split4 consolidated.00.pth-split5 \
    consolidated.00.pth-split6 consolidated.00.pth-split7 consolidated.00.pth-split8 \
    consolidated.00.pth-split9 consolidated.00.pth-split10 > consolidated.00.pth
```
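If you want a quick sanity check that the concatenation worked, a minimal sketch like the one below compares the size of the combined file against the sum of the split sizes. It assumes you are running it from the directory containing `mixtral-8x7b-32kseqlen/`; adjust the path otherwise.

```python
from pathlib import Path

model_dir = Path("mixtral-8x7b-32kseqlen")

# Sum the sizes of the individual split files.
split_bytes = sum(
    p.stat().st_size for p in model_dir.glob("consolidated.00.pth-split*")
)

# Size of the combined checkpoint produced by `cat`.
combined_bytes = (model_dir / "consolidated.00.pth").stat().st_size

print(f"splits:   {split_bytes} bytes")
print(f"combined: {combined_bytes} bytes")
assert split_bytes == combined_bytes, "Combined checkpoint looks incomplete"
```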
Now, from `mlx-examples/mixtral`, convert the weights to NumPy so MLX can read them:

```bash
python convert.py --model_path mixtral-8x7b-32kseqlen/
```
The conversion script will save the converted weights in the same location.
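For reference, a conversion like this boils down to loading the PyTorch checkpoint and re-saving the tensors in a format MLX can read. The sketch below is not the actual `convert.py`, just the general idea; the output filename `weights.npz` is illustrative.

```python
import numpy as np
import torch

# Load the combined PyTorch checkpoint on the CPU.
state_dict = torch.load(
    "mixtral-8x7b-32kseqlen/consolidated.00.pth", map_location="cpu"
)

# Convert each tensor to a NumPy array, keeping 16-bit precision.
arrays = {
    name: tensor.to(torch.float16).numpy() for name, tensor in state_dict.items()
}

# Save everything into a single .npz archive next to the original weights.
np.savez("mixtral-8x7b-32kseqlen/weights.npz", **arrays)
```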
After that's done, if you want to clean some stuff up:

```bash
rm mixtral-8x7b-32kseqlen/*.pth*
```
### Generate
As easy as:

```bash
python mixtral.py --model_path mixtral-8x7b-32kseqlen/
```
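If you want to poke at the converted weights before generating, MLX can load `.npz` archives as a dictionary of arrays. The snippet below is a minimal sketch; the filename `weights.npz` is an assumption about what the conversion step writes.

```python
import mlx.core as mx

# mx.load returns a dict mapping parameter names to MLX arrays for .npz files.
# NOTE: "weights.npz" is an assumed output name, not confirmed by this README.
weights = mx.load("mixtral-8x7b-32kseqlen/weights.npz")

# Print a few parameter names, shapes, and dtypes to confirm the conversion looks sane.
for name in list(weights)[:5]:
    print(name, weights[name].shape, weights[name].dtype)
```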
[^mixtral]: Refer to Mistral's blog post for more details.