Mixtral 8x7B

Run the Mixtral[^mixtral] 8x7B mixture-of-experts (MoE) model in MLX on Apple silicon.

This example also supports the instruction fine-tuned Mixtral model.[^instruct]

Note that for 16-bit precision this model needs a machine with substantial RAM (~100GB) to run.

Setup

Install Git Large File Storage. For example with Homebrew:

brew install git-lfs
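
If Git LFS has not been set up on your machine before, you may also need to initialize it once (this wires up the Git hooks that LFS uses):

git lfs install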

Download the models from Hugging Face:

For the base model use:

export MIXTRAL_MODEL=Mixtral-8x7B-v0.1

For the instruction fine-tuned model use:

export MIXTRAL_MODEL=Mixtral-8x7B-Instruct-v0.1

Then run:

GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/mistralai/${MIXTRAL_MODEL}/
cd $MIXTRAL_MODEL/ && \
  git lfs pull --include "consolidated.*.pt" && \
  git lfs pull --include "tokenizer.model"

Now, from mlx-examples/llms/mixtral, convert and save the weights as NumPy arrays so MLX can read them:

python convert.py --torch-path $MIXTRAL_MODEL/

To generate a 4-bit quantized model, use -q. For a full list of options:

python convert.py --help
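
For example, to convert and quantize to 4 bits in one step:

python convert.py --torch-path $MIXTRAL_MODEL/ -q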

By default, the conversion script will make the directory mlx_model and save the converted weights.npz, tokenizer.model, and config.json there.
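
As an optional sanity check, you can verify that the converted weights load with MLX (a quick sketch, assuming the default mlx_model output directory):

python -c "import mlx.core as mx; print(len(mx.load('mlx_model/weights.npz')), 'arrays loaded')"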

Generate

As easy as:

python mixtral.py --model-path mlx_model

For more options including how to prompt the model, run:

python mixtral.py --help

For the Instruction model, make sure to follow the prompt format:

[INST] Instruction prompt [/INST]
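
For example, a full instruction-formatted run might look like the following (this assumes the script exposes a --prompt option; see python mixtral.py --help for the exact flag names):

python mixtral.py --model-path mlx_model --prompt "[INST] Write a short poem about Apple silicon. [/INST]"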

[^mixtral]: Refer to Mistral's blog post and the Hugging Face blog post for more details.