# Mixtral 8x7B
Run the Mixtral[^mixtral] 8x7B mixture-of-experts (MoE) model in MLX on Apple silicon.
This example also supports the instruction fine-tuned Mixtral model.[^instruct]

Note that for 16-bit precision this model needs a machine with substantial RAM (~100 GB) to run.
### Setup
Install Git Large File Storage. For example with Homebrew:

```
brew install git-lfs
```
Download the models from Hugging Face.

For the base model use:

```
export MIXTRAL_MODEL=Mixtral-8x7B-v0.1
```

For the instruction fine-tuned model use:

```
export MIXTRAL_MODEL=Mixtral-8x7B-Instruct-v0.1
```

Then run:

```
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/mistralai/${MIXTRAL_MODEL}/
cd $MIXTRAL_MODEL/ && \
  git lfs pull --include "consolidated.*.pt" && \
  git lfs pull --include "tokenizer.model"
```
Now from `mlx-examples/mixtral`, convert and save the weights as NumPy arrays so
MLX can read them:

```
python convert.py --torch-path $MIXTRAL_MODEL/
```

To generate a 4-bit quantized model, use `-q`. For a full list of options:

```
python convert.py --help
```
By default, the conversion script will make the directory `mlx_model` and save
the converted `weights.npz`, `tokenizer.model`, and `config.json` there.
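If you want to sanity-check the converted files before generating, a minimal sketch like the following can inspect them. The file names match the defaults above; the inspection code itself is illustrative and not part of this example.

```python
import json

import mlx.core as mx

# mx.load on a .npz file returns a dict mapping parameter names to arrays.
weights = mx.load("mlx_model/weights.npz")
print(f"{len(weights)} weight arrays")

# Peek at a few parameter names and shapes.
for name in list(weights)[:5]:
    print(name, weights[name].shape)

# The model configuration is plain JSON.
with open("mlx_model/config.json") as f:
    config = json.load(f)
print(config)
```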
### Generate
As easy as:

```
python mixtral.py --model-path mlx_model
```
For more options, including how to prompt the model, run:

```
python mixtral.py --help
```
For the Instruction model, make sure to follow the prompt format:

```
[INST] Instruction prompt [/INST]
```
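As a small illustration, wrapping a user instruction in this format could look like the sketch below. The helper name is hypothetical; only the `[INST] ... [/INST]` template comes from the example above.

```python
def format_instruct_prompt(instruction: str) -> str:
    """Wrap a raw instruction in the Mixtral-Instruct prompt template."""
    return f"[INST] {instruction.strip()} [/INST]"

print(format_instruct_prompt("Write a haiku about Apple silicon."))
# [INST] Write a haiku about Apple silicon. [/INST]
```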
[^mixtral]: Refer to Mistral's blog post and the Hugging Face blog post for more details.