Generate Text with LLMs and MLX
The easiest way to get started is to install the mlx-lm package:

With pip:

pip install mlx-lm

With conda:

conda install -c conda-forge mlx-lm
Python API
You can use mlx-lm as a module:
from mlx_lm import load, generate
model, tokenizer = load("mistralai/Mistral-7B-v0.1")
response = generate(model, tokenizer, prompt="hello", verbose=True)
To see a description of all the arguments you can do:
>>> help(generate)
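For chat or instruct models, it usually helps to format the prompt with the tokenizer's chat template before calling generate. The snippet below is a minimal sketch: the instruct model name and the max_tokens argument are illustrative (base models such as Mistral-7B-v0.1 do not ship a chat template), and the exact keyword arguments accepted by generate may vary by version, so check help(generate).

from mlx_lm import load, generate

# Illustrative instruct-tuned checkpoint; any model whose tokenizer has a chat template works.
model, tokenizer = load("mistralai/Mistral-7B-Instruct-v0.1")

# Build a chat-formatted prompt using the Hugging Face tokenizer's chat template.
messages = [{"role": "user", "content": "Write a haiku about the ocean."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# max_tokens is illustrative; see help(generate) for the arguments your version accepts.
response = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)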
The mlx-lm package also comes with functionality to quantize and optionally upload models to the Hugging Face Hub.
You can convert models in the Python API with:
from mlx_lm import convert
upload_repo = "mlx-community/My-Mistral-7B-v0.1-4bit"
convert("mistralai/Mistral-7B-v0.1", quantize=True, upload_repo=upload_repo)
This will generate a 4-bit quantized Mistral-7B and upload it to the repo mlx-community/My-Mistral-7B-v0.1-4bit. It will also save the converted model in the path mlx_model by default.
To see a description of all the arguments you can do:
>>> help(convert)
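Once converted, the quantized model can be loaded back from the local output directory. This is a minimal sketch assuming the default mlx_model output path mentioned above; load should also accept a local path in place of a Hugging Face repo name.

from mlx_lm import load, generate

# Load the 4-bit model written by convert (default output path: mlx_model).
model, tokenizer = load("mlx_model")
response = generate(model, tokenizer, prompt="hello", verbose=True)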
Command Line
You can also use mlx-lm from the command line with:
python -m mlx_lm.generate --model mistralai/Mistral-7B-v0.1 --prompt "hello"
This will download a Mistral 7B model from the Hugging Face Hub and generate text using the given prompt.
For a full list of options run:
python -m mlx_lm.generate --help
To quantize a model from the command line run:
python -m mlx_lm.convert --hf-path mistralai/Mistral-7B-v0.1 -q
For more options run:
python -m mlx_lm.convert --help
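The quantized model can then be used with the generate command. Assuming the default mlx_model output directory from the convert step, the --model flag should also accept that local path, for example:

python -m mlx_lm.generate --model mlx_model --prompt "hello"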
You can upload new models to Hugging Face by specifying --upload-repo to convert. For example, to upload a quantized Mistral-7B model to the MLX Hugging Face community you can do:
python -m mlx_lm.convert \
--hf-path mistralai/Mistral-7B-v0.1 \
-q \
--upload-repo mlx-community/my-4bit-mistral
Supported Models
The example supports Mistral, Llama, and Phi-2 style models in Hugging Face format. If the model you want to run is not supported, file an issue or, better yet, submit a pull request.
Here are a few examples of Hugging Face models that work with this example:
- mistralai/Mistral-7B-v0.1
- meta-llama/Llama-2-7b-hf
- deepseek-ai/deepseek-coder-6.7b-instruct
- 01-ai/Yi-6B-Chat
- microsoft/phi-2
- mistralai/Mixtral-8x7B-Instruct-v0.1
- Qwen/Qwen-7B
Most Mistral, Llama, Phi-2, and Mixtral style models should work out of the box.
For Qwen style models, you must enable the trust_remote_code option and specify the eos_token. This ensures the tokenizer works correctly. You can do this by passing --trust-remote-code and --eos-token "<|endoftext|>" in the command line, or by setting these options in the Python API:
model, tokenizer = load(
    "qwen/Qwen-7B",
    tokenizer_config={"eos_token": "<|endoftext|>", "trust_remote_code": True},
)
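On the command line, the same options are passed as the flags mentioned above, for example (using a model name from the list of supported models):

python -m mlx_lm.generate \
    --model Qwen/Qwen-7B \
    --trust-remote-code \
    --eos-token "<|endoftext|>" \
    --prompt "hello"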