Generate Text with LLMs and MLX

The easiest way to get started is to install the mlx-lm package:

pip install mlx-lm

Python API

You can use mlx-lm as a module:

from mlx_lm import load, generate

model, tokenizer = load("mistralai/Mistral-7B-v0.1")

response = generate(model, tokenizer, prompt="hello", verbose=True)

To see a description of all the arguments, you can do:

>>> help(generate)
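
For example, the generation length and sampling temperature can be adjusted per call. The snippet below is a minimal sketch; the max_tokens and temp parameter names are assumptions here, so confirm them with help(generate):

from mlx_lm import load, generate

model, tokenizer = load("mistralai/Mistral-7B-v0.1")

# max_tokens and temp are assumed keyword names; check help(generate).
response = generate(
    model,
    tokenizer,
    prompt="Write a haiku about the ocean.",
    max_tokens=256,
    temp=0.7,
    verbose=True,
)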

The mlx-lm package also comes with functionality to quantize and optionally upload models to the Hugging Face Hub.

You can convert models in the Python API with:

from mlx_lm import convert 

upload_repo = "mlx-community/My-Mistral-7B-v0.1-4bit"

convert("mistralai/Mistral-7B-v0.1", quantize=True, upload_repo=upload_repo)

This will generate a 4-bit quantized Mistral-7B and upload it to the repo mlx-community/My-Mistral-7B-v0.1-4bit. It will also save the converted model in the path mlx_model by default.

To see a description of all the arguments, you can do:

>>> help(convert)
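
Once converted, the quantized model can be loaded from its local path and used just like a model from the Hub. A minimal sketch, assuming the default mlx_model output path:

from mlx_lm import load, generate

# Load the 4-bit model that convert saved locally (default path: mlx_model).
model, tokenizer = load("mlx_model")

response = generate(model, tokenizer, prompt="hello", verbose=True)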

Command Line

You can also use mlx-lm from the command line with:

python -m mlx_lm.generate --model mistralai/Mistral-7B-v0.1 --prompt "hello"

This will download a Mistral 7B model from the Hugging Face Hub and generate text using the given prompt.

For a full list of options run:

python -m mlx_lm.generate --help
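
For example, a longer generation at a higher sampling temperature might look like the sketch below; the --max-tokens and --temp flag names are assumptions here, so check --help for the exact options:

python -m mlx_lm.generate \
    --model mistralai/Mistral-7B-v0.1 \
    --prompt "Write a haiku about the ocean." \
    --max-tokens 256 \
    --temp 0.7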

To quantize a model from the command line run:

python -m mlx_lm.convert --hf-path mistralai/Mistral-7B-v0.1 -q 

For more options run:

python -m mlx_lm.convert --help

You can upload new models to Hugging Face by specifying --upload-repo to convert. For example, to upload a quantized Mistral-7B model to the MLX Hugging Face community, you can do:

python -m mlx_lm.convert \
    --hf-path mistralai/Mistral-7B-v0.1 \
    -q \
    --upload-repo mlx-community/my-4bit-mistral
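
Once the upload finishes, the quantized model can be used straight from the Hub, for example with the generate command (reusing the illustrative repo name from above):

python -m mlx_lm.generate --model mlx-community/my-4bit-mistral --prompt "hello"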

Supported Models

The example supports Hugging Face format Mistral, Llama, and Phi-2 style models. If the model you want to run is not supported, file an issue or, better yet, submit a pull request.

Most Hugging Face models in the Mistral, Llama, Phi-2, and Mixtral style should work out of the box.
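
For instance, a Phi-2 model is loaded the same way as the Mistral example above; the microsoft/phi-2 repo name is used here purely for illustration:

from mlx_lm import load, generate

model, tokenizer = load("microsoft/phi-2")

response = generate(model, tokenizer, prompt="hello", verbose=True)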