Generate Text with LLMs and MLX

The easiest way to get started is to install the mlx-lm package:

pip install mlx-lm

Python API

You can use mlx-lm as a module:

from mlx_lm import load, generate

model, tokenizer = load("mistralai/Mistral-7B-v0.1")

response = generate(model, tokenizer, prompt="hello", verbose=True)

To see a description of all the arguments, you can do:

>>> help(generate)
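
For example, the generation length and sampling temperature can be adjusted per call. The snippet below is a minimal sketch; the max_tokens and temp parameter names are assumptions here, so confirm them with help(generate):

from mlx_lm import load, generate

model, tokenizer = load("mistralai/Mistral-7B-v0.1")

# max_tokens and temp are assumed keyword names; check help(generate).
response = generate(
    model,
    tokenizer,
    prompt="Write a haiku about the ocean.",
    max_tokens=256,
    temp=0.7,
    verbose=True,
)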

The mlx-lm package also comes with functionality to quantize and optionally upload models to the Hugging Face Hub.

You can convert models in the Python API with:

from mlx_lm import convert 

upload_repo = "mlx-community/My-Mistral-7B-v0.1-4bit"

convert("mistralai/Mistral-7B-v0.1", quantize=True, upload_repo=upload_repo)

This will generate a 4-bit quantized Mistral-7B and upload it to the repo mlx-community/My-Mistral-7B-v0.1-4bit. It will also save the converted model in the path mlx_model by default.

To see a description of all the arguments, you can do:

>>> help(convert)
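
Once converted, the quantized model can be loaded from its local path and used just like a model from the Hub. A minimal sketch, assuming the default mlx_model output path:

from mlx_lm import load, generate

# Load the 4-bit model that convert saved locally (default path: mlx_model).
model, tokenizer = load("mlx_model")

response = generate(model, tokenizer, prompt="hello", verbose=True)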

Command Line

You can also use mlx-lm from the command line with:

python -m mlx_lm.generate --model mistralai/Mistral-7B-v0.1 --prompt "hello"

This will download a Mistral 7B model from the Hugging Face Hub and generate text using the given prompt.

For a full list of options run:

python -m mlx_lm.generate --help
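
For example, a longer generation at a higher sampling temperature might look like the sketch below; the --max-tokens and --temp flag names are assumptions here, so check --help for the exact options:

python -m mlx_lm.generate \
    --model mistralai/Mistral-7B-v0.1 \
    --prompt "Write a haiku about the ocean." \
    --max-tokens 256 \
    --temp 0.7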

To quantize a model from the command line run:

python -m mlx_lm.convert --hf-path mistralai/Mistral-7B-v0.1 -q 

For more options run:

python -m mlx_lm.convert --help

You can upload new models to Hugging Face by specifying --upload-repo to convert. For example, to upload a quantized Mistral-7B model to the MLX Hugging Face community, you can do:

python -m mlx_lm.convert \
    --hf-path mistralai/Mistral-7B-v0.1 \
    -q \
    --upload-repo mlx-community/my-4bit-mistral
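
Once the upload finishes, the quantized model can be used straight from the Hub, for example with the generate command (reusing the illustrative repo name from above):

python -m mlx_lm.generate --model mlx-community/my-4bit-mistral --prompt "hello"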

Supported Models

The example supports Hugging Face format Mistral, Llama, and Phi-2 style models. If the model you want to run is not supported, file an issue or, better yet, submit a pull request.

Most Hugging Face models in the Mistral, Llama, Phi-2, and Mixtral style should work out of the box.
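
For instance, a Phi-2 model is loaded the same way as the Mistral example above; the microsoft/phi-2 repo name is used here purely for illustration:

from mlx_lm import load, generate

model, tokenizer = load("microsoft/phi-2")

response = generate(model, tokenizer, prompt="hello", verbose=True)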