mlx-examples/llms
llama Missing requirements needed for convert script (#320) 2024-01-18 19:04:24 -08:00
mistral make parameter naming consistent with other examples. (#214) 2024-01-02 08:18:12 -08:00
mixtral make parameter naming consistent with other examples. (#214) 2024-01-02 08:18:12 -08:00
mlx_lm refactor(qwen): moving qwen into mlx-lm (#312) 2024-01-22 15:00:07 -08:00
phixtral Phixtral (#290) 2024-01-13 08:35:03 -08:00
speculative_decoding Update README.md (#248) 2024-01-07 20:13:58 -08:00
MANIFEST.in Mlx llm package (#301) 2024-01-12 10:25:56 -08:00
README.md refactor(qwen): moving qwen into mlx-lm (#312) 2024-01-22 15:00:07 -08:00
setup.py fix response + bump version (#319) 2024-01-15 11:51:21 -08:00

Generate Text with LLMs and MLX

The easiest way to get started is to install the mlx-lm package:

pip install mlx-lm

Python API

You can use mlx-lm as a module:

from mlx_lm import load, generate

model, tokenizer = load("mistralai/Mistral-7B-v0.1")

response = generate(model, tokenizer, prompt="hello", verbose=True)

To see a description of all the arguments, you can run:

>>> help(generate)
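
The generate function also accepts generation options as keyword arguments. Here is a minimal sketch; the exact parameter names (such as max_tokens and temp) are assumptions that may differ between versions, so confirm them with help(generate):

response = generate(
    model,
    tokenizer,
    prompt="Write a haiku about the ocean.",
    max_tokens=100,  # assumed name: cap on the number of generated tokens
    temp=0.7,        # assumed name: sampling temperature
    verbose=True,
)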

The mlx-lm package also comes with functionality to quantize and optionally upload models to the Hugging Face Hub.

You can convert models in the Python API with:

from mlx_lm import convert 

upload_repo = "mlx-community/My-Mistral-7B-v0.1-4bit"

convert("mistralai/Mistral-7B-v0.1", quantize=True, upload_repo=upload_repo)

This will generate a 4-bit quantized Mistral-7B and upload it to the repo mlx-community/My-Mistral-7B-v0.1-4bit. It will also save the converted model in the path mlx_model by default.
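
Once converted, the model can be loaded from the local path just like a Hub repo. For example, assuming the default output path mlx_model from the call above:

from mlx_lm import load, generate

# Load the converted weights from the local output directory.
model, tokenizer = load("mlx_model")

response = generate(model, tokenizer, prompt="hello", verbose=True)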

To see a description of all the arguments, you can run:

>>> help(convert)
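
If you only want a local copy, you can leave out upload_repo. A minimal sketch, assuming parameter names such as mlx_path and q_bits (check help(convert) for the exact signature in your version):

from mlx_lm import convert

# Quantize to 4 bits and write the result to a local directory.
# mlx_path and q_bits are assumed parameter names; see help(convert).
convert(
    "mistralai/Mistral-7B-v0.1",
    mlx_path="mlx_model_4bit",
    quantize=True,
    q_bits=4,
)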

Command Line

You can also use mlx-lm from the command line with:

python -m mlx_lm.generate --model mistralai/Mistral-7B-v0.1 --prompt "hello"

This will download a Mistral 7B model from the Hugging Face Hub and generate text using the given prompt.

For a full list of options run:

python -m mlx_lm.generate --help

To quantize a model from the command line run:

python -m mlx_lm.convert --hf-path mistralai/Mistral-7B-v0.1 -q 

For more options run:

python -m mlx_lm.convert --help

You can upload new models to Hugging Face by specifying --upload-repo to convert. For example, to upload a quantized Mistral-7B model to the MLX Hugging Face community you can do:

python -m mlx_lm.convert \
    --hf-path mistralai/Mistral-7B-v0.1 \
    -q \
    --upload-repo mlx-community/my-4bit-mistral

Supported Models

The example supports Hugging Face format Mistral, Llama, Phi-2, and Mixtral style models; most such models hosted on the Hugging Face Hub should work out of the box. If the model you want to run is not supported, file an issue or, better yet, submit a pull request.

For Qwen style models, you must enable the trust_remote_code option and specify the eos_token so that the tokenizer loads correctly. You can do this by passing --trust-remote-code and --eos-token "<|endoftext|>" on the command line, or by setting these options in the Python API:

model, tokenizer = load(
    "qwen/Qwen-7B",
    tokenizer_config={"eos_token": "<|endoftext|>", "trust_remote_code": True},
)
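
Once loaded, generation works the same as for any other model, using the generate function shown above:

response = generate(model, tokenizer, prompt="hello", verbose=True)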