mlx-examples/llms
devonthomas35 cc671cd1c7
Mixtral: Fix non-default arg follows default exception (#450)
Mixtral models throw the following exception:
```
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/mlx_lm/generate.py", line 119, in <module>
    main(args)
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/mlx_lm/generate.py", line 96, in main
    model, tokenizer = load(args.model, tokenizer_config=tokenizer_config)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/mlx_lm/utils.py", line 278, in load
    model = load_model(model_path)
            ^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/mlx_lm/utils.py", line 221, in load_model
    model_class, model_args_class = _get_classes(config=config)
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/mlx_lm/utils.py", line 46, in _get_classes
    arch = importlib.import_module(f"mlx_lm.models.{model_type}")
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/mlx_lm/models/mixtral.py", line 11, in <module>
    @dataclass
     ^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/dataclasses.py", line 1230, in dataclass
    return wrap(cls)
           ^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/dataclasses.py", line 1220, in wrap
    return _process_class(cls, init, repr, eq, order, unsafe_hash,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/dataclasses.py", line 1027, in _process_class
    _init_fn(all_init_fields,
  File "/opt/homebrew/anaconda3/lib/python3.11/dataclasses.py", line 545, in _init_fn
    raise TypeError(f'non-default argument {f.name!r} '
TypeError: non-default argument 'model_type' follows default argument
```
2024-02-18 13:30:26 -08:00
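
The exception comes from dataclass field ordering: a dataclass field without a default value may not be declared after a field that has one. Below is a minimal sketch of the problem and one possible fix; the field names and default value are illustrative, not the actual mixtral.py definition or necessarily the exact change made in #450.

```python
from dataclasses import dataclass

# Reproducing the error: a field without a default may not follow a
# field that has one.
try:
    @dataclass
    class BrokenArgs:
        rope_theta: float = 1e6  # has a default
        model_type: str          # no default -> TypeError at class creation
except TypeError as e:
    print(e)  # non-default argument 'model_type' follows default argument

# One way to avoid the error: give the trailing field a default, or
# declare it before the defaulted fields.
@dataclass
class FixedArgs:
    model_type: str = "mixtral"  # assumed default, for illustration only
    rope_theta: float = 1e6
```
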
| Name | Last commit message | Last commit date |
| --- | --- | --- |
| gguf_llm | Change gqa to use repeat instead of concatenate (#443) | 2024-02-14 17:40:11 -08:00 |
| llama | remove simplify (#379) | 2024-01-26 13:54:49 -08:00 |
| mistral | Change gqa to use repeat instead of concatenate (#443) | 2024-02-14 17:40:11 -08:00 |
| mixtral | Change gqa to use repeat instead of concatenate (#443) | 2024-02-14 17:40:11 -08:00 |
| mlx_lm | Mixtral: Fix non-default arg follows default exception (#450) | 2024-02-18 13:30:26 -08:00 |
| speculative_decoding | Update README.md (#248) | 2024-01-07 20:13:58 -08:00 |
| MANIFEST.in | Mlx llm package (#301) | 2024-01-12 10:25:56 -08:00 |
| README.md | Add StableLM-2 1.6B (#378) | 2024-01-26 10:28:00 -08:00 |
| setup.py | Fix Qwen2 and SD (#441) | 2024-02-14 13:43:12 -08:00 |

Generate Text with LLMs and MLX

The easiest way to get started is to install the mlx-lm package:

With pip:

```
pip install mlx-lm
```

With conda:

```
conda install -c conda-forge mlx-lm
```

The mlx-lm package also supports LoRA and QLoRA fine-tuning. For more details on this see the LoRA documentation.

Python API

You can use mlx-lm as a module:

```python
from mlx_lm import load, generate

model, tokenizer = load("mistralai/Mistral-7B-v0.1")

response = generate(model, tokenizer, prompt="hello", verbose=True)
```

To see a description of all the arguments you can do:

```python
>>> help(generate)
```

The mlx-lm package also comes with functionality to quantize and optionally upload models to the Hugging Face Hub.

You can convert models in the Python API with:

```python
from mlx_lm import convert

upload_repo = "mlx-community/My-Mistral-7B-v0.1-4bit"

convert("mistralai/Mistral-7B-v0.1", quantize=True, upload_repo=upload_repo)
```

This will generate a 4-bit quantized Mistral-7B and upload it to the repo mlx-community/My-Mistral-7B-v0.1-4bit. It will also save the converted model in the path mlx_model by default.
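
The converted weights can then be used with the Python API shown above. Here is a small sketch, assuming load also accepts a local path such as the default mlx_model output directory:

```python
from mlx_lm import load, generate

# Load the locally converted 4-bit model from the default output path
model, tokenizer = load("mlx_model")

response = generate(model, tokenizer, prompt="hello", verbose=True)
```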

To see a description of all the arguments you can do:

```python
>>> help(convert)
```

Command Line

You can also use mlx-lm from the command line with:

```
python -m mlx_lm.generate --model mistralai/Mistral-7B-v0.1 --prompt "hello"
```

This will download a Mistral 7B model from the Hugging Face Hub and generate text using the given prompt.

For a full list of options run:

```
python -m mlx_lm.generate --help
```

To quantize a model from the command line run:

```
python -m mlx_lm.convert --hf-path mistralai/Mistral-7B-v0.1 -q
```

For more options run:

```
python -m mlx_lm.convert --help
```

You can upload new models to Hugging Face by specifying --upload-repo to convert. For example, to upload a quantized Mistral-7B model to the MLX Hugging Face community you can do:

```
python -m mlx_lm.convert \
    --hf-path mistralai/Mistral-7B-v0.1 \
    -q \
    --upload-repo mlx-community/my-4bit-mistral
```

Supported Models

The example supports Hugging Face format Mistral, Llama, and Phi-2 style models. If the model you want to run is not supported, file an issue or better yet, submit a pull request.

As a general rule, most Mistral, Llama, Phi-2, and Mixtral style models on the Hugging Face Hub should work out of the box.

For some models (such as Qwen and plamo) the tokenizer requires you to enable the trust_remote_code option. You can do this by passing --trust-remote-code in the command line. If you don't specify the flag explicitly, you will be prompted to trust remote code in the terminal when running the model.

For Qwen models you must also specify the eos_token. You can do this by passing --eos-token "<|endoftext|>" in the command line.
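
Putting the two flags together, a command line invocation for a Qwen model could look like the following (the model name here is just an example):

```
python -m mlx_lm.generate \
    --model qwen/Qwen-7B \
    --trust-remote-code \
    --eos-token "<|endoftext|>" \
    --prompt "hello"
```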

These options can also be set in the Python API. For example:

```python
model, tokenizer = load(
    "qwen/Qwen-7B",
    tokenizer_config={"eos_token": "<|endoftext|>", "trust_remote_code": True},
)
```