Llama

An example of generating text with Llama (1 or 2) using MLX.

Llama is a set of open source language models from Meta AI Research¹² ranging from 7B to 70B parameters. This example also supports Meta's Llama Chat and Code Llama models, as well as the 1.1B TinyLlama models from SUTD.³

Setup

Install the dependencies:

pip install -r requirements.txt

Next, download and convert the model. If you do not have access to the model weights, you will need to request access from Meta.

[!TIP] Alternatively, you can download a few converted checkpoints from the MLX Community organization on Hugging Face and skip the conversion step.

You can download the TinyLlama models directly from Hugging Face.

Convert the weights with:

python convert.py --torch-path <path_to_torch_model>

To generate a 4-bit quantized model use the -q flag:

python convert.py --torch-path <path_to_torch_model> -q
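
For intuition about what 4-bit quantization means, the sketch below maps a small group of floating-point weights to integer codes in the range 0 to 15 and back. It is a generic affine (min/max) scheme written for illustration only; the function names and the choice of scheme are assumptions, and it is not taken from convert.py.

```python
def quantize_4bit(values):
    """Map floats to integer codes in [0, 15] plus a (scale, offset) pair.

    Illustrative only: a plain min/max affine scheme, assumed for this sketch.
    """
    lo, hi = min(values), max(values)
    # 4 bits give 16 levels, so the range is split into 15 steps.
    # Fall back to 1.0 when all values are equal to avoid dividing by zero.
    scale = (hi - lo) / 15 or 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def dequantize_4bit(codes, scale, offset):
    """Reconstruct approximate floats from the 4-bit codes."""
    return [c * scale + offset for c in codes]
```

Each weight is stored as a 4-bit code, and the reconstruction error per weight is bounded by half a quantization step (scale / 2) in this scheme.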

For TinyLlama, use:

python convert.py --torch-path <path_to_torch_model> --model-name tiny_llama

By default, the conversion script will make the directory mlx_model and save the converted weights.npz, tokenizer.model, and config.json there.
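
A quick way to confirm that the conversion finished is to check that the three files named above exist. The following is a minimal sketch using only the standard library; the helper names `missing_files` and `load_config` are hypothetical, and the keys inside config.json are not assumed here.

```python
import json
from pathlib import Path

# File names taken from the description of the conversion output above.
EXPECTED_FILES = ("weights.npz", "tokenizer.model", "config.json")


def missing_files(model_dir="mlx_model"):
    """Return the expected files that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


def load_config(model_dir="mlx_model"):
    """Read config.json from the converted model directory as a dict."""
    with open(Path(model_dir) / "config.json") as f:
        return json.load(f)
```

Calling `missing_files()` from the directory where convert.py was run returns an empty list when all three files are present.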

Run

Once you've converted the weights to MLX format, you can interact with the Llama model:

python llama.py --prompt "hello"

Run python llama.py --help for more details.
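
To run several prompts from a script, the command above can be wrapped with `subprocess`. This is a sketch under stated assumptions: it uses only the `--prompt` flag shown above, the helper names are hypothetical, and it assumes llama.py sits in the current directory and writes the generated text to standard output.

```python
import subprocess
import sys


def build_generate_command(prompt, script="llama.py"):
    """Return the argument list for one generation run."""
    # Only --prompt is used here; other options are listed by --help.
    return [sys.executable, script, "--prompt", prompt]


def generate(prompt, script="llama.py"):
    """Run the script once and return whatever it prints to stdout."""
    result = subprocess.run(
        build_generate_command(prompt, script),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

For example, `generate("hello")` is equivalent to running `python llama.py --prompt "hello"` and capturing the output as a string.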

1. For Llama v1, refer to the arXiv paper and blog post for more details.
2. For Llama v2, refer to the blog post.
3. For TinyLlama, refer to the GitHub repository.