
Llama

An example of generating text with Llama (1 or 2) using MLX.

Llama is a set of open-source language models from Meta AI Research [1][2] ranging from 7B to 70B parameters. This example also supports Meta's Llama Chat and Code Llama models, as well as the 1.1B TinyLlama models from SUTD [3].

Setup

Install the dependencies:

pip install -r requirements.txt

Next, download and convert the model. If you do not have access to the model weights, you will need to request access from Meta:

Tip

Alternatively, you can download a few converted checkpoints from the MLX Community organisation on Hugging Face and skip the conversion step.

You can download the TinyLlama models directly from Hugging Face.

Convert the weights with:

python convert.py --model-path <path_to_torch_model>

For TinyLlama, use:

python convert.py --model-path <path_to_torch_model> --model-name tiny_llama

The conversion script will save the converted weights in the same location.
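To sanity-check a converted checkpoint, you can list the arrays it contains. The sketch below assumes the conversion produced a NumPy `.npz` archive (the `weights.npz` file name is illustrative); it writes a tiny stand-in file first so the snippet runs on its own:

```python
import numpy as np

# Stand-in for a converted checkpoint; in practice, point np.load
# at the weights file written by convert.py.
np.savez("weights.npz", tok_embeddings=np.zeros((8, 4), dtype=np.float16))

# np.load on an .npz returns a lazy archive mapping array names to arrays.
with np.load("weights.npz") as weights:
    for name in weights.files:
        arr = weights[name]
        print(f"{name}: shape={arr.shape}, dtype={arr.dtype}")
```

Listing names, shapes, and dtypes is a quick way to confirm the conversion finished and the parameter sizes match the model you expect.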

Run

Once you've converted the weights to MLX format, you can interact with the Llama model:

python llama.py <path_to_model> <path_to_tokenizer.model> --prompt "hello"

Run python llama.py --help for more details.


  1. For Llama v1, refer to the arXiv paper and blog post for more details.

  2. For Llama v2, refer to the blog post.

  3. For TinyLlama, refer to the GitHub repository.