support for tiny llama (#129)

Awni Hannun
2023-12-18 07:47:55 -08:00
committed by GitHub
parent 08e862336a
commit 44b546d446
3 changed files with 140 additions and 45 deletions


@@ -3,8 +3,8 @@
An example of generating text with Llama (1 or 2) using MLX.
Llama is a set of open source language models from Meta AI Research[^1][^2]
ranging from 7B to 70B parameters. This example also supports Llama Chat and
Code Llama.
ranging from 7B to 70B parameters. This example also supports Meta's Llama Chat
and Code Llama models, as well as the 1.1B TinyLlama models from SUTD.[^3]
### Setup
@@ -25,10 +25,19 @@ Alternatively, you can also download a selection of converted checkpoints from the
[mlx-llama](https://huggingface.co/mlx-llama) community organisation on Hugging
Face and skip the conversion step.
You can download the TinyLlama models directly from [Hugging
Face](https://huggingface.co/TinyLlama).
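One way to fetch a checkpoint, for example, is with Git LFS. The repository name
below is only an illustration; substitute whichever TinyLlama variant you want:

```
git lfs install
git clone https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0
```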
Convert the weights with:
```
python convert.py --model_path <path_to_torch_model>
python convert.py --model-path <path_to_torch_model>
```
For TinyLlama, use:
```
python convert.py --model-path <path_to_torch_model> --model-name tiny_llama
```
The conversion script will save the converted weights in the same location.
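For instance, assuming the checkpoint was cloned into a local directory named
`TinyLlama-1.1B-Chat-v1.0` (the path here is illustrative), the conversion call
would be:

```
python convert.py --model-path TinyLlama-1.1B-Chat-v1.0 --model-name tiny_llama
```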
@@ -39,10 +48,11 @@ Once you've converted the weights to MLX format, you can interact with the
Llama model:
```
python llama.py <path_to_model> <path_to_tokenizer.model> "hello"
python llama.py <path_to_model> <path_to_tokenizer.model> --prompt "hello"
```
Run `python llama.py --help` for more details.
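As a concrete sketch, assuming the converted weights and the tokenizer live at
the paths shown (these file names are placeholders, not names produced by the
scripts), a generation run could look like:

```
python llama.py tiny_llama_mlx_weights.npz tokenizer.model --prompt "Write a short poem about the ocean"
```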
[^1]: For Llama v1 refer to the [arXiv paper](https://arxiv.org/abs/2302.13971) and [blog post](https://ai.meta.com/blog/large-language-model-llama-meta-ai/) for more details.
[^2]: For Llama v2 refer to the [blog post](https://ai.meta.com/llama/).
[^3]: For TinyLlama refer to the [GitHub repository](https://github.com/jzhang38/TinyLlama?tab=readme-ov-file).