Add PLaMo-13B model as an LLM example (#303)

* Convert HF weights of PLaMo and load it to a plamo model in mlx
* Fix model inference part
* Add bos at the beginning of the prompt
* Fix convert.py to copy tokenizer.model into the converted dir
* Use the required instruction format in generate.py when "--instruct" option is specified
* Change filenames and update existing scripts
* Add README
* Add requirements.txt
* Fix plamo.py to stop generation when EOS appears
* Add quantization to convert.py
* Use mlx>=0.0.9 for mx.core.outer() in PLaMo model
* Update acknowledgements.md
* Fix card text in upload_to_hub()
* Not use prompt template when --instruct is not specified
* Ask if you trust_remote_code for loading tokenizer of PLaMo
* Check the user trusts the remote code when converting
* Remove plamo directory
* Update README
* Add PLaMo model file
* Fix the handling of cache in PLaMo and update README
* Ask if trust_remote_code only when the model is PLaMo
* Remove resolve_trust_remote_code from convert.py and use the latest transformers
* Remove code not to add EOS
* Update README to fix an example not to use noncommercial version of the model
* Remove unused imports
* Remove unnecessary description about the instruct model of PLaMo from README
* format, nits in README
* typo

---------

Co-authored-by: Shunta Saito <shunta@mitmul-mbp.local>
Co-authored-by: Awni Hannun <awni@apple.com>
2025-12-16 02:08:55 +08:00 · 2024-01-24 00:17:24 +09:00
parent c45c2311bd
commit 85c1ff8fd6
4 changed files with 387 additions and 13 deletions
--- a/llms/README.md
+++ b/llms/README.md
@@ -38,7 +38,7 @@ upload models to the Hugging Face Hub.
 You can convert models in the Python API with:

 ```python
-from mlx_lm import convert 
+from mlx_lm import convert

 upload_repo = "mlx-community/My-Mistral-7B-v0.1-4bit"

@@ -55,7 +55,7 @@ To see a description of all the arguments you can do:
 >>> help(convert)
 ```

-### Command Line 
+### Command Line

 You can also use `mlx-lm` from the command line with:

@@ -64,7 +64,7 @@ python -m mlx_lm.generate --model mistralai/Mistral-7B-v0.1 --prompt "hello"
 ```

 This will download a Mistral 7B model from the Hugging Face Hub and generate
-text using the given prompt. 
+text using the given prompt.

 For a full list of options run:

@@ -75,7 +75,7 @@ python -m mlx_lm.generate --help
 To quantize a model from the command line run:

 ```
-python -m mlx_lm.convert --hf-path mistralai/Mistral-7B-v0.1 -q 
+python -m mlx_lm.convert --hf-path mistralai/Mistral-7B-v0.1 -q
 ```

 For more options run:
@@ -85,7 +85,7 @@ python -m mlx_lm.convert --help
 ```

 You can upload new models to Hugging Face by specifying `--upload-repo` to
-`convert`. For example, to upload a quantized Mistral-7B model to the 
+`convert`. For example, to upload a quantized Mistral-7B model to the
 [MLX Hugging Face community](https://huggingface.co/mlx-community) you can do:

 ```
@@ -111,6 +111,8 @@ Here are a few examples of Hugging Face models that work with this example:
 - [microsoft/phi-2](https://huggingface.co/microsoft/phi-2)
 - [mistralai/Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1)
 - [Qwen/Qwen-7B](https://huggingface.co/Qwen/Qwen-7B)
+- [pfnet/plamo-13b](https://huggingface.co/pfnet/plamo-13b)
+- [pfnet/plamo-13b-instruct](https://huggingface.co/pfnet/plamo-13b-instruct)

 Most
 [Mistral](https://huggingface.co/models?library=transformers,safetensors&other=mistral&sort=trending),
@@ -120,12 +122,17 @@ and
 [Mixtral](https://huggingface.co/models?library=transformers,safetensors&other=mixtral&sort=trending)
 style models should work out of the box.

-For
-[Qwen](https://huggingface.co/models?library=transformers,safetensors&other=qwen&sort=trending)
-style models, you must enable the `trust_remote_code` option and specify the
-`eos_token`. This ensures the tokenizer works correctly.  You can do this by
-passing `--trust-remote-code` and `--eos-token "<|endoftext|>"` in the command
-line, or by setting these options in the Python API:
+For some models (such as `Qwen` and `plamo`) the tokenizer requires you to
+enable the `trust_remote_code` option. You can do this by passing
+`--trust-remote-code` in the command line. If you don't specify the flag
+explicitly, you will be prompted to trust remote code in the terminal when
+running the model. 
+
+For `Qwen` models you must also specify the `eos_token`. You can do this by
+passing `--eos-token "<|endoftext|>"` in the command
+line. 
+
+These options can also be set in the Python API. For example:

 ```python
 model, tokenizer = load(