Add PLaMo-13B model as an LLM example (#303)
* Convert HF weights of PLaMo and load them into a plamo model in MLX
* Fix model inference part
* Add bos at the beginning of the prompt
* Fix convert.py to copy tokenizer.model into the converted dir
* Use the required instruction format in generate.py when the "--instruct" option is specified
* Change filenames and update existing scripts
* Add README
* Add requirements.txt
* Fix plamo.py to stop generation when EOS appears
* Add quantization to convert.py
* Use mlx>=0.0.9 for mx.core.outer() in the PLaMo model
* Update acknowledgements.md
* Fix card text in upload_to_hub()
* Do not use the prompt template when --instruct is not specified
* Ask whether to trust_remote_code when loading the PLaMo tokenizer
* Check that the user trusts the remote code when converting
* Remove plamo directory
* Update README
* Add PLaMo model file
* Fix the handling of cache in PLaMo and update README
* Ask about trust_remote_code only when the model is PLaMo
* Remove resolve_trust_remote_code from convert.py and use the latest transformers
* Remove code that avoided adding EOS
* Update README so the example does not use the noncommercial version of the model
* Remove unused imports
* Remove unnecessary description of the PLaMo instruct model from README
* Format and fix nits in README
* Fix typo

---------

Co-authored-by: Shunta Saito <shunta@mitmul-mbp.local>
Co-authored-by: Awni Hannun <awni@apple.com>
@@ -38,7 +38,7 @@ upload models to the Hugging Face Hub.

You can convert models in the Python API with:

```python
from mlx_lm import convert

upload_repo = "mlx-community/My-Mistral-7B-v0.1-4bit"

@@ -55,7 +55,7 @@ To see a description of all the arguments you can do:

>>> help(convert)
```

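The first hunk above cuts off before the `convert` call itself. A minimal sketch of how that snippet plausibly continues, assuming `convert` accepts `quantize` and `upload_repo` keyword arguments (names inferred from the `-q` and `--upload-repo` options discussed further down):

```python
from mlx_lm import convert

upload_repo = "mlx-community/My-Mistral-7B-v0.1-4bit"

# Convert the Hugging Face checkpoint to MLX format, quantize it, and
# upload the result to the repo above (keyword names are assumptions).
convert("mistralai/Mistral-7B-v0.1", quantize=True, upload_repo=upload_repo)
```
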
### Command Line

You can also use `mlx-lm` from the command line with:

@@ -64,7 +64,7 @@ python -m mlx_lm.generate --model mistralai/Mistral-7B-v0.1 --prompt "hello"
```

This will download a Mistral 7B model from the Hugging Face Hub and generate
text using the given prompt.

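For comparison, roughly the same flow is available from Python. A sketch assuming `mlx_lm` exposes `load` and `generate` helpers with the argument names used below (`max_tokens` in particular is an assumption):

```python
from mlx_lm import load, generate

# Download the model and tokenizer from the Hugging Face Hub (or reuse a
# cached copy) and run the same prompt as the CLI example above.
model, tokenizer = load("mistralai/Mistral-7B-v0.1")
text = generate(model, tokenizer, prompt="hello", max_tokens=100)
print(text)
```
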
For a full list of options run:
@@ -75,7 +75,7 @@ python -m mlx_lm.generate --help

To quantize a model from the command line run:

```
python -m mlx_lm.convert --hf-path mistralai/Mistral-7B-v0.1 -q
```

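The same quantization can be requested through the Python `convert` API. A short sketch, assuming `quantize`, `q_bits`, and `q_group_size` keyword arguments (the latter two, and their values, are assumptions shown only to illustrate where quantization settings would go):

```python
from mlx_lm import convert

# Quantize while converting; q_bits and q_group_size are assumed names for
# the quantization settings and may not match the actual API.
convert("mistralai/Mistral-7B-v0.1", quantize=True, q_bits=4, q_group_size=64)
```
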
For more options run:
@@ -85,7 +85,7 @@ python -m mlx_lm.convert --help
```

You can upload new models to Hugging Face by specifying `--upload-repo` to
`convert`. For example, to upload a quantized Mistral-7B model to the
[MLX Hugging Face community](https://huggingface.co/mlx-community) you can do:

```

@@ -111,6 +111,8 @@ Here are a few examples of Hugging Face models that work with this example:

- [microsoft/phi-2](https://huggingface.co/microsoft/phi-2)
- [mistralai/Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1)
- [Qwen/Qwen-7B](https://huggingface.co/Qwen/Qwen-7B)
- [pfnet/plamo-13b](https://huggingface.co/pfnet/plamo-13b)
- [pfnet/plamo-13b-instruct](https://huggingface.co/pfnet/plamo-13b-instruct)

Most
[Mistral](https://huggingface.co/models?library=transformers,safetensors&other=mistral&sort=trending),

@@ -120,12 +122,17 @@ and
[Mixtral](https://huggingface.co/models?library=transformers,safetensors&other=mixtral&sort=trending)
style models should work out of the box.

For
[Qwen](https://huggingface.co/models?library=transformers,safetensors&other=qwen&sort=trending)
style models, you must enable the `trust_remote_code` option and specify the
`eos_token`. This ensures the tokenizer works correctly. You can do this by
passing `--trust-remote-code` and `--eos-token "<|endoftext|>"` in the command
line, or by setting these options in the Python API:

For some models (such as `Qwen` and `plamo`) the tokenizer requires you to
enable the `trust_remote_code` option. You can do this by passing
`--trust-remote-code` in the command line. If you don't specify the flag
explicitly, you will be prompted to trust remote code in the terminal when
running the model.

For `Qwen` models you must also specify the `eos_token`. You can do this by
passing `--eos-token "<|endoftext|>"` in the command line.

These options can also be set in the Python API. For example:

```python
model, tokenizer = load(
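    # NOTE: the diff view truncates this example here. A sketch of how the
    # call might plausibly continue, assuming `load` accepts a
    # `tokenizer_config` dict that is forwarded to the tokenizer; the model
    # name below is illustrative.
    "Qwen/Qwen-7B",
    tokenizer_config={"eos_token": "<|endoftext|>", "trust_remote_code": True},
)
```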