Mirror of https://github.com/ml-explore/mlx-examples.git (synced 2025-08-30 02:53:41 +08:00).

Commit `8e3f04f66c` (parent `cd8efc7fbc`): `mlx_lm.server --model mlx-community/Meta-Llama-3.1-8B-Instruct-8bit --trust-remote-code --port 8722`

New file: `llms/mlx_lm/Makefile` (11 lines)
```makefile
# Start the server
run:
	mlx_lm.server --model mlx-community/Meta-Llama-3.1-8B-Instruct-8bit --trust-remote-code --port 8722

# Kill the running server ($$ escapes the dollar sign so awk, not make, expands $2)
k:
	ps -ef | grep 'mlx_lm.server' | awk '{print $$2}' | xargs kill -9

# Hit a local API endpoint (separate service on port 9000) that uses the server
w:
	curl -X GET "http://127.0.0.1:9000/api/ai/WriteBlogRandomlyWithLLM?model=MLXLMServer" -H "Request-Origion:SwaggerBootstrapUi" -H "accept:*/*"

# Note: conda activate in a recipe only affects that recipe's subshell
c:
	conda activate m3mlx
```
This package also supports fine tuning with LoRA or QLoRA. For more information
see the [LoRA documentation](LORA.md).
## Install mlx_lm locally

```shell
cd llms
pip install -e .
```

Output:

```
Looking in indexes: https://bytedpypi.byted.org/simple/
Obtaining file:///Users/bytedance/ai/mlx-examples/llms
  Preparing metadata (setup.py) ... done
Requirement already satisfied: mlx>=0.14.1 in /opt/miniconda3/envs/m3mlx/lib/python3.11/site-packages (from mlx-lm==0.16.0) (0.16.0)
Requirement already satisfied: numpy in /opt/miniconda3/envs/m3mlx/lib/python3.11/site-packages (from mlx-lm==0.16.0) (1.26.4)
Requirement already satisfied: transformers>=4.39.3 in /opt/miniconda3/envs/m3mlx/lib/python3.11/site-packages (from transformers[sentencepiece]>=4.39.3->mlx-lm==0.16.0) (4.41.1)
Requirement already satisfied: protobuf in /opt/miniconda3/envs/m3mlx/lib/python3.11/site-packages (from mlx-lm==0.16.0) (5.27.0)
Requirement already satisfied: pyyaml in /opt/miniconda3/envs/m3mlx/lib/python3.11/site-packages (from mlx-lm==0.16.0) (6.0.1)
Requirement already satisfied: jinja2 in /opt/miniconda3/envs/m3mlx/lib/python3.11/site-packages (from mlx-lm==0.16.0) (3.1.4)
Requirement already satisfied: filelock in /opt/miniconda3/envs/m3mlx/lib/python3.11/site-packages (from transformers>=4.39.3->transformers[sentencepiece]>=4.39.3->mlx-lm==0.16.0) (3.14.0)
Requirement already satisfied: huggingface-hub<1.0,>=0.23.0 in /opt/miniconda3/envs/m3mlx/lib/python3.11/site-packages (from transformers>=4.39.3->transformers[sentencepiece]>=4.39.3->mlx-lm==0.16.0) (0.23.4)
Requirement already satisfied: packaging>=20.0 in /opt/miniconda3/envs/m3mlx/lib/python3.11/site-packages (from transformers>=4.39.3->transformers[sentencepiece]>=4.39.3->mlx-lm==0.16.0) (24.0)
Requirement already satisfied: regex!=2019.12.17 in /opt/miniconda3/envs/m3mlx/lib/python3.11/site-packages (from transformers>=4.39.3->transformers[sentencepiece]>=4.39.3->mlx-lm==0.16.0) (2024.5.15)
Requirement already satisfied: requests in /opt/miniconda3/envs/m3mlx/lib/python3.11/site-packages (from transformers>=4.39.3->transformers[sentencepiece]>=4.39.3->mlx-lm==0.16.0) (2.32.2)
Requirement already satisfied: tokenizers<0.20,>=0.19 in /opt/miniconda3/envs/m3mlx/lib/python3.11/site-packages (from transformers>=4.39.3->transformers[sentencepiece]>=4.39.3->mlx-lm==0.16.0) (0.19.1)
Requirement already satisfied: safetensors>=0.4.1 in /opt/miniconda3/envs/m3mlx/lib/python3.11/site-packages (from transformers>=4.39.3->transformers[sentencepiece]>=4.39.3->mlx-lm==0.16.0) (0.4.3)
Requirement already satisfied: tqdm>=4.27 in /opt/miniconda3/envs/m3mlx/lib/python3.11/site-packages (from transformers>=4.39.3->transformers[sentencepiece]>=4.39.3->mlx-lm==0.16.0) (4.66.4)
Requirement already satisfied: sentencepiece!=0.1.92,>=0.1.91 in /opt/miniconda3/envs/m3mlx/lib/python3.11/site-packages (from transformers[sentencepiece]>=4.39.3->mlx-lm==0.16.0) (0.2.0)
Requirement already satisfied: MarkupSafe>=2.0 in /opt/miniconda3/envs/m3mlx/lib/python3.11/site-packages (from jinja2->mlx-lm==0.16.0) (2.1.5)
Requirement already satisfied: fsspec>=2023.5.0 in /opt/miniconda3/envs/m3mlx/lib/python3.11/site-packages (from huggingface-hub<1.0,>=0.23.0->transformers>=4.39.3->transformers[sentencepiece]>=4.39.3->mlx-lm==0.16.0) (2024.5.0)
Requirement already satisfied: typing-extensions>=3.7.4.3 in /opt/miniconda3/envs/m3mlx/lib/python3.11/site-packages (from huggingface-hub<1.0,>=0.23.0->transformers>=4.39.3->transformers[sentencepiece]>=4.39.3->mlx-lm==0.16.0) (4.12.0)
Requirement already satisfied: charset-normalizer<4,>=2 in /opt/miniconda3/envs/m3mlx/lib/python3.11/site-packages (from requests->transformers>=4.39.3->transformers[sentencepiece]>=4.39.3->mlx-lm==0.16.0) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in /opt/miniconda3/envs/m3mlx/lib/python3.11/site-packages (from requests->transformers>=4.39.3->transformers[sentencepiece]>=4.39.3->mlx-lm==0.16.0) (3.7)
Requirement already satisfied: urllib3<3,>=1.21.1 in /opt/miniconda3/envs/m3mlx/lib/python3.11/site-packages (from requests->transformers>=4.39.3->transformers[sentencepiece]>=4.39.3->mlx-lm==0.16.0) (2.2.1)
Requirement already satisfied: certifi>=2017.4.17 in /opt/miniconda3/envs/m3mlx/lib/python3.11/site-packages (from requests->transformers>=4.39.3->transformers[sentencepiece]>=4.39.3->mlx-lm==0.16.0) (2024.2.2)
Installing collected packages: mlx-lm
  Attempting uninstall: mlx-lm
    Found existing installation: mlx-lm 0.15.3
    Uninstalling mlx-lm-0.15.3:
      Successfully uninstalled mlx-lm-0.15.3
  Running setup.py develop for mlx-lm
Successfully installed mlx-lm-0.16.0
```
## Run MLX LM Server

```shell
cd llms/mlx_lm
```
Start the server with:

> See [SERVER.md](llms/mlx_lm/SERVER.md) for details.

```shell
mlx_lm.server --model <path_to_model_or_hf_repo>
```

For example:

```shell
mlx_lm.server --model mlx-community/Meta-Llama-3.1-8B-Instruct-8bit --trust-remote-code --port 8722
mlx_lm.server --model mlx-community/Mistral-7B-Instruct-v0.3-4bit --trust-remote-code --port 8722
mlx_lm.server --model mlx-community/internlm2_5-7b-chat-8bit --trust-remote-code --port 8722
```

This starts a text generation server on `localhost` using the specified model,
listening on the given port (`8722` in the examples above; the default is
`8080`). The model will be downloaded from the provided Hugging Face repo if it
is not already in the local cache.
To see a full list of options run:

```shell
mlx_lm.server --help
```
You can make a request to the model by running:

```shell
curl localhost:8722/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Say this is a test!"}],
    "temperature": 0.7,
    "max_tokens": 100
  }'
```
output:

```json
{
  "id": "chatcmpl-74e66064-8727-411a-ada3-d5287b2c83a2",
  "system_fingerprint": "fp_73a731bd-bd00-4dcd-8fac-8f3f452210a2",
  "object": "chat.completions",
  "model": "default_model",
  "created": 1721634359,
  "choices": [
    {
      "index": 0,
      "logprobs": {
        "token_logprobs": [
          -2.4453125, -1.28125, -1.421875, -0.25, -7.53125, -1.15625, -4.09375, -0.390625, -3.0625, -0.84375,
          -2.53125, -0.125, -0.40625, -0.015625, -0.15625, -0.265625, -1.015625, -1.6484375, -1.0625, -0.40625,
          -4.390625, -0.296875, -1.078125, -3.0625, -0.328125, -0.21875, -0.390625, -2.015625, -3.46875, 0.0,
          -0.765625, -2.609375, -1.921875, -1.078125, -1.859375, -1.625, -0.09375, -0.015625, -1.5625, -2.1015625,
          -1.65625, -0.21875, 0.0, 0.0, -1.640625, -0.0625, 0.0, -1.234375, -0.6875, -0.53125,
          -0.078125, -0.03125, -1.015625, -0.109375, -3.4765625, -0.015625, -2.140625, -1.34375, -1.0625, -2.21875,
          -1.046875, -0.046875, -0.375, -1.0, -1.0625, -3.21875, -0.5, -0.234375, -0.15625, -2.015625,
          -1.265625, -0.390625, -2.265625, -0.0625, -1.59375, -3.5625, -0.59375, -0.46875, -1.0, -1.3515625,
          -0.296875, -1.4375, 0.0, -1.1875, -0.46875, -0.15625, -0.375, -0.0625, -0.0625, -3.90625,
          -0.9375, -0.5625, -0.25, -2.53125, -0.28125, -2.640625, -0.59375, -0.75, -0.53125, -0.71875
        ],
        "top_logprobs": [],
        "tokens": [
          39584, 346, 5846, 725, 3716, 489, 4330, 25341, 16375, 3103,
          1226, 725, 395, 3556, 1593, 43916, 465, 2423, 57436, 334,
          19109, 446, 395, 16375, 22006, 55098, 465, 53057, 51040, 334,
          465, 848, 285, 3235, 53057, 4144, 334, 465, 461, 2423,
          57436, 830, 285, 3235, 5168, 334, 465, 461, 2136, 505,
          395, 1420, 17338, 465, 312, 281, 5128, 285, 2423, 5128,
          1883, 938, 334, 55098, 10363, 6069, 410, 1420, 328, 410,
          2863, 46301, 2119, 517, 2014, 334, 4872, 285, 3235, 2423,
          11740, 334, 465, 29581, 560, 410, 1420, 4736, 505, 6662,
          12590, 281, 1239, 1377, 3089, 22865, 560, 810, 6025, 3328
        ]
      },
      "finish_reason": "length",
      "message": {
        "role": "assistant",
        "content": "Sure! Here's What I'd Say Given That It's a Test:\n\n---\n\n**Test Scenario: Validation of a Given Statement**\n\n**Scenario Outline:** \n- **Scenario Name:** \"Test Scenario\"\n- **Description:** \"This is a test!\"\n\n**1. Pre-Test Preparations:**\n\nBefore starting the test, the following preparations must be made: \n\n- **Test Environment:** Ensure that the test environment is setup correctly. This may include ensuring that all necessary software"
      }
    }
  ],
  "usage": {
    "prompt_tokens": 16,
    "completion_tokens": 100,
    "total_tokens": 116
  }
}
```
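Because the server exposes an OpenAI-compatible `/v1/chat/completions` endpoint, the same request can be made from Python. A minimal sketch using only the standard library (the port `8722` and the request fields match the curl example above; `build_payload` and `chat` are hypothetical helper names, not part of mlx_lm):

```python
import json
import urllib.request

def build_payload(prompt, temperature=0.7, max_tokens=100):
    """Build the JSON body for a /v1/chat/completions request."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }

def chat(prompt, base_url="http://localhost:8722"):
    """Send the request to a running mlx_lm.server and return the reply text."""
    req = urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

With the server running, `chat("Say this is a test!")` returns the assistant message content from a response shaped like the JSON above.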
### Request Fields

- `messages`: An array of message objects representing the conversation
  history. Each message object should have a role (e.g. `user`, `assistant`) and
  content (the message text).

- `role_mapping`: (Optional) A dictionary to customize the role prefixes in
  the generated prompt. If not provided, the default mappings are used.

- `stop`: (Optional) An array of strings or a single string. These are
  sequences of tokens on which the generation should stop.

- `max_tokens`: (Optional) An integer specifying the maximum number of tokens
  to generate. Defaults to `100`.

- `stream`: (Optional) A boolean indicating if the response should be
  streamed. If true, responses are sent as they are generated. Defaults to
  false.

- `temperature`: (Optional) A float specifying the sampling temperature.
  Defaults to `1.0`.

- `top_p`: (Optional) A float specifying the nucleus sampling parameter.
  Defaults to `1.0`.

- `repetition_penalty`: (Optional) Applies a penalty to repeated tokens.
  Defaults to `1.0`.

- `repetition_context_size`: (Optional) The size of the context window for
  applying the repetition penalty. Defaults to `20`.

- `logit_bias`: (Optional) A dictionary mapping token IDs to their bias
  values. Defaults to `None`.

- `logprobs`: (Optional) An integer specifying the number of top tokens and
  corresponding log probabilities to return for each output token in the
  generated sequence. If set, this can be any value between 1 and 10,
  inclusive.
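As a worked example of using the `token_logprobs` field from the sample response above, the mean negative log probability of the returned tokens can be exponentiated to get a per-token perplexity. A small sketch, assuming (as is conventional) that the values are natural-log probabilities:

```python
import math

def perplexity(token_logprobs):
    """Per-token perplexity: exp of the mean negative log probability."""
    avg = sum(token_logprobs) / len(token_logprobs)
    return math.exp(-avg)

# A list of all-zero logprobs gives perplexity 1.0:
# the model was certain of every token.
```

Lower values mean the model found its own output more predictable; feeding in the 100 values from the response above gives the perplexity of that completion.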
### Text Models

- [MLX LM](llms/README.md): a package for LLM text generation, fine-tuning, and more.
- [Transformer language model](transformer_lm) training.
- Minimal examples of large-scale text generation with [LLaMA](llms/llama),
  [Mistral](llms/mistral), and more in the [LLMs](llms) directory.
- A mixture-of-experts (MoE) language model with [Mixtral 8x7B](llms/mixtral).
- Parameter-efficient fine-tuning with [LoRA or QLoRA](lora).
- Text-to-text multi-task Transformers with [T5](t5).
- Bidirectional language understanding with [BERT](bert).

### Image Models

- Image classification using [ResNets on CIFAR-10](cifar).
- Generating images with [Stable Diffusion or SDXL](stable_diffusion).
- Convolutional variational autoencoder [(CVAE) on MNIST](cvae).

### Audio Models

- Speech recognition with [OpenAI's Whisper](whisper).

### Multimodal models

- Joint text and image embeddings with [CLIP](clip).
- Text generation from image and text inputs with [LLaVA](llava).

### Other Models

- Semi-supervised learning on graph-structured data with [GCN](gcn).
- Real NVP [normalizing flow](normalizing_flow) for density estimation and
  sampling.

### Hugging Face

Note: You can now directly download a few converted checkpoints from the [MLX
Community](https://huggingface.co/mlx-community) organization on Hugging Face.
We encourage you to join the community and [contribute new
models](https://github.com/ml-explore/mlx-examples/issues/155).

## Contributing

We are grateful for all of [our
contributors](ACKNOWLEDGMENTS.md#Individual-Contributors). If you contribute
to MLX Examples and wish to be acknowledged, please add your name to the list
in your pull request.

## Citing MLX Examples

The MLX software suite was initially developed with equal contribution by Awni
Hannun, Jagrit Digani, Angelos Katharopoulos, and Ronan Collobert. If you find
MLX Examples useful in your research and wish to cite it, please use the
following BibTeX entry:

```
@software{mlx2023,
  author = {Awni Hannun and Jagrit Digani and Angelos Katharopoulos and Ronan Collobert},
  title = {{MLX}: Efficient and flexible machine learning on Apple silicon},
  url = {https://github.com/ml-explore},
  version = {0.0},
  year = {2023},
}
```