diff --git a/llms/mlx_lm/Makefile b/llms/mlx_lm/Makefile
new file mode 100644
index 00000000..c582c66e
--- /dev/null
+++ b/llms/mlx_lm/Makefile
@@ -0,0 +1,12 @@
+run:
+	mlx_lm.server --model mlx-community/Meta-Llama-3.1-8B-Instruct-8bit --trust-remote-code --port 8722
+
+k:
+	ps -ef|grep 'mlx_lm.server'|grep -v grep|awk '{print $$2}'|xargs kill -9
+
+w:
+	curl -X GET "http://127.0.0.1:9000/api/ai/WriteBlogRandomlyWithLLM?model=MLXLMServer" -H "Request-Origion:SwaggerBootstrapUi" -H "accept:*/*"
+
+# note: each recipe runs in its own sub-shell, so `make c` cannot activate the
+c:
+	conda activate m3mlx
\ No newline at end of file
diff --git a/llms/mlx_lm/README.md b/llms/mlx_lm/README.md
index 66f2b5e9..e286084f 100644
--- a/llms/mlx_lm/README.md
+++ b/llms/mlx_lm/README.md
@@ -8,3 +8,429 @@ parent directory.
 
 This package also supports fine tuning with LoRA or QLoRA. For more
 information see the [LoRA documentation](LORA.md).
+
+
+## Install mlx_lm locally
+
+```shell
+cd llms
+
+pip install -e .
+
+---------------------------------------------------------
+Looking in indexes: https://bytedpypi.byted.org/simple/
+Obtaining file:///Users/bytedance/ai/mlx-examples/llms
+  Preparing metadata (setup.py) ... done
+Requirement already satisfied: mlx>=0.14.1 in /opt/miniconda3/envs/m3mlx/lib/python3.11/site-packages (from mlx-lm==0.16.0) (0.16.0)
+Requirement already satisfied: numpy in /opt/miniconda3/envs/m3mlx/lib/python3.11/site-packages (from mlx-lm==0.16.0) (1.26.4)
+Requirement already satisfied: transformers>=4.39.3 in /opt/miniconda3/envs/m3mlx/lib/python3.11/site-packages (from transformers[sentencepiece]>=4.39.3->mlx-lm==0.16.0) (4.41.1)
+Requirement already satisfied: protobuf in /opt/miniconda3/envs/m3mlx/lib/python3.11/site-packages (from mlx-lm==0.16.0) (5.27.0)
+Requirement already satisfied: pyyaml in /opt/miniconda3/envs/m3mlx/lib/python3.11/site-packages (from mlx-lm==0.16.0) (6.0.1)
+Requirement already satisfied: jinja2 in /opt/miniconda3/envs/m3mlx/lib/python3.11/site-packages (from mlx-lm==0.16.0) (3.1.4)
+Requirement already satisfied: filelock in /opt/miniconda3/envs/m3mlx/lib/python3.11/site-packages (from transformers>=4.39.3->transformers[sentencepiece]>=4.39.3->mlx-lm==0.16.0) (3.14.0)
+Requirement already satisfied: huggingface-hub<1.0,>=0.23.0 in /opt/miniconda3/envs/m3mlx/lib/python3.11/site-packages (from transformers>=4.39.3->transformers[sentencepiece]>=4.39.3->mlx-lm==0.16.0) (0.23.4)
+Requirement already satisfied: packaging>=20.0 in /opt/miniconda3/envs/m3mlx/lib/python3.11/site-packages (from transformers>=4.39.3->transformers[sentencepiece]>=4.39.3->mlx-lm==0.16.0) (24.0)
+Requirement already satisfied: regex!=2019.12.17 in /opt/miniconda3/envs/m3mlx/lib/python3.11/site-packages (from transformers>=4.39.3->transformers[sentencepiece]>=4.39.3->mlx-lm==0.16.0) (2024.5.15)
+Requirement already satisfied: requests in /opt/miniconda3/envs/m3mlx/lib/python3.11/site-packages (from transformers>=4.39.3->transformers[sentencepiece]>=4.39.3->mlx-lm==0.16.0) (2.32.2)
+Requirement already satisfied: tokenizers<0.20,>=0.19 in /opt/miniconda3/envs/m3mlx/lib/python3.11/site-packages (from transformers>=4.39.3->transformers[sentencepiece]>=4.39.3->mlx-lm==0.16.0) (0.19.1)
+Requirement already satisfied: safetensors>=0.4.1 in /opt/miniconda3/envs/m3mlx/lib/python3.11/site-packages (from transformers>=4.39.3->transformers[sentencepiece]>=4.39.3->mlx-lm==0.16.0) (0.4.3)
+Requirement already satisfied: tqdm>=4.27 in /opt/miniconda3/envs/m3mlx/lib/python3.11/site-packages (from transformers>=4.39.3->transformers[sentencepiece]>=4.39.3->mlx-lm==0.16.0) (4.66.4)
+Requirement already satisfied: sentencepiece!=0.1.92,>=0.1.91 in /opt/miniconda3/envs/m3mlx/lib/python3.11/site-packages (from transformers[sentencepiece]>=4.39.3->mlx-lm==0.16.0) (0.2.0)
+Requirement already satisfied: MarkupSafe>=2.0 in /opt/miniconda3/envs/m3mlx/lib/python3.11/site-packages (from jinja2->mlx-lm==0.16.0) (2.1.5)
+Requirement already satisfied: fsspec>=2023.5.0 in /opt/miniconda3/envs/m3mlx/lib/python3.11/site-packages (from huggingface-hub<1.0,>=0.23.0->transformers>=4.39.3->transformers[sentencepiece]>=4.39.3->mlx-lm==0.16.0) (2024.5.0)
+Requirement already satisfied: typing-extensions>=3.7.4.3 in /opt/miniconda3/envs/m3mlx/lib/python3.11/site-packages (from huggingface-hub<1.0,>=0.23.0->transformers>=4.39.3->transformers[sentencepiece]>=4.39.3->mlx-lm==0.16.0) (4.12.0)
+Requirement already satisfied: charset-normalizer<4,>=2 in /opt/miniconda3/envs/m3mlx/lib/python3.11/site-packages (from requests->transformers>=4.39.3->transformers[sentencepiece]>=4.39.3->mlx-lm==0.16.0) (3.3.2)
+Requirement already satisfied: idna<4,>=2.5 in /opt/miniconda3/envs/m3mlx/lib/python3.11/site-packages (from requests->transformers>=4.39.3->transformers[sentencepiece]>=4.39.3->mlx-lm==0.16.0) (3.7)
+Requirement already satisfied: urllib3<3,>=1.21.1 in /opt/miniconda3/envs/m3mlx/lib/python3.11/site-packages (from requests->transformers>=4.39.3->transformers[sentencepiece]>=4.39.3->mlx-lm==0.16.0) (2.2.1)
+Requirement already satisfied: certifi>=2017.4.17 in /opt/miniconda3/envs/m3mlx/lib/python3.11/site-packages (from requests->transformers>=4.39.3->transformers[sentencepiece]>=4.39.3->mlx-lm==0.16.0) (2024.2.2)
+Installing collected packages: mlx-lm
+  Attempting uninstall: mlx-lm
+    Found existing installation: mlx-lm 0.15.3
+    Uninstalling mlx-lm-0.15.3:
+      Successfully uninstalled mlx-lm-0.15.3
+  Running setup.py develop for mlx-lm
+Successfully installed mlx-lm-0.16.0
+```
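+
+To confirm the editable install, one quick check (run in the same environment
+shown in the log above, `m3mlx`):
+
+```shell
+# the reported version should match the install log above (0.16.0)
+pip show mlx-lm
+
+# the module should import cleanly in the same environment
+python -c "import mlx_lm"
+```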
+
+## Run the MLX LM Server
+
+```shell
+cd llms/mlx_lm
+```
+
+Start the server with:
+
+> See [SERVER.md](SERVER.md) for the full server documentation.
+
+```shell
+mlx_lm.server --model <path_to_model_or_hf_repo>
+```
+
+For example:
+
+```shell
+mlx_lm.server --model mlx-community/Meta-Llama-3.1-8B-Instruct-8bit --trust-remote-code --port 8722
+mlx_lm.server --model mlx-community/Mistral-7B-Instruct-v0.3-4bit --trust-remote-code --port 8722
+mlx_lm.server --model mlx-community/internlm2_5-7b-chat-8bit --trust-remote-code --port 8722
+```
+
+Each of these commands starts a text generation server on `localhost` with the
+given model. The examples use port `8722`; if `--port` is omitted, the server
+defaults to port `8080`. The model will be downloaded from the provided
+Hugging Face repo if it is not already in the local cache.
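+
+The `Makefile` in this directory wraps the same commands, so the server can
+also be managed with:
+
+```shell
+make run   # start the Llama 3.1 8B server on port 8722
+make k     # kill any running mlx_lm.server processes
+```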
+
+To see a full list of options run:
+
+```shell
+mlx_lm.server --help
+```
+
+You can make a request to the model by running:
+
+```shell
+curl localhost:8722/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+     "messages": [{"role": "user", "content": "Say this is a test!"}],
+     "temperature": 0.7,
+     "max_tokens": 100
+   }'
+```
+
+Example output:
+
+```json
+{
+  "id": "chatcmpl-74e66064-8727-411a-ada3-d5287b2c83a2",
+  "system_fingerprint": "fp_73a731bd-bd00-4dcd-8fac-8f3f452210a2",
+  "object": "chat.completions",
+  "model": "default_model",
+  "created": 1721634359,
+  "choices": [
+    {
+      "index": 0,
+      "logprobs": {
+        "token_logprobs": [
+          -2.4453125, -1.28125, -1.421875, -0.25, -7.53125,
+          -1.15625, -4.09375, -0.390625, -3.0625, -0.84375,
+          -2.53125, -0.125, -0.40625, -0.015625, -0.15625,
+          -0.265625, -1.015625, -1.6484375, -1.0625, -0.40625,
+          -4.390625, -0.296875, -1.078125, -3.0625, -0.328125,
+          -0.21875, -0.390625, -2.015625, -3.46875, 0.0,
+          -0.765625, -2.609375, -1.921875, -1.078125, -1.859375,
+          -1.625, -0.09375, -0.015625, -1.5625, -2.1015625,
+          -1.65625, -0.21875, 0.0, 0.0, -1.640625,
+          -0.0625, 0.0, -1.234375, -0.6875, -0.53125,
+          -0.078125, -0.03125, -1.015625, -0.109375, -3.4765625,
+          -0.015625, -2.140625, -1.34375, -1.0625, -2.21875,
+          -1.046875, -0.046875, -0.375, -1.0, -1.0625,
+          -3.21875, -0.5, -0.234375, -0.15625, -2.015625,
+          -1.265625, -0.390625, -2.265625, -0.0625, -1.59375,
+          -3.5625, -0.59375, -0.46875, -1.0, -1.3515625,
+          -0.296875, -1.4375, 0.0, -1.1875, -0.46875,
+          -0.15625, -0.375, -0.0625, -0.0625, -3.90625,
+          -0.9375, -0.5625, -0.25, -2.53125, -0.28125,
+          -2.640625, -0.59375, -0.75, -0.53125, -0.71875
+        ],
+        "top_logprobs": [],
+        "tokens": [
+          39584, 346, 5846, 725, 3716, 489, 4330, 25341, 16375, 3103,
+          1226, 725, 395, 3556, 1593, 43916, 465, 2423, 57436, 334,
+          19109, 446, 395, 16375, 22006, 55098, 465, 53057, 51040, 334,
+          465, 848, 285, 3235, 53057, 4144, 334, 465, 461, 2423,
+          57436, 830, 285, 3235, 5168, 334, 465, 461, 2136, 505,
+          395, 1420, 17338, 465, 312, 281, 5128, 285, 2423, 5128,
+          1883, 938, 334, 55098, 10363, 6069, 410, 1420, 328, 410,
+          2863, 46301, 2119, 517, 2014, 334, 4872, 285, 3235, 2423,
+          11740, 334, 465, 29581, 560, 410, 1420, 4736, 505, 6662,
+          12590, 281, 1239, 1377, 3089, 22865, 560, 810, 6025, 3328
+        ]
+      },
+      "finish_reason": "length",
+      "message": {
+        "role": "assistant",
+        "content": "Sure! Here's What I'd Say Given That It's a Test:\n\n---\n\n**Test Scenario: Validation of a Given Statement**\n\n**Scenario Outline:** \n- **Scenario Name:** \"Test Scenario\"\n- **Description:** \"This is a test!\"\n\n**1. Pre-Test Preparations:**\n\nBefore starting the test, the following preparations must be made: \n\n- **Test Environment:** Ensure that the test environment is setup correctly. This may include ensuring that all necessary software"
+      }
+    }
+  ],
+  "usage": {
+    "prompt_tokens": 16,
+    "completion_tokens": 100,
+    "total_tokens": 116
+  }
+}
+```
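+
+The same endpoint can also stream the response. A minimal sketch using the
+`stream` field documented below (`-N` turns off curl's output buffering so
+chunks print as they arrive):
+
+```shell
+curl -N localhost:8722/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+     "messages": [{"role": "user", "content": "Say this is a test!"}],
+     "max_tokens": 100,
+     "stream": true
+   }'
+```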
+
+### Request Fields
+
+- `messages`: An array of message objects representing the conversation
+  history. Each message object should have a role (e.g. user, assistant) and
+  content (the message text).
+
+- `role_mapping`: (Optional) A dictionary to customize the role prefixes in
+  the generated prompt. If not provided, the default mappings are used.
+
+- `stop`: (Optional) An array of strings or a single string. These are
+  sequences of tokens on which the generation should stop.
+
+- `max_tokens`: (Optional) An integer specifying the maximum number of tokens
+  to generate. Defaults to `100`.
+
+- `stream`: (Optional) A boolean indicating if the response should be
+  streamed. If true, responses are sent as they are generated. Defaults to
+  false.
+
+- `temperature`: (Optional) A float specifying the sampling temperature.
+  Defaults to `1.0`.
+
+- `top_p`: (Optional) A float specifying the nucleus sampling parameter.
+  Defaults to `1.0`.
+
+- `repetition_penalty`: (Optional) Applies a penalty to repeated tokens.
+  Defaults to `1.0`.
+
+- `repetition_context_size`: (Optional) The size of the context window for
+  applying repetition penalty. Defaults to `20`.
+
+- `logit_bias`: (Optional) A dictionary mapping token IDs to their bias
+  values. Defaults to `None`.
+
+- `logprobs`: (Optional) An integer specifying the number of top tokens and
+  corresponding log probabilities to return for each output in the generated
+  sequence. If set, this can be any value between 1 and 10, inclusive.
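+
+Combining several of the optional fields above, a sketch of a request with a
+lower sampling temperature, a repetition penalty, and the top 3 log
+probabilities per generated token:
+
+```shell
+curl localhost:8722/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+     "messages": [{"role": "user", "content": "Say this is a test!"}],
+     "max_tokens": 50,
+     "temperature": 0.2,
+     "repetition_penalty": 1.2,
+     "logprobs": 3
+   }'
+```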
+
+## More Examples
+
+The rest of the [MLX Examples](https://github.com/ml-explore/mlx-examples)
+repo contains many other examples:
+
+### Text Models
+
+- [MLX LM](../README.md) a package for LLM text generation, fine-tuning, and more.
+- [Transformer language model](../../transformer_lm) training.
+- Minimal examples of large scale text generation with [LLaMA](../llama),
+  [Mistral](../mistral), and more in the [LLMs](..) directory.
+- A mixture-of-experts (MoE) language model with [Mixtral 8x7B](../mixtral).
+- Parameter efficient fine-tuning with [LoRA or QLoRA](../../lora).
+- Text-to-text multi-task Transformers with [T5](../../t5).
+- Bidirectional language understanding with [BERT](../../bert).
+
+### Image Models
+
+- Image classification using [ResNets on CIFAR-10](../../cifar).
+- Generating images with [Stable Diffusion or SDXL](../../stable_diffusion).
+- Convolutional variational autoencoder [(CVAE) on MNIST](../../cvae).
+
+### Audio Models
+
+- Speech recognition with [OpenAI's Whisper](../../whisper).
+
+### Multimodal Models
+
+- Joint text and image embeddings with [CLIP](../../clip).
+- Text generation from image and text inputs with [LLaVA](../../llava).
+
+### Other Models
+
+- Semi-supervised learning on graph-structured data with [GCN](../../gcn).
+- Real NVP [normalizing flow](../../normalizing_flow) for density estimation
+  and sampling.
+
+### Hugging Face
+
+Note: You can now directly download a few converted checkpoints from the [MLX
+Community](https://huggingface.co/mlx-community) organization on Hugging Face.
+We encourage you to join the community and [contribute new
+models](https://github.com/ml-explore/mlx-examples/issues/155).
+
+## Contributing
+
+We are grateful for all of [our
+contributors](../../ACKNOWLEDGMENTS.md#Individual-Contributors). If you
+contribute to MLX Examples and wish to be acknowledged, please add your name
+to the list in your pull request.
+
+## Citing MLX Examples
+
+The MLX software suite was initially developed with equal contribution by Awni
+Hannun, Jagrit Digani, Angelos Katharopoulos, and Ronan Collobert. If you find
+MLX Examples useful in your research and wish to cite it, please use the
+following BibTeX entry:
+
+```
+@software{mlx2023,
+  author = {Awni Hannun and Jagrit Digani and Angelos Katharopoulos and Ronan Collobert},
+  title = {{MLX}: Efficient and flexible machine learning on Apple silicon},
+  url = {https://github.com/ml-explore},
+  version = {0.0},
+  year = {2023},
+}
+```