## Generate Text with LLMs and MLX

The easiest way to get started is to install the `mlx-lm` package:

```shell
pip install mlx-lm
```

### Python API

You can use `mlx-lm` as a module:

```python
from mlx_lm import load, generate

model, tokenizer = load("mistralai/Mistral-7B-v0.1")

response = generate(model, tokenizer, prompt="hello", verbose=True)
```
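
If you want more control over the generation, the same call accepts additional keyword arguments. Here is a minimal sketch, assuming `generate` supports `max_tokens` and `temp` keywords (check `help(generate)` below for the exact signature):

```python
from mlx_lm import load, generate

model, tokenizer = load("mistralai/Mistral-7B-v0.1")

# Longer completion with a higher sampling temperature.
# `max_tokens` and `temp` are assumed keyword names; confirm with help(generate).
response = generate(
    model,
    tokenizer,
    prompt="Write a haiku about the ocean.",
    max_tokens=256,
    temp=0.7,
    verbose=True,
)
```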

To see a description of all the arguments you can do:

```
>>> help(generate)
```

The `mlx-lm` package also comes with functionality to quantize and optionally
upload models to the Hugging Face Hub.

You can convert models in the Python API with:

```python
from mlx_lm import convert

upload_repo = "mlx-community/My-Mistral-7B-v0.1-4bit"

convert("mistralai/Mistral-7B-v0.1", quantize=True, upload_repo=upload_repo)
```

This will generate a 4-bit quantized Mistral-7B and upload it to the repo
`mlx-community/My-Mistral-7B-v0.1-4bit`. It will also save the converted
model in the path `mlx_model` by default.
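
Once the conversion finishes, you can use the local copy directly with the Python API. Here is a minimal sketch, assuming `load` accepts a local directory path as well as a Hub repo name:

```python
from mlx_lm import load, generate

# Load the quantized weights that convert wrote to its default output path.
model, tokenizer = load("mlx_model")

response = generate(model, tokenizer, prompt="hello", verbose=True)
```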

To see a description of all the arguments you can do:

```
>>> help(convert)
```

### Command Line

You can also use `mlx-lm` from the command line with:

```
python -m mlx_lm.generate --model mistralai/Mistral-7B-v0.1 --prompt "hello"
```

This will download a Mistral 7B model from the Hugging Face Hub and generate
text using the given prompt.
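
Generation options can also be passed as flags. The following is a sketch assuming the CLI exposes `--max-tokens` and `--temp` flags (the `--help` output below lists the actual options):

```
python -m mlx_lm.generate \
    --model mistralai/Mistral-7B-v0.1 \
    --prompt "Write a haiku about the ocean." \
    --max-tokens 256 \
    --temp 0.7
```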

For a full list of options run:

```
python -m mlx_lm.generate --help
```

To quantize a model from the command line run:

```
python -m mlx_lm.convert --hf-path mistralai/Mistral-7B-v0.1 -q
```

For more options run:

```
python -m mlx_lm.convert --help
```

You can upload new models to Hugging Face by specifying `--upload-repo` to
`convert`. For example, to upload a quantized Mistral-7B model to the
[MLX Hugging Face community](https://huggingface.co/mlx-community) you can do:

```
python -m mlx_lm.convert \
    --hf-path mistralai/Mistral-7B-v0.1 \
    -q \
    --upload-repo mlx-community/my-4bit-mistral
```
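
Once the upload completes, the quantized model can be used like any other Hub model with the commands shown above, for example:

```
python -m mlx_lm.generate --model mlx-community/my-4bit-mistral --prompt "hello"
```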

### Supported Models

The example supports Hugging Face format Mistral, Llama, and Phi-2 style
models. If the model you want to run is not supported, file an
[issue](https://github.com/ml-explore/mlx-examples/issues/new) or better yet,
submit a pull request.

Here are a few examples of Hugging Face models that work with this example:

- [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)
- [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf)
- [deepseek-ai/deepseek-coder-6.7b-instruct](https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct)
- [01-ai/Yi-6B-Chat](https://huggingface.co/01-ai/Yi-6B-Chat)
- [microsoft/phi-2](https://huggingface.co/microsoft/phi-2)

Most
[Mistral](https://huggingface.co/models?library=transformers,safetensors&other=mistral&sort=trending),
[Llama](https://huggingface.co/models?library=transformers,safetensors&other=llama&sort=trending),
and
[Phi-2](https://huggingface.co/models?library=transformers,safetensors&other=phi&sort=trending)
style models should work out of the box.
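
Any of the models listed above can be dropped straight into the earlier examples, for instance:

```python
from mlx_lm import load, generate

# Any supported Hub model can be passed to load by name.
model, tokenizer = load("microsoft/phi-2")

response = generate(model, tokenizer, prompt="Write a haiku about the ocean.", verbose=True)
```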