Create executables for generate, lora, server, merge, convert (#682)

* feat: create executables mlx_lm.<cmd>

* nits in docs

---------

Co-authored-by: Awni Hannun <awni@apple.com>
Phúc H. Lê Khắc
2024-04-17 00:08:49 +01:00
committed by GitHub
parent 7d7e236061
commit 35206806ac
10 changed files with 54 additions and 27 deletions
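The diff below only touches the docs so they use the new `mlx_lm.<cmd>` commands; the executables themselves would come from console-script entry points declared in the package metadata (one of the other changed files, not shown here). A minimal sketch of such a declaration, assuming a setuptools `setup.py` and a `main()` function in each command module (both are assumptions, not taken from this diff):

```python
# Hypothetical setup.py sketch: expose each mlx_lm command module as a
# standalone executable (e.g. `mlx_lm.lora`, `mlx_lm.generate`).
from setuptools import setup, find_packages

setup(
    name="mlx-lm",
    packages=find_packages(),
    entry_points={
        "console_scripts": [
            # assumed pattern: each command module defines a main() entry point
            "mlx_lm.generate = mlx_lm.generate:main",
            "mlx_lm.lora = mlx_lm.lora:main",
            "mlx_lm.server = mlx_lm.server:main",
            "mlx_lm.merge = mlx_lm.merge:main",
            "mlx_lm.convert = mlx_lm.convert:main",
            "mlx_lm.fuse = mlx_lm.fuse:main",
        ],
    },
)
```

With entry points like these installed, `mlx_lm.lora --help` and `python -m mlx_lm.lora --help` run the same code, so the older `python -m` invocations on the removed lines should keep working.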


@@ -27,7 +27,7 @@ LoRA (QLoRA).[^qlora] LoRA fine-tuning works with the following model families:
The main command is `mlx_lm.lora`. To see a full list of command-line options run:
```shell
-python -m mlx_lm.lora --help
+mlx_lm.lora --help
```
Note, in the following the `--model` argument can be any compatible Hugging
@@ -37,7 +37,7 @@ You can also specify a YAML config with `-c`/`--config`. For more on the format
[example YAML](examples/lora_config.yaml). For example:
```shell
-python -m mlx_lm.lora --config /path/to/config.yaml
+mlx_lm.lora --config /path/to/config.yaml
```
If command-line flags are also used, they will override the corresponding
@@ -48,7 +48,7 @@ values in the config.
To fine-tune a model use:
```shell
-python -m mlx_lm.lora \
+mlx_lm.lora \
--model <path_to_model> \
--train \
--data <path_to_data> \
@@ -76,7 +76,7 @@ You can resume fine-tuning with an existing adapter with
To compute test set perplexity use:
```shell
-python -m mlx_lm.lora \
+mlx_lm.lora \
--model <path_to_model> \
--adapter-path <path_to_adapters> \
--data <path_to_data> \
@@ -88,7 +88,7 @@ python -m mlx_lm.lora \
For generation use `mlx_lm.generate`:
```shell
-python -m mlx_lm.generate \
+mlx_lm.generate \
--model <path_to_model> \
--adapter-path <path_to_adapters> \
--prompt "<your_model_prompt>"
@@ -106,13 +106,13 @@ You can generate a model fused with the low-rank adapters using the
To see supported options run:
```shell
-python -m mlx_lm.fuse --help
+mlx_lm.fuse --help
```
To generate the fused model run:
```shell
-python -m mlx_lm.fuse --model <path_to_model>
+mlx_lm.fuse --model <path_to_model>
```
This will by default load the adapters from `adapters/`, and save the fused
@@ -125,7 +125,7 @@ useful for the sake of attribution and model versioning.
For example, to fuse and upload a model derived from Mistral-7B-v0.1, run:
```shell
-python -m mlx_lm.fuse \
+mlx_lm.fuse \
--model mistralai/Mistral-7B-v0.1 \
--upload-repo mlx-community/my-4bit-lora-mistral \
--hf-path mistralai/Mistral-7B-v0.1
@@ -134,7 +134,7 @@ python -m mlx_lm.fuse \
To export a fused model to GGUF, run:
```shell
-python -m mlx_lm.fuse \
+mlx_lm.fuse \
--model mistralai/Mistral-7B-v0.1 \
--export-gguf
```
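The generation step documented above also has a Python-level counterpart. A minimal sketch using the package's `load` and `generate` helpers; the model path, adapter path, prompt, and the `adapter_path` keyword are illustrative assumptions rather than part of this diff:

```python
# Hypothetical sketch: generate from a LoRA fine-tuned model in Python,
# mirroring the `mlx_lm.generate` command shown above.
from mlx_lm import load, generate

# load() returns the model and tokenizer; adapter_path (assumed keyword,
# mirroring the --adapter-path flag) points at the trained adapters.
model, tokenizer = load("<path_to_model>", adapter_path="<path_to_adapters>")

text = generate(model, tokenizer, prompt="<your_model_prompt>", max_tokens=100)
print(text)
```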