mirror of https://github.com/ml-explore/mlx-examples.git
mlx_lm: Add Streaming Capability to Generate Function (#807)
* Add streaming feature to text generation function
* separate stream and regular functions

---------

Co-authored-by: Awni Hannun <awni@apple.com>
@@ -27,7 +27,7 @@ You can use `mlx-lm` as a module:
 ```python
 from mlx_lm import load, generate
 
-model, tokenizer = load("mistralai/Mistral-7B-Instruct-v0.1")
+model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")
 
 response = generate(model, tokenizer, prompt="hello", verbose=True)
 ```
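The updated snippet above loads a pre-quantized 4-bit model and calls `generate` directly. For longer outputs it can help to set a token budget; below is a minimal sketch, with the caveat that passing `max_tokens` to `generate` is an assumption based on the `stream_generate` call shown later in this diff rather than something stated here.

```python
from mlx_lm import load, generate

# Same model as in the updated snippet above.
model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")

# Assumption: generate() accepts a max_tokens keyword, mirroring the
# max_tokens=512 argument used with stream_generate later in this diff.
response = generate(model, tokenizer, prompt="hello", verbose=True, max_tokens=256)
```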
@@ -46,13 +46,14 @@ You can convert models in the Python API with:
 ```python
 from mlx_lm import convert
 
-upload_repo = "mlx-community/My-Mistral-7B-v0.1-4bit"
+repo = "mistralai/Mistral-7B-Instruct-v0.3"
+upload_repo = "mlx-community/My-Mistral-7B-Instruct-v0.3-4bit"
 
-convert("mistralai/Mistral-7B-v0.1", quantize=True, upload_repo=upload_repo)
+convert(repo, quantize=True, upload_repo=upload_repo)
 ```
 
-This will generate a 4-bit quantized Mistral-7B and upload it to the
-repo `mlx-community/My-Mistral-7B-v0.1-4bit`. It will also save the
+This will generate a 4-bit quantized Mistral 7B and upload it to the repo
+`mlx-community/My-Mistral-7B-Instruct-v0.3-4bit`. It will also save the
 converted model in the path `mlx_model` by default.
 
 To see a description of all the arguments you can do:
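Since the converted weights are written to `mlx_model` by default, a quick way to check a local conversion is to point `load` at that directory. A minimal sketch, assuming `load` accepts a local path in addition to a Hub repo id:

```python
from mlx_lm import convert, load, generate

# Convert locally without uploading; the output goes to ./mlx_model by default.
convert("mistralai/Mistral-7B-Instruct-v0.3", quantize=True)

# Assumption: load() also resolves a local directory, not only a Hub repo id.
model, tokenizer = load("mlx_model")
print(generate(model, tokenizer, prompt="hello"))
```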
@@ -61,12 +62,30 @@ To see a description of all the arguments you can do:
 >>> help(convert)
 ```
 
+#### Streaming
+
+For streaming generation, use the `stream_generate` function. This returns a
+generator object which streams the output text. For example,
+
+```python
+from mlx_lm import load, stream_generate
+
+repo = "mlx-community/Mistral-7B-Instruct-v0.3-4bit"
+model, tokenizer = load(repo)
+
+prompt = "Write a story about Einstein"
+
+for t in stream_generate(model, tokenizer, prompt, max_tokens=512):
+    print(t, end="", flush=True)
+print()
+```
+
 ### Command Line
 
 You can also use `mlx-lm` from the command line with:
 
 ```
-mlx_lm.generate --model mistralai/Mistral-7B-Instruct-v0.1 --prompt "hello"
+mlx_lm.generate --model mistralai/Mistral-7B-Instruct-v0.3 --prompt "hello"
 ```
 
 This will download a Mistral 7B model from the Hugging Face Hub and generate
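Because the streaming API added above yields text chunks, a caller that also needs the complete response can simply collect the pieces while printing them. A minimal sketch using only the `stream_generate` interface shown in this diff:

```python
from mlx_lm import load, stream_generate

model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")

# Echo each streamed chunk and keep it so the full completion is
# available afterwards as a single string.
pieces = []
for t in stream_generate(model, tokenizer, "Write a story about Einstein", max_tokens=512):
    print(t, end="", flush=True)
    pieces.append(t)
print()

full_response = "".join(pieces)
```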
@@ -81,7 +100,7 @@ mlx_lm.generate --help
 To quantize a model from the command line run:
 
 ```
-mlx_lm.convert --hf-path mistralai/Mistral-7B-Instruct-v0.1 -q
+mlx_lm.convert --hf-path mistralai/Mistral-7B-Instruct-v0.3 -q
 ```
 
 For more options run:
@@ -96,7 +115,7 @@ You can upload new models to Hugging Face by specifying `--upload-repo` to
 
 ```
 mlx_lm.convert \
-    --hf-path mistralai/Mistral-7B-v0.1 \
+    --hf-path mistralai/Mistral-7B-Instruct-v0.3 \
     -q \
     --upload-repo mlx-community/my-4bit-mistral
 ```
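After uploading with `--upload-repo`, the quantized model can be pulled back from the Hub and used like any other checkpoint. A minimal sketch reusing the placeholder repo name from the command above (substitute your own upload repo):

```python
from mlx_lm import load, generate

# "mlx-community/my-4bit-mistral" is the placeholder upload repo from the
# command above; replace it with the repo you actually uploaded to.
model, tokenizer = load("mlx-community/my-4bit-mistral")
response = generate(model, tokenizer, prompt="hello", verbose=True)
```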