[MLX LM] Sampler refactor + a few improvements (#1094)

* starting

* refactor sampler/processor and a few improvements

* fix stream

* fix stream generate

* fix eos handling in stream generate
Awni Hannun
2024-11-07 16:15:24 -08:00
committed by GitHub
parent ed9e81dd58
commit 657b4cc0aa
10 changed files with 259 additions and 239 deletions
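
The sampler/processor refactor noted in the commit message splits sampling out into standalone callables. Below is a minimal sketch of the resulting flow, assuming the `make_sampler` / `make_logits_processors` helpers this PR adds to `mlx_lm.sample_utils` and the `sampler=` / `logits_processors=` keyword arguments forwarded by `generate`; the model repo name is illustrative, not taken from this diff.

```python
from mlx_lm import generate, load
from mlx_lm.sample_utils import make_logits_processors, make_sampler

# Illustrative model repo, not part of this commit.
model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")

# A sampler maps logits to the next token; logits processors map
# (token history, logits) to adjusted logits and run before sampling.
sampler = make_sampler(temp=0.7, top_p=0.9)
logits_processors = make_logits_processors(repetition_penalty=1.1)

text = generate(
    model,
    tokenizer,
    prompt="Explain log probabilities in one sentence.",
    max_tokens=128,
    sampler=sampler,
    logits_processors=logits_processors,
)
print(text)
```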

@@ -101,7 +101,8 @@ To see a description of all the arguments you can do:
 #### Streaming
 
 For streaming generation, use the `stream_generate` function. This returns a
-generator object which streams the output text. For example,
+generator object which streams the output text, token, and log probabilities.
+For example,
 
 ```python
 from mlx_lm import load, stream_generate
@@ -116,7 +117,7 @@ prompt = tokenizer.apply_chat_template(
     messages, tokenize=False, add_generation_prompt=True
 )
 
-for t in stream_generate(model, tokenizer, prompt, max_tokens=512):
-    print(t, end="", flush=True)
+for text, *_ in stream_generate(model, tokenizer, prompt, max_tokens=512):
+    print(text, end="", flush=True)
 print()
 ```
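
Putting the README hunks together, a complete version of the updated streaming example might look like the following. The lines the hunks elide (model loading and message setup) are reconstructed here as an assumption, with an illustrative model repo and prompt.

```python
from mlx_lm import load, stream_generate

# Illustrative model repo, not part of this commit.
model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")

messages = [{"role": "user", "content": "Write a haiku about streams."}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# After this PR, stream_generate yields (text, token, logprobs) tuples;
# unpack just the detokenized text segment for printing.
for text, *_ in stream_generate(model, tokenizer, prompt, max_tokens=512):
    print(text, end="", flush=True)
print()
```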