refactor sampler/processor and a few improvements

This commit is contained in:
Awni Hannun
2024-11-05 17:01:21 -08:00
parent 3783156072
commit 0be87b3c53
9 changed files with 153 additions and 164 deletions

@@ -101,7 +101,8 @@ To see a description of all the arguments you can do:
 #### Streaming
 For streaming generation, use the `stream_generate` function. This returns a
-generator object which streams the output text. For example,
+generator object which streams the output text, token, and log probabilities.
+For example,
 ```python
 from mlx_lm import load, stream_generate
@@ -116,7 +117,7 @@ prompt = tokenizer.apply_chat_template(
 messages, tokenize=False, add_generation_prompt=True
 )
-for t in stream_generate(model, tokenizer, prompt, max_tokens=512):
-    print(t, end="", flush=True)
+for text, *_ in stream_generate(model, tokenizer, prompt, max_tokens=512):
+    print(text, end="", flush=True)
 print()
 ```
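The `for text, *_ in …` pattern in the diff above keeps only the text and discards the other yielded fields. A minimal sketch of that unpacking, using a hypothetical stand-in generator (not `stream_generate` itself) that yields `(text, token, logprobs)` tuples as described above:

```python
# Hypothetical stand-in for stream_generate: yields (text, token, logprobs)
# tuples, mirroring the text/token/log-probabilities output described above.
def fake_stream_generate():
    yield ("Hello", 1, [-0.1])
    yield (" world", 2, [-0.5])

# Starred unpacking keeps the text and ignores the remaining fields:
pieces = [text for text, *_ in fake_stream_generate()]
assert "".join(pieces) == "Hello world"

# The extra fields are still available when all three are unpacked:
for text, token, logprobs in fake_stream_generate():
    print(f"token {token}: {text!r}")
```

The starred target (`*_`) makes the loop robust if later versions yield additional fields per step.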