refactor sampler/processor and a few improvements

This commit is contained in:
Awni Hannun
2024-11-05 17:01:21 -08:00
parent 3783156072
commit 0be87b3c53
9 changed files with 153 additions and 164 deletions

@@ -101,7 +101,8 @@ To see a description of all the arguments you can do:
 #### Streaming
 For streaming generation, use the `stream_generate` function. This returns a
-generator object which streams the output text. For example,
+generator object which streams the output text, token, and log probabilities.
+For example,
 ```python
 from mlx_lm import load, stream_generate
@@ -116,7 +117,7 @@ prompt = tokenizer.apply_chat_template(
 messages, tokenize=False, add_generation_prompt=True
 )
-for t in stream_generate(model, tokenizer, prompt, max_tokens=512):
-    print(t, end="", flush=True)
+for text, *_ in stream_generate(model, tokenizer, prompt, max_tokens=512):
+    print(text, end="", flush=True)
 print()
 ```
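The `for text, *_ in …` pattern in the diff above keeps only the text and discards the other yielded fields. A minimal sketch of that unpacking, using a hypothetical stand-in generator (not `stream_generate` itself) that yields `(text, token, logprobs)` tuples as described above:

```python
# Hypothetical stand-in for stream_generate: yields (text, token, logprobs)
# tuples, mirroring the text/token/log-probabilities output described above.
def fake_stream_generate():
    yield ("Hello", 1, [-0.1])
    yield (" world", 2, [-0.5])

# Starred unpacking keeps the text and ignores the remaining fields:
pieces = [text for text, *_ in fake_stream_generate()]
assert "".join(pieces) == "Hello world"

# The extra fields are still available when all three are unpacked:
for text, token, logprobs in fake_stream_generate():
    print(f"token {token}: {text!r}")
```

The starred target (`*_`) makes the loop robust if later versions yield additional fields per step.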