diff --git a/llms/README.md b/llms/README.md
index 3c5a0b3d..e2d1db59 100644
--- a/llms/README.md
+++ b/llms/README.md
@@ -64,29 +64,6 @@ prompt = tokenizer.apply_chat_template(
 text = generate(model, tokenizer, prompt=prompt, verbose=True)
 ```
 
-To use temperature or other sampler arguments pass it like this
-
-```
-from mlx_lm import load, generate
-
-model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")
-
-temp: 0.7
-top_p: 0.9
-top_k: 25
-sampler = make_sampler(temp, top_p,top_k)
-
-prompt = "Write a story about Ada Lovelace"
-
-messages = [{"role": "user", "content": prompt}]
-prompt = tokenizer.apply_chat_template(
-    messages, add_generation_prompt=True
-)
-
-text = generate(model, tokenizer, prompt=prompt, sampler, verbose=True)
-
-```
-
 To see a description of all the arguments you can do:
 
 ```
@@ -146,6 +123,18 @@ for response in stream_generate(model, tokenizer, prompt, max_tokens=512):
 print()
 ```
 
+#### Sampling
+
+The `generate` and `stream_generate` functions accept `sampler` and
+`logits_processors` keyword arguments. A sampler is any callable which accepts
+a possibly batched logits array and returns an array of sampled tokens. The
+`logits_processors` must be a list of callables which take the token history
+and current logits as input and return the processed logits. The logits
+processors are applied in order.
+
+Some standard sampling functions and logits processors are provided in
+`mlx_lm.sample_utils`.
+
 ### Command Line
 
 You can also use `mlx-lm` from the command line with:
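
The new "#### Sampling" section describes a contract: a sampler is a callable from logits to sampled tokens, and each logits processor maps `(token_history, logits)` to new logits, applied in order before sampling. A minimal plain-Python sketch of that contract is below; the `greedy_sampler` and `repetition_penalty` names are illustrative toys, not part of `mlx_lm` (real samplers there operate on `mx.array` batches).

```python
def greedy_sampler(logits):
    # A sampler is any callable mapping logits to a sampled token;
    # here, greedy argmax over a single (unbatched) row of logits.
    return max(range(len(logits)), key=lambda i: logits[i])

def repetition_penalty(history, logits, penalty=1.5):
    # A logits processor takes the token history and current logits and
    # returns processed logits; here, previously seen tokens are penalized.
    out = list(logits)
    for tok in set(history):
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out

# Processors run in order, then the sampler picks a token from the result.
history = [2, 0]
logits = [3.0, 1.0, 2.5]
processed = repetition_penalty(history, logits)  # [2.0, 1.0, 1.666...]
token = greedy_sampler(processed)                # 0
```

Because both pieces are plain callables, swapping in a different sampler (e.g. temperature sampling) or appending another processor requires no changes to the generation loop itself.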