From 9742ad0f5173ece682ed1b93b3bac027d551b506 Mon Sep 17 00:00:00 2001
From: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com>
Date: Mon, 8 Jan 2024 05:13:58 +0100
Subject: [PATCH] Update README.md (#248)

fixed a few typos
---
 llms/speculative_decoding/README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/llms/speculative_decoding/README.md b/llms/speculative_decoding/README.md
index 220265ca..1606ce8d 100644
--- a/llms/speculative_decoding/README.md
+++ b/llms/speculative_decoding/README.md
@@ -51,12 +51,12 @@ are trained on similar data.
 
 One way to increase the chance of accepting a draft token is with the
 parameter `--delta`. This parameter can be in the range $[0, 1]$. If it is $1$
 then all the draft tokens will be accepted by the model. If it is $0$, then only draft
-tokens which match the original acceptance criterion are kept.[^1] Values
+tokens that match the original acceptance criterion are kept.[^1] Values
 closer to $1$ increase the chance that a draft token is accepted.
 
 Conversely, the fewer draft tokens accepted by the main model, the more
 expensive speculative decoding is. You can use `--num-draft` to tune the number
-of draft tokens per model evaluation in order to reduce the number of discarded
+of draft tokens per model evaluation to reduce the number of discarded
 draft tokens. Decreasing `--num-draft` will decrease the number of discarded
 draft tokens at the expense of more large model evaluations.
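The README text in this patch says `--delta` relaxes the draft-token acceptance criterion: at $0$ only tokens passing the standard rule are kept, at $1$ every draft token is accepted. A minimal sketch of one way such a knob could work — assuming `delta` linearly interpolates between the standard speculative-decoding acceptance probability $\min(1, p/q)$ and always-accept; the repository's exact rule may differ, and `accept_prob`/`accept` are illustrative names, not the example's API:

```python
import random


def accept_prob(p_main, p_draft, delta):
    """Probability of keeping a drafted token (hypothetical interpolation).

    p_main:  main model's probability of the drafted token
    p_draft: draft model's probability of the same token
    delta:   relaxation in [0, 1]; 0 -> standard criterion min(1, p/q),
             1 -> always accept
    """
    base = min(1.0, p_main / p_draft)  # standard speculative-decoding rule
    return base + delta * (1.0 - base)  # blend toward always-accept


def accept(p_main, p_draft, delta, rng=random.random):
    """Sample the accept/reject decision for one draft token."""
    return rng() < accept_prob(p_main, p_draft, delta)
```

Under this sketch, raising `delta` raises the acceptance probability of every draft token, which is consistent with the README's claim that values closer to $1$ make acceptance more likely while trading off fidelity to the main model's distribution.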