This commit is contained in:
Awni Hannun
2023-12-28 15:18:40 -08:00
parent ef773beab6
commit 253cc31815


@@ -49,15 +49,16 @@ are accepted by the larger model. That's more likely to happen if the models
 are trained on similar data.
 
-One way to increase the chance of accepting a draft token is with the parameter
-`--delta`. This parameter can be in the range `[0, 1]`. If it is `1` then all
-the draft tokens will be accepted by the model. If it is `0`, then only draft
-tokens which match the original acceptance criterion kept.[^1] Values closer to
-`1` increase the chance that a draft token is accepted.
+One way to increase the chance of accepting a draft token is with the parameter
+`--delta`. This parameter can be in the range $[0, 1]$. If it is $1$ then all
+the draft tokens will be accepted by the model. If it is $0$, then only draft
+tokens which match the original acceptance criterion are kept.[^1] Values
+closer to $1$ increase the chance that a draft token is accepted.
 
-Conversely, the fewer draft tokens accepted by the model, the more expensive
-speculative decoding is. You can use `--draft` to tune the number of draft
-tokens per model evaluation in order to reduce the number of discarded draft
-tokens.
+Conversely, the fewer draft tokens accepted by the main model, the more
+expensive speculative decoding is. You can use `--num-draft` to tune the number
+of draft tokens per model evaluation in order to reduce the number of discarded
+draft tokens. Decreasing `--num-draft` will decrease the number of discarded
+draft tokens at the expense of more large model evaluations.
 
 [^1]: See the paper [Fast Inference from Transformers via Speculative
 Decoding](https://arxiv.org/abs/2211.17192)