mirror of https://github.com/ml-explore/mlx-examples.git (synced 2025-06-24 17:31:18 +08:00)
parent 485fb9ac0f
commit 9742ad0f51
@@ -51,12 +51,12 @@ are trained on similar data.
 One way to increase the chance of accepting a draft token is with the parameter
 `--delta`. This parameter can be in the range $[0, 1]$. If it is $1$ then all
 the draft tokens will be accepted by the model. If it is $0$, then only draft
-tokens which match the original acceptance criterion are kept.[^1] Values
+tokens that match the original acceptance criterion are kept.[^1] Values
 closer to $1$ increase the chance that a draft token is accepted.
 
 Conversely, the fewer draft tokens accepted by the main model, the more
 expensive speculative decoding is. You can use `--num-draft` to tune the number
-of draft tokens per model evaluation in order to reduce the number of discarded
+of draft tokens per model evaluation to reduce the number of discarded
 draft tokens. Decreasing `--num-draft` will decrease the number of discarded
 draft tokens at the expense of more large model evaluations.
 
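
To make the `--delta` paragraph in this hunk concrete, here is a minimal sketch of one way a relaxation parameter could interpolate between a baseline acceptance rule and "accept everything". It is not taken from this repository. The baseline `min(1, p / q)` is the standard speculative-sampling criterion and is only assumed to be the "original acceptance criterion" the README refers to. The linear blend and the names `accept_probability` and `accept_draft_token` are hypothetical, chosen only because they reproduce the two endpoints described ($0$ keeps the baseline, $1$ accepts every draft token).

```python
import random


def accept_probability(p_main: float, q_draft: float, delta: float) -> float:
    """Probability of accepting one draft token (hypothetical relaxation).

    p_main:  probability the main model assigns to the draft token.
    q_draft: probability the draft model assigned to it (must be > 0).
    delta:   relaxation in [0, 1]; 0 keeps the baseline rule, 1 accepts all.
    """
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must be in [0, 1]")
    if q_draft <= 0.0:
        raise ValueError("q_draft must be positive")
    # Assumed baseline: the standard speculative-sampling acceptance rule.
    base = min(1.0, p_main / q_draft)
    # Assumed blend: linear interpolation between the baseline and 1.
    return delta + (1.0 - delta) * base


def accept_draft_token(p_main: float, q_draft: float, delta: float,
                       rng: random.Random) -> bool:
    """Draw one accept/reject decision using the probability above."""
    return rng.random() < accept_probability(p_main, q_draft, delta)


if __name__ == "__main__":
    rng = random.Random(0)
    for delta in (0.0, 0.5, 1.0):
        prob = accept_probability(0.2, 0.8, delta)
        hits = sum(accept_draft_token(0.2, 0.8, delta, rng) for _ in range(10_000))
        print(f"delta={delta}: accept probability {prob:.3f}, empirical {hits / 10_000:.3f}")
```

Under this assumed blend, any `delta` above $0$ accepts some tokens the baseline rule would reject, so the output distribution can drift from the main model's in exchange for fewer rejections.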
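
The `--num-draft` trade-off described in the hunk can be illustrated with a small back-of-the-envelope model. This is a sketch under simplifying assumptions that do not come from the repository: each draft token is accepted independently with a fixed probability `alpha`, verification stops at the first rejection, and every main-model evaluation contributes exactly one token of its own. The function names are hypothetical.

```python
def expected_accepted(alpha: float, num_draft: int) -> float:
    """Expected draft tokens accepted per main-model evaluation.

    Draft token i is kept only if tokens 1..i are all accepted, which
    happens with probability alpha**i under the independence assumption.
    """
    return sum(alpha ** i for i in range(1, num_draft + 1))


def expected_discarded(alpha: float, num_draft: int) -> float:
    """Expected draft tokens thrown away per main-model evaluation."""
    return num_draft - expected_accepted(alpha, num_draft)


def main_evals_per_token(alpha: float, num_draft: int) -> float:
    """Expected main-model evaluations per generated token.

    Assumes each evaluation yields the accepted draft tokens plus one
    token produced by the main model itself.
    """
    return 1.0 / (expected_accepted(alpha, num_draft) + 1.0)


if __name__ == "__main__":
    alpha = 0.5  # illustrative acceptance rate, not a measured value
    for k in (1, 2, 4, 8):
        print(
            f"num_draft={k}: "
            f"discarded/eval={expected_discarded(alpha, k):.3f}, "
            f"main evals/token={main_evals_per_token(alpha, k):.3f}"
        )
```

In this model, lowering `num_draft` shrinks the number of discarded draft tokens per evaluation while raising the number of main-model evaluations needed per generated token, which matches the direction of the trade-off stated in the text.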