Mirror of https://github.com/ml-explore/mlx-examples.git, synced 2025-08-30 02:53:41 +08:00
## Steps to reproduce
Run the following with and without `prefill_step_size=2` commented out:
```py
import mlx_lm

model, tokenizer = mlx_lm.load('/Users/llwu/models/mlx/Meta-Llama-3.1-8B-4bit')

mlx_lm.generate(
    model,
    tokenizer,
    prompt="69 + 420= ",
    verbose=True,
    max_tokens=10,
    max_kv_size=5,
    prefill_step_size=2,
)
```
The outputs differ. I notice that the `RotatingKVCache` has length 5 with `prefill_step_size=2` and length 7 without it.
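For intuition, here is a toy model of what I think is happening (this is an illustration, not the actual `RotatingKVCache` implementation, and it assumes a 7-token prompt for the sake of the arithmetic): when the prompt is prefilled in small chunks, the cache gets a chance to evict old entries between chunks, so it never grows past `max_kv_size`; when the whole prompt arrives in one update, nothing has been evicted yet and the cache ends up at the full prompt length.

```python
def prefill(tokens, step, max_size):
    """Feed `tokens` into a toy rotating cache in chunks of `step`.

    Toy eviction rule (an assumption, not mlx_lm's code): before
    appending a chunk, evict the oldest cached entries to make room,
    but never evict entries belonging to the incoming chunk itself.
    """
    cache = []
    for i in range(0, len(tokens), step):
        chunk = tokens[i:i + step]
        overflow = len(cache) + len(chunk) - max_size
        if overflow > 0:
            # drop the oldest entries (only from what is already cached)
            del cache[:min(overflow, len(cache))]
        cache.extend(chunk)
    return cache

prompt = list(range(7))  # stand-in for a 7-token prompt

# chunked prefill: eviction runs between chunks, cache stays at max_size
print(len(prefill(prompt, step=2, max_size=5)))           # -> 5

# single-shot prefill: one oversized chunk, nothing evicted yet
print(len(prefill(prompt, step=len(prompt), max_size=5)))  # -> 7
```

Under this toy rule the two code paths end at lengths 5 and 7, matching the cache lengths observed above, which suggests the divergence comes from when eviction is applied during prefill rather than from the attention computation itself.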