mlx-examples/llms/issue.txt
2024-10-09 15:13:12 -04:00


## Steps to reproduce
Run the following twice: once as-is, and once with `prefill_step_size=2` commented out:
```py
import mlx_lm
model, tokenizer = mlx_lm.load('/Users/llwu/models/mlx/Meta-Llama-3.1-8B-4bit')
mlx_lm.generate(
    model,
    tokenizer,
    prompt="69 + 420= ",
    verbose=True,
    max_tokens=10,
    max_kv_size=5,
    prefill_step_size=2,
)
```
The outputs differ. I also noticed that the `RotatingKVCache` ends up with length 5 when prefilling in steps and length 7 without.
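For what it's worth, here is a toy model of one eviction policy that would reproduce the 5-vs-7 lengths. This is a guess at the mechanism, not the actual `mlx_lm` implementation: the assumption is that a single oversized update (a one-shot prompt prefill) permanently enlarges the cache past `max_kv_size`, while chunked prefill never lets it grow beyond the limit.

```python
# Toy rotating cache -- a sketch of a hypothesized mechanism, NOT
# mlx_lm's RotatingKVCache.
class ToyRotatingCache:
    def __init__(self, max_size):
        self.capacity = max_size
        self.tokens = []

    def update(self, chunk):
        # Assumption: an oversized single chunk enlarges the buffer for good.
        self.capacity = max(self.capacity, len(chunk))
        self.tokens.extend(chunk)
        # Evict the oldest tokens past capacity. (The real RotatingKVCache
        # also preserves the first `keep` tokens; omitted for simplicity.)
        overflow = len(self.tokens) - self.capacity
        if overflow > 0:
            del self.tokens[:overflow]


def final_len(prompt_len, step, max_size, gen_tokens):
    cache = ToyRotatingCache(max_size)
    prompt = list(range(prompt_len))
    for i in range(0, prompt_len, step):  # prefill in chunks of `step`
        cache.update(prompt[i : i + step])
    for t in range(prompt_len, prompt_len + gen_tokens):
        cache.update([t])  # one token per decode step
    return len(cache.tokens)


print(final_len(7, step=2, max_size=5, gen_tokens=10))  # chunked prefill -> 5
print(final_len(7, step=7, max_size=5, gen_tokens=10))  # one-shot prefill -> 7
```

If something like this is happening, the two runs attend over different windows of the prompt, which would explain the diverging generations.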