Mirror of https://github.com/ml-explore/mlx-examples.git (synced 2025-09-18 19:10:08 +08:00)
reorg + fixes to caching: unify prompt caching across cache types and use cases (e.g., caching during a chat)
@@ -155,14 +155,14 @@ different queries. To cache a prompt use `mlx_lm.cache_prompt`. For example:
 cat prompt.txt | mlx_lm.cache_prompt \
   --model mistralai/Mistral-7B-Instruct-v0.3 \
   --prompt - \
-  --kv-cache-file mistral_prompt.safetensors
+  --prompt-cache-file mistral_prompt.safetensors
 ```
 
 Then use the cached prompt with `mlx_lm.generate`:
 
 ```
 mlx_lm.generate \
-  --kv-cache-file mistral_prompt.safetensors \
+  --prompt-cache-file mistral_prompt.safetensors \
   --prompt "\nSummarize the above text."
 ```
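The flag rename above tracks the unified cache API on the Python side. The sketch below is not part of this diff; it assumes the cache helpers this reorg introduces (`make_prompt_cache`, `save_prompt_cache`, `load_prompt_cache` in `mlx_lm.models.cache`) and the `prompt_cache` keyword accepted by `generate`, and shows a rough Python counterpart of the two commands above.

```python
# Minimal sketch (not from this diff): Python equivalent of
# `mlx_lm.cache_prompt` followed by `mlx_lm.generate`.
import mlx.core as mx
from mlx_lm import load, generate
from mlx_lm.models.cache import (
    load_prompt_cache,
    make_prompt_cache,
    save_prompt_cache,
)

model, tokenizer = load("mistralai/Mistral-7B-Instruct-v0.3")

# Pre-fill a cache by running the model over the long prompt once,
# then save it to disk (what `mlx_lm.cache_prompt` does in one pass;
# the real script processes the prompt in chunks).
prompt = open("prompt.txt").read()
cache = make_prompt_cache(model)
model(mx.array(tokenizer.encode(prompt))[None], cache=cache)
mx.eval([c.state for c in cache])
save_prompt_cache("mistral_prompt.safetensors", cache)

# Later: reload the cache and generate against the cached context
# (the counterpart of `mlx_lm.generate --prompt-cache-file ...`).
cache = load_prompt_cache("mistral_prompt.safetensors")
print(
    generate(
        model,
        tokenizer,
        prompt="\nSummarize the above text.",
        max_tokens=256,
        prompt_cache=cache,
    )
)
```

Saving the pre-filled cache means the long prompt is processed only once; any later query against it pays only for its own tokens.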
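The "caching during a chat" use case named in the commit message amounts to keeping one cache object alive across turns, so the growing history is never re-processed. A minimal sketch of that pattern, under the same assumptions about the unified API:

```python
# Minimal sketch (not from this diff): one prompt cache reused across
# chat turns, assuming `generate` accepts a `prompt_cache` keyword.
from mlx_lm import load, generate
from mlx_lm.models.cache import make_prompt_cache

model, tokenizer = load("mistralai/Mistral-7B-Instruct-v0.3")

# One cache for the whole conversation: each generate() call appends its
# keys/values, so every turn pays only for its newly added tokens.
cache = make_prompt_cache(model)

for question in ["My name is Ada.", "What is my name?"]:
    # Template only the new message; earlier turns already live in the
    # cache. (Per-message templating may repeat a BOS/system preamble,
    # a simplification accepted here for brevity.)
    prompt = tokenizer.apply_chat_template(
        [{"role": "user", "content": question}],
        tokenize=False,
        add_generation_prompt=True,
    )
    answer = generate(
        model, tokenizer, prompt=prompt, max_tokens=128, prompt_cache=cache
    )
    print(answer)
```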