mlx-examples/llms/mlx_lm/examples
Awni Hannun fca087be49
More cache improvements (#1015)
* fix rotating KV cache for chat use case

* reorg + fixes to caching; unify prompt caching across cache types and use cases (e.g. caching during a chat)

* nit in chat

* fix tests

* fix tests

* fix tests

* docs

* chat command

* comments + docs

* Define meta_state on all Cache implementations

* fixes + trim_prompt_cache API

* fix default model

---------

Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
2024-10-07 20:45:51 -07:00
chat.py More cache improvements (#1015) 2024-10-07 20:45:51 -07:00
generate_response.py More cache improvements (#1015) 2024-10-07 20:45:51 -07:00
lora_config.yaml Adding full finetuning (#903) 2024-09-29 17:12:47 -07:00
merge_config.yaml Support for slerp merging models (#455) 2024-02-19 20:37:15 -08:00