mlx-examples/llms/tests
Latest commit: More cache improvements (#1015) by Awni Hannun (fca087be49)
* fix rotating kv cache for chat use case

* reorg + fixes to caching; unify prompt caching across cache types and use cases (e.g. caching during a chat)

* nit in chat

* fix tests

* fix tests

* fix tests

* docs

* chat command

* comments + docs

* Define meta_state on all Cache implementations

* fixes + trim_prompt_cache api

* fix default model

---------

Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
2024-10-07 20:45:51 -07:00
test_datsets.py Configuration-based use of HF hub-hosted datasets for training (#701) 2024-06-26 10:20:50 -07:00
test_finetune.py Feature: QDoRA (#891) 2024-09-30 08:01:11 -07:00
test_generate.py repetition_penalty and logits_bias just using logits_processors (#1004) 2024-09-30 08:49:03 -07:00
test_gguf.py fix(mlx-lm): type hints in gguf.py (#621) 2024-03-26 07:56:01 -07:00
test_models.py More cache improvements (#1015) 2024-10-07 20:45:51 -07:00
test_prompt_cache.py More cache improvements (#1015) 2024-10-07 20:45:51 -07:00
test_sample_utils.py Faster sampling with mx.compile (#937) 2024-08-15 11:29:09 -07:00
test_server.py Add /v1/models endpoint to mlx_lm.server (#984) 2024-09-28 07:21:11 -07:00
test_tuner_utils.py LoRA: Extract small function (#614) 2024-06-02 06:38:42 -07:00
test_utils_load_model.py support load model by custom get_model_classes (#899) 2024-07-25 11:01:17 -07:00
test_utils.py Fix Whisper conversion for safetensors models (#935) 2024-08-14 10:22:04 -07:00