Awni Hannun
52c41b5b5a
Fix prompt cache for models without chat template (#1250)
* fix deepseek sharding (#1242)
* fix prompt cache with no chat template
2025-02-06 11:10:58 -08:00
Awni Hannun
c4833a2f55
fix encoding with special tokens + chat template (#1189)
2025-01-03 10:50:59 -08:00
Awni Hannun
1963df8565
Allow prompt callback to generate_step (#1133)
* allow prompt callback and use in cache_prompt
* nit
* comments
* bump version
2024-12-03 16:17:14 -08:00
Awni Hannun
657b4cc0aa
[MLX LM] Sampler refactor + a few improvements (#1094)
* starting
* refactor sampler/processor and a few improvements
* fix stream
* fix stream generate
* fix eos handling in stream generate
2024-11-07 16:15:24 -08:00
Alex Barron
85ffd2c96a
Quantized KV Cache (#1075)
* add QuantizedKVCache
* simplify
* add tests
* single sdpa function
* fix sed
* in place
* fix tests
* support different k and v head dims
2024-10-31 16:59:52 -07:00
Awni Hannun
fca087be49
More cache improvements (#1015)
* fix rotating kv cache for chat use case
* reorg + fixes to caching, unify prompt caching across types and use cases for e.g. caching during a chat
* nit in chat
* fix tests
* fix tests
* fix tests
* docs
* chat command
* comments + docs
* Define meta_state on all Cache implementations
* fixes + trim_prompt_cache api
* fix default model
---------
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
2024-10-07 20:45:51 -07:00
Angelos Katharopoulos
324184d670
Fix the cache_prompt (#979)
2024-09-06 20:19:27 -07:00
Awni Hannun
b1186e2a81
Docs on prompt scaling (#963)
* docs on prompt scaling
* remove unused var
* nits
2024-08-29 15:05:17 -07:00
Angelos Katharopoulos
1003a8b2dd
Add the ability to load the KV cache from a file (#956)
2024-08-28 22:11:45 -07:00