Goekdeniz-Guelmez
2f95b361a8
removed the custom Mamba2Cache and updated the existing MambaCache, but it still handles only one input token and outputs gibberish
2024-11-10 16:57:03 +01:00
Gökdeniz Gülmez
49d3f188f8
Merge branch 'ml-explore:main' into adding-support-for-mamba2
2024-11-10 16:36:02 +01:00
Goekdeniz-Guelmez
3a499f9735
fixed inference slowness, but it can't handle multiple-token inputs and is generating gibberish
2024-11-10 16:35:07 +01:00
Goekdeniz-Guelmez
800b60239c
save checkpoint
2024-11-10 14:36:26 +01:00
Goekdeniz-Guelmez
906f972d36
save push
2024-11-06 16:35:46 +01:00
Angelos Katharopoulos
ed9e81dd58
Fix rotating kv cache size (#1093)
2024-11-05 10:24:24 -08:00
Alex Barron
85ffd2c96a
Quantized KV Cache (#1075)
* add QuantizedKVCache
* simplify
* add tests
* single sdpa function
* fix sed
* in place
* fix tests
* support different k and v head dims
2024-10-31 16:59:52 -07:00
Goekdeniz-Guelmez
58b448dc0b
updates
2024-10-30 21:23:13 +01:00
Goekdeniz-Guelmez
7c8849e795
update
2024-10-24 16:16:42 +02:00
Goekdeniz-Guelmez
a677638c4b
inference works but is hella slow
2024-10-22 23:06:06 +02:00
Goekdeniz-Guelmez
e43a2ab229
not working, probably incorrect handling of the cache
2024-10-22 22:04:25 +02:00
Goekdeniz-Guelmez
55485b98e8
update
2024-10-22 21:23:47 +02:00
Goekdeniz-Guelmez
ab4cf1d1cf
generation works but outputs gibberish
2024-10-20 18:04:34 +02:00
Goekdeniz-Guelmez
4ab5139c05
quick save
2024-10-20 16:11:39 +02:00
Awni Hannun
8dca1a2f60
Tokenizer updates + tests (#1024)
* tokenizer updates + tests
* nit
* add can_trim_prompt_cache
* nits
2024-10-14 10:48:46 -07:00
Awni Hannun
fca087be49
More cache improvements (#1015)
* fix rotating kv cache for chat use case
* reorg + fixes to caching; unify prompt caching across types and use cases, e.g. caching during a chat
* nit in chat
* fix tests
* fix tests
* fix tests
* docs
* chat command
* comments + docs
* Define meta_state on all Cache implementations
* fixes + trim_prompt_cache api
* fix default model
---------
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
2024-10-07 20:45:51 -07:00