Juarez Bochi
4fe2fa2a64
GGUF: Avoid dequantization when format is compatible ( #426 )
...
* GGUF: Don't dequantize q4_1
* Fix weight order. First in low bits
* Add unpacking for q4_0
* Don't dequantize q8_0
* rebase quants and split file
* don't quantize every weight
* reapply patch
* error handling
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-01-23 15:43:57 -08:00
Awni Hannun
3d99a8d31d
Fix format / build ( #489 )
2024-01-18 10:01:59 -08:00
Ethan
a749a91c75
Support disable metal buffer cache to prevent performance degradation caused by large memory caching ( #390 )
...
* support disable metal buffer cache, due to large unused memory buffered when llm generated long context tokens
* Run format and add "cache_enabled" feature tests
2024-01-18 08:33:34 -08:00
Awni Hannun
c9934fe8a4
Metal validation ( #432 )
...
* tests clear metal validation
* add cpp test with metal validation to circleci
* nit
2024-01-11 11:57:24 -08:00
Awni Hannun
46a39e5b1f
copyright + ack
2023-11-30 11:12:53 -08:00
Awni Hannun
8ca7f9e8e9
awni's commit files
2023-11-29 10:30:41 -08:00