GGUF: Avoid dequantization when format is compatible (#426)

* GGUF: Don't dequantize q4_1

* Fix weight order: first weight goes in the low bits

* Add unpacking for q4_0

* Don't dequantize q8_0

* Rebase quants and split file

* Don't quantize every weight

* Reapply patch

* Error handling
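
A minimal sketch of the nibble reordering that the q4_0/q4_1 items above refer to. This is not the code from this commit. It assumes the usual ggml block layout, where each block packs 32 four-bit codes into 16 bytes, with byte `i` holding weight `i` in its low nibble and weight `i + 16` in its high nibble. The helper names `unpack_q4_block` and `repack_low_first` are hypothetical, and the per-block scale (and, for q4_1, the minimum) is left out.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: split the 16 packed bytes of one 4-bit block into
// 32 codes in logical weight order. Assumes byte i stores weight i in its
// low nibble and weight i + 16 in its high nibble.
std::array<uint8_t, 32> unpack_q4_block(const std::array<uint8_t, 16>& packed) {
  std::array<uint8_t, 32> codes{};
  for (std::size_t i = 0; i < 16; ++i) {
    codes[i] = packed[i] & 0x0F;     // low nibble: weight i
    codes[i + 16] = packed[i] >> 4;  // high nibble: weight i + 16
  }
  return codes;
}

// Hypothetical helper: repack consecutive codes two per byte, with the
// first weight of each pair in the low bits.
std::array<uint8_t, 16> repack_low_first(const std::array<uint8_t, 32>& codes) {
  std::array<uint8_t, 16> out{};
  for (std::size_t i = 0; i < 16; ++i) {
    out[i] = static_cast<uint8_t>(codes[2 * i] | (codes[2 * i + 1] << 4));
  }
  return out;
}
```

Keeping the 4-bit codes and only changing their order is what lets the loader skip dequantization: the scales stay as they are, and only the packing is adapted to the consumer's expected bit order.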

---------

Co-authored-by: Awni Hannun <awni@apple.com>
Juarez Bochi
2024-01-23 18:43:57 -05:00
committed by GitHub
parent 37fc9db82c
commit 4fe2fa2a64
5 changed files with 211 additions and 21 deletions

@@ -500,7 +500,6 @@ TEST_CASE("test metal enable/disable cache") {
auto buf = a.malloc(size, false);
auto buf_ptr = static_cast<MTL::Buffer*>(buf.ptr());
unsigned char first_byte = *reinterpret_cast<unsigned char*>(buf_ptr);
-printf("first byte: %d\n", first_byte);
// Release a
a.free(buf);
@@ -508,7 +507,6 @@ TEST_CASE("test metal enable/disable cache") {
// If release successfully, the first byte should be different from the
// first byte before release
unsigned char new_first_byte = *reinterpret_cast<unsigned char*>(buf_ptr);
-printf("new first byte: %d\n", new_first_byte);
CHECK_NE(new_first_byte, first_byte);
}