61 Commits

Author SHA1 Message Date
antirez
a3257ff3cb emb2redis utility added. 2025-08-28 16:35:01 +02:00
Salvatore Sanfilippo
8fa6eb6523 Merge pull request #16 from jart/ftz
Remove flush to zero from bf16
2025-01-09 16:46:11 +01:00
Salvatore Sanfilippo
bd64d6e812 Merge pull request #15 from jart/features
Introduce --diffable flag
2025-01-09 16:44:59 +01:00
Justine Tunney
918234ce80 Remove flush to zero from bf16
After closely analyzing Google Brain codebases, we decided that flushing
to zero was the wrong thing to do. Intel and AMD probably designed their
microprocessors to always flush to zero for the wrong reasons. It should
have been made conditional on FTZ being set in MXCSR like other opcodes.

See ggerganov/llama.cpp#7843
2024-07-03 05:39:16 -07:00
Justine Tunney
6deab767f9 Introduce --diffable flag
Sometimes it's useful to get an overview of how tensors change when
using different quantization formats. For example:

  diff -u <(gguf-tools show --diffable ggml-model-bf16.gguf) \
          <(gguf-tools show --diffable ggml-model-Q6_K.gguf) | less

Is now able to produce nice, clean output. Without this change, every
line would have differed because of the file offsets and byte sizes,
which means `diff -u` would produce one gigantic unreadable chunk.
2024-05-26 00:23:41 -07:00
Salvatore Sanfilippo
4e6455ecaf Merge pull request #14 from jart/update
Add BF16 support and fix warnings
2024-05-26 09:22:00 +02:00
Justine Tunney
ede59bb742 Add BF16 support and fix warnings
This change updates the data type definitions to match the latest
source code. Support for the bfloat16 data type is now available;
however, the IQ quantization formats can't be interpreted yet.
Compiler warnings and other nits have been cleaned up, but behavioral
changes have been avoided, and no new features have been added yet.
2024-05-25 22:58:50 -07:00
Salvatore Sanfilippo
3e5c0a464d Merge pull request #12 from jmousseau/match-ints
Match key-value pair and tensor counts with header integer width
2024-02-18 16:36:26 +01:00
Salvatore Sanfilippo
9c87cb78b0 Merge pull request #11 from jmousseau/leak-on-error
Prevent memory leak when tensor type is invalid
2024-02-18 16:34:10 +01:00
Jack Mousseau
7d25893516 Match key-value pair and tensor counts with header integer width 2024-02-18 07:26:35 -08:00
Jack Mousseau
c2cef3d1d8 Prevent memory leak when tensor type is invalid 2024-02-18 07:24:44 -08:00
Salvatore Sanfilippo
af7d88d808 Merge pull request #9 from jbochi/q4_1_fix
Fix q4_1 dequantization
2024-01-10 17:13:17 +01:00
Juarez Bochi
55d6267c31 Fix q4_1 dequantization 2024-01-10 10:17:13 -05:00
Salvatore Sanfilippo
fe34f6ec5c Merge pull request #8 from jbochi/q4
Add support for q4_0 and q4_1 quantizations
2024-01-10 00:10:54 +01:00
Juarez Bochi
dc69c608df Add support for q4_0 and q4_1 quantizations 2024-01-09 18:04:18 -05:00
antirez
eec3dc9f54 F16 output for dequantization. 2024-01-09 18:46:26 +01:00
antirez
26e3a59233 Rename gguf_init/end to more obvious names. 2024-01-09 16:35:40 +01:00
antirez
6eb4aeb2fb gguf_create(): take flags to be able to overwrite files. Fixes #7. 2024-01-09 16:32:10 +01:00
Salvatore Sanfilippo
81dbf8f8d2 Merge pull request #6 from jbochi/reverse_stride
Print tensor with correct strides
2024-01-09 15:48:46 +01:00
antirez
419d4706f6 Q2_K dequantization. 2024-01-05 23:38:47 +01:00
Juarez Bochi
50e79b9ec0 Print tensor with correct strides 2024-01-05 09:59:59 -05:00
Salvatore Sanfilippo
e48ca317ea Merge pull request #5 from jbochi/inspect_shape
Inspect tensor taking dims into consideration
2024-01-04 20:32:19 +01:00
Salvatore Sanfilippo
a42344e197 Merge pull request #4 from jbochi/show_shape
Print tensor dimensions
2024-01-04 20:31:23 +01:00
Salvatore Sanfilippo
92e1c67b8b Merge pull request #3 from jbochi/int_type_features
Add tensor type features for int types
2024-01-04 20:30:32 +01:00
Juarez Bochi
58a0479bb4 Inspect tensor taking dims into consideration 2024-01-04 11:44:13 -05:00
Juarez Bochi
a7e99574e2 Print tensor dimensions 2024-01-03 17:41:33 -05:00
Juarez Bochi
5d10eaac8d Add tensor type features for int types 2024-01-03 16:33:47 -05:00
antirez
b1f32c4088 Quantization functions refactoring. 2024-01-03 21:02:47 +01:00
antirez
ff16bc3dcf Speed: use the right compilation flags to dequantize faster. 2024-01-03 21:02:47 +01:00
Salvatore Sanfilippo
b4e7da4ceb Merge pull request #1 from jbochi/typos
Fix some typos
2024-01-03 14:54:30 +01:00
Salvatore Sanfilippo
04ec28ed35 Merge pull request #2 from jbochi/check_remap
Check remap when appending kv/info/data
2024-01-03 14:53:41 +01:00
Juarez Bochi
463fd63cf2 Check remap when appending kv/info/data 2024-01-03 08:01:00 -05:00
Juarez Bochi
e5cdcec626 Fix some typos 2024-01-03 07:34:12 -05:00
antirez
c8469c4a27 Q6_K quantization implemented. 2023-12-31 14:06:49 +01:00
antirez
54b93edecb README: grammar. 2023-12-30 18:08:27 +01:00
antirez
4a5dfdcdad README: show subcommand example output. 2023-12-30 18:02:21 +01:00
antirez
53e7b2b156 README: grammar. 2023-12-30 18:00:23 +01:00
antirez
e8b405aac8 README updated. 2023-12-30 17:29:44 +01:00
antirez
a4858afb4d Implement f16/f32 in gguf_tensor_to_float(). 2023-12-30 17:23:27 +01:00
antirez
136e04977c README: add compare example. 2023-12-30 15:47:52 +01:00
antirez
951ce0e3c4 Compare subcommand: report difference as %. 2023-12-30 15:43:44 +01:00
antirez
3663d73c22 Compare subcommand: just skip tensors we can't yet dequantize. 2023-12-30 10:13:38 +01:00
antirez
400f60b75b --verbose and README updated. 2023-12-29 22:50:41 +01:00
antirez
54946cbf14 Compare subcommand. 2023-12-28 17:24:05 +01:00
antirez
2a599dc5d0 Show subcommand: print total parameters. 2023-12-28 16:07:16 +01:00
antirez
e2062eea2c Q4_K dequantization. 2023-12-28 12:31:35 +01:00
antirez
c25ccfa02a Q8_0 dequantization. 2023-12-27 21:22:33 +01:00
antirez
558c7c3c6d Clarify the need for FP16 implementation. 2023-12-27 18:54:36 +01:00
antirez
bd4ecbda94 FP16 added. Split-mixtral improved. 2023-12-27 15:25:18 +01:00
antirez
a77a4d061c Mixtral experts extraction test. 2023-12-26 17:23:47 +01:00