antirez
a3257ff3cb
emb2redis utility added.
2025-08-28 16:35:01 +02:00
Salvatore Sanfilippo
8fa6eb6523
Merge pull request #16 from jart/ftz
...
Remove flush to zero from bf16
2025-01-09 16:46:11 +01:00
Salvatore Sanfilippo
bd64d6e812
Merge pull request #15 from jart/features
...
Introduce --diffable flag
2025-01-09 16:44:59 +01:00
Justine Tunney
918234ce80
Remove flush to zero from bf16
...
After closely analyzing Google Brain codebases, we decided that flushing
to zero was the wrong thing to do. Intel and AMD probably designed their
microprocessors to always flush to zero for the wrong reasons. It should
have been made conditional on FTZ being set in MXCSR like other opcodes.
See ggerganov/llama.cpp#7843
2024-07-03 05:39:16 -07:00
Justine Tunney
6deab767f9
Introduce --diffable flag
...
Sometimes it's useful to get an overview of how tensors change when
using different quantization formats. For example:
diff -u <(gguf-tools show --diffable ggml-model-bf16.gguf) \
<(gguf-tools show --diffable ggml-model-Q6_K.gguf) | less
now produces nice, clean output. Without this change, every
line would have been different due to the file offsets and byte sizes
which means `diff -u` would produce one gigantic unreadable chunk.
2024-05-26 00:23:41 -07:00
Salvatore Sanfilippo
4e6455ecaf
Merge pull request #14 from jart/update
...
Add BF16 support and fix warnings
2024-05-26 09:22:00 +02:00
Justine Tunney
ede59bb742
Add BF16 support and fix warnings
...
This change updates the data type definitions to match the latest
source code. Support for the bfloat16 data type is now available;
however, the IQ quantization formats can't be interpreted yet. Compiler
warnings and other nits have been cleaned up, but behavioral changes
have been avoided, and no new features have been added yet.
2024-05-25 22:58:50 -07:00
Salvatore Sanfilippo
3e5c0a464d
Merge pull request #12 from jmousseau/match-ints
...
Match key-value pair and tensor counts with header integer width
2024-02-18 16:36:26 +01:00
Salvatore Sanfilippo
9c87cb78b0
Merge pull request #11 from jmousseau/leak-on-error
...
Prevent memory leak when tensor type is invalid
2024-02-18 16:34:10 +01:00
Jack Mousseau
7d25893516
Match key-value pair and tensor counts with header integer width
2024-02-18 07:26:35 -08:00
Jack Mousseau
c2cef3d1d8
Prevent memory leak when tensor type is invalid
2024-02-18 07:24:44 -08:00
Salvatore Sanfilippo
af7d88d808
Merge pull request #9 from jbochi/q4_1_fix
...
Fix q4_1 dequantization
2024-01-10 17:13:17 +01:00
Juarez Bochi
55d6267c31
Fix q4_1 dequantization
2024-01-10 10:17:13 -05:00
Salvatore Sanfilippo
fe34f6ec5c
Merge pull request #8 from jbochi/q4
...
Add support for q4_0 and q4_1 quantizations
2024-01-10 00:10:54 +01:00
Juarez Bochi
dc69c608df
Add support for q4_0 and q4_1 quantizations
2024-01-09 18:04:18 -05:00
antirez
eec3dc9f54
F16 output for dequantization.
2024-01-09 18:46:26 +01:00
antirez
26e3a59233
Rename gguf_init/end to more obvious names.
2024-01-09 16:35:40 +01:00
antirez
6eb4aeb2fb
gguf_create(): take flags to be able to overwrite files. Fixes #7.
2024-01-09 16:32:10 +01:00
Salvatore Sanfilippo
81dbf8f8d2
Merge pull request #6 from jbochi/reverse_stride
...
Print tensor with correct strides
2024-01-09 15:48:46 +01:00
antirez
419d4706f6
Q2_K dequantization.
2024-01-05 23:38:47 +01:00
Juarez Bochi
50e79b9ec0
Print tensor with correct strides
2024-01-05 09:59:59 -05:00
Salvatore Sanfilippo
e48ca317ea
Merge pull request #5 from jbochi/inspect_shape
...
Inspect tensor taking dims into consideration
2024-01-04 20:32:19 +01:00
Salvatore Sanfilippo
a42344e197
Merge pull request #4 from jbochi/show_shape
...
Print tensor dimensions
2024-01-04 20:31:23 +01:00
Salvatore Sanfilippo
92e1c67b8b
Merge pull request #3 from jbochi/int_type_features
...
Add tensor type features for int types
2024-01-04 20:30:32 +01:00
Juarez Bochi
58a0479bb4
Inspect tensor taking dims into consideration
2024-01-04 11:44:13 -05:00
Juarez Bochi
a7e99574e2
Print tensor dimensions
2024-01-03 17:41:33 -05:00
Juarez Bochi
5d10eaac8d
Add tensor type features for int types
2024-01-03 16:33:47 -05:00
antirez
b1f32c4088
Quantization functions refactoring.
2024-01-03 21:02:47 +01:00
antirez
ff16bc3dcf
Speed: use the right compilation flags to dequantize faster.
2024-01-03 21:02:47 +01:00
Salvatore Sanfilippo
b4e7da4ceb
Merge pull request #1 from jbochi/typos
...
Fix some typos
2024-01-03 14:54:30 +01:00
Salvatore Sanfilippo
04ec28ed35
Merge pull request #2 from jbochi/check_remap
...
Check remap when appending kv/info/data
2024-01-03 14:53:41 +01:00
Juarez Bochi
463fd63cf2
Check remap when appending kv/info/data
2024-01-03 08:01:00 -05:00
Juarez Bochi
e5cdcec626
Fix some typos
2024-01-03 07:34:12 -05:00
antirez
c8469c4a27
Q6_K quantization implemented.
2023-12-31 14:06:49 +01:00
antirez
54b93edecb
README: grammar.
2023-12-30 18:08:27 +01:00
antirez
4a5dfdcdad
README: show subcommand example output.
2023-12-30 18:02:21 +01:00
antirez
53e7b2b156
README: grammar.
2023-12-30 18:00:23 +01:00
antirez
e8b405aac8
README updated.
2023-12-30 17:29:44 +01:00
antirez
a4858afb4d
Implement f16/f32 in gguf_tensor_to_float().
2023-12-30 17:23:27 +01:00
antirez
136e04977c
README: add compare example.
2023-12-30 15:47:52 +01:00
antirez
951ce0e3c4
Compare subcommand: report difference as %.
2023-12-30 15:43:44 +01:00
antirez
3663d73c22
Compare subcommand: just skip tensors we can't yet dequantize.
2023-12-30 10:13:38 +01:00
antirez
400f60b75b
--verbose and README updated.
2023-12-29 22:50:41 +01:00
antirez
54946cbf14
Compare subcommand.
2023-12-28 17:24:05 +01:00
antirez
2a599dc5d0
Show subcommand: print total parameters.
2023-12-28 16:07:16 +01:00
antirez
e2062eea2c
Q4_K dequantization.
2023-12-28 12:31:35 +01:00
antirez
c25ccfa02a
Q8_0 dequantization.
2023-12-27 21:22:33 +01:00
antirez
558c7c3c6d
Clarify the need for FP16 implementation.
2023-12-27 18:54:36 +01:00
antirez
bd4ecbda94
FP16 added. Split-mixtral improved.
2023-12-27 15:25:18 +01:00
antirez
a77a4d061c
Mixtral experts extraction test.
2023-12-26 17:23:47 +01:00