gguf-tools

mirror of https://github.com/antirez/gguf-tools.git synced 2025-12-16 00:18:52 +08:00

Author	SHA1	Message	Date
antirez	a3257ff3cb	emb2redis utility added.	2025-08-28 16:35:01 +02:00
Salvatore Sanfilippo	8fa6eb6523	Merge pull request #16 from jart/ftz Remove flush to zero from bf16	2025-01-09 16:46:11 +01:00
Salvatore Sanfilippo	bd64d6e812	Merge pull request #15 from jart/features Introduce --diffable flag	2025-01-09 16:44:59 +01:00
Justine Tunney	918234ce80	Remove flush to zero from bf16 After closely analyzing Google Brain codebases, we decided that flushing to zero was the wrong thing to do. Intel and AMD probably designed their microprocessors to always flush to zero for the wrong reasons. It should have been made conditional on FTZ being set in MXCSR like other opcodes. See ggerganov/llama.cpp#7843	2024-07-03 05:39:16 -07:00
Justine Tunney	6deab767f9	Introduce --diffable flag Sometimes it's useful to get an overview of how tensors change when using different quantization formats. For example, `diff -u <(gguf-tools show --diffable ggml-model-bf16.gguf) <(gguf-tools show --diffable ggml-model-Q6_K.gguf) | less` now produces nice clean output. Without this change, every line would have been different due to the file offsets and byte sizes, which means `diff -u` would produce one gigantic unreadable chunk.	2024-05-26 00:23:41 -07:00
Salvatore Sanfilippo	4e6455ecaf	Merge pull request #14 from jart/update Add BF16 support and fix warnings	2024-05-26 09:22:00 +02:00
Justine Tunney	ede59bb742	Add BF16 support and fix warnings This change updates the data type definitions to be the same as the latest source code. Support for the bfloat16 data type is available; however, it can't interpret the IQ quantization formats yet. Compiler warnings and other nits have been fixed, but behavioral changes have been avoided, and no new features have been added yet.	2024-05-25 22:58:50 -07:00
Salvatore Sanfilippo	3e5c0a464d	Merge pull request #12 from jmousseau/match-ints Match key-value pair and tensor counts with header integer width	2024-02-18 16:36:26 +01:00
Salvatore Sanfilippo	9c87cb78b0	Merge pull request #11 from jmousseau/leak-on-error Prevent memory leak when tensor type is invalid	2024-02-18 16:34:10 +01:00
Jack Mousseau	7d25893516	Match key-value pair and tensor counts with header integer width	2024-02-18 07:26:35 -08:00
Jack Mousseau	c2cef3d1d8	Prevent memory leak when tensor type is invalid	2024-02-18 07:24:44 -08:00
Salvatore Sanfilippo	af7d88d808	Merge pull request #9 from jbochi/q4_1_fix Fix q4_1 dequantization	2024-01-10 17:13:17 +01:00
Juarez Bochi	55d6267c31	Fix q4_1 dequantization	2024-01-10 10:17:13 -05:00
Salvatore Sanfilippo	fe34f6ec5c	Merge pull request #8 from jbochi/q4 Add support for q4_0 and q4_1 quantizations	2024-01-10 00:10:54 +01:00
Juarez Bochi	dc69c608df	Add support for q4_0 and q4_1 quantizations	2024-01-09 18:04:18 -05:00
antirez	eec3dc9f54	F16 output for dequantization.	2024-01-09 18:46:26 +01:00
antirez	26e3a59233	Rename gguf_init/end to more obvious names.	2024-01-09 16:35:40 +01:00
antirez	6eb4aeb2fb	gguf_create(): take flags to be able to overwrite files. Fixes #7.	2024-01-09 16:32:10 +01:00
Salvatore Sanfilippo	81dbf8f8d2	Merge pull request #6 from jbochi/reverse_stride Print tensor with correct strides	2024-01-09 15:48:46 +01:00
antirez	419d4706f6	Q2_K dequantization.	2024-01-05 23:38:47 +01:00
Juarez Bochi	50e79b9ec0	Print tensor with correct strides	2024-01-05 09:59:59 -05:00
Salvatore Sanfilippo	e48ca317ea	Merge pull request #5 from jbochi/inspect_shape Inspect tensor taking dims into consideration	2024-01-04 20:32:19 +01:00
Salvatore Sanfilippo	a42344e197	Merge pull request #4 from jbochi/show_shape Print tensor dimensions	2024-01-04 20:31:23 +01:00
Salvatore Sanfilippo	92e1c67b8b	Merge pull request #3 from jbochi/int_type_features Add tensor type features for int types	2024-01-04 20:30:32 +01:00
Juarez Bochi	58a0479bb4	Inspect tensor taking dims into consideration	2024-01-04 11:44:13 -05:00
Juarez Bochi	a7e99574e2	Print tensor dimensions	2024-01-03 17:41:33 -05:00
Juarez Bochi	5d10eaac8d	Add tensor type features for int types	2024-01-03 16:33:47 -05:00
antirez	b1f32c4088	Quantization functions refactoring.	2024-01-03 21:02:47 +01:00
antirez	ff16bc3dcf	Speed: use the right compilation flags to dequantize faster.	2024-01-03 21:02:47 +01:00
Salvatore Sanfilippo	b4e7da4ceb	Merge pull request #1 from jbochi/typos Fix some typos	2024-01-03 14:54:30 +01:00
Salvatore Sanfilippo	04ec28ed35	Merge pull request #2 from jbochi/check_remap Check remap when appending kv/info/data	2024-01-03 14:53:41 +01:00
Juarez Bochi	463fd63cf2	Check remap when appending kv/info/data	2024-01-03 08:01:00 -05:00
Juarez Bochi	e5cdcec626	Fix some typos	2024-01-03 07:34:12 -05:00
antirez	c8469c4a27	Q6_K quantization implemented.	2023-12-31 14:06:49 +01:00
antirez	54b93edecb	README: grammar.	2023-12-30 18:08:27 +01:00
antirez	4a5dfdcdad	README: show subcommand example output.	2023-12-30 18:02:21 +01:00
antirez	53e7b2b156	README: grammar.	2023-12-30 18:00:23 +01:00
antirez	e8b405aac8	README updated.	2023-12-30 17:29:44 +01:00
antirez	a4858afb4d	Implement f16/f32 in gguf_tensor_to_float().	2023-12-30 17:23:27 +01:00
antirez	136e04977c	README: add compare example.	2023-12-30 15:47:52 +01:00
antirez	951ce0e3c4	Compare subcommand: report difference as %.	2023-12-30 15:43:44 +01:00
antirez	3663d73c22	Compare subcommand: just skip tensors we can't yet dequantize.	2023-12-30 10:13:38 +01:00
antirez	400f60b75b	--verbose and README updated.	2023-12-29 22:50:41 +01:00
antirez	54946cbf14	Compare subcommand.	2023-12-28 17:24:05 +01:00
antirez	2a599dc5d0	Show subcommand: print total parameters.	2023-12-28 16:07:16 +01:00
antirez	e2062eea2c	Q4_K dequantization.	2023-12-28 12:31:35 +01:00
antirez	c25ccfa02a	Q8_0 dequantization.	2023-12-27 21:22:33 +01:00
antirez	558c7c3c6d	Clarify the need for FP16 implementation.	2023-12-27 18:54:36 +01:00
antirez	bd4ecbda94	FP16 added. Split-mixtral improved.	2023-12-27 15:25:18 +01:00
antirez	a77a4d061c	Mixtral experts extraction test.	2023-12-26 17:23:47 +01:00

61 Commits