Commit Graph

20 Commits

Author SHA1 Message Date
Alex Barron
c79f6a4a8c
3 and 6 bit quantization (#1613)
* Support 3 and 6 bit quantization
2024-11-22 10:22:13 -08:00
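
A minimal sketch of exercising the new bit widths through the existing quantize/dequantize ops (shapes here are invented for illustration):

```python
import mlx.core as mx

# Illustrative weight matrix; the last axis must be divisible by group_size.
w = mx.random.normal(shape=(128, 512))

# bits=3 and bits=6 join the previously supported bit widths.
for bits in (3, 6):
    w_q, scales, biases = mx.quantize(w, group_size=64, bits=bits)
    w_hat = mx.dequantize(w_q, scales, biases, group_size=64, bits=bits)
    print(bits, mx.abs(w - w_hat).max())   # reconstruction error shrinks as bits grow
```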
Alex Barron
26be608470
Add split_k qvm for long context (#1564)
* Add splitk qvm

* configurable splitk

* tuning

* remove extra instantiation

* remove refactor

* separate test

* cpu tolerance
2024-11-05 11:25:19 -08:00
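
The commit above adds a split-K variant of the quantized vector-matrix kernel for long reduction dimensions. A rough Python illustration of the split-K idea itself (not the Metal kernel; the chunk count is arbitrary):

```python
import mlx.core as mx

# Split-K: partition the long reduction axis K into chunks, compute partial
# products independently (in the kernel, on separate threadgroups), then sum.
K, N, splits = 8192, 1024, 4
chunk = K // splits
x = mx.random.normal(shape=(K,))
w = mx.random.normal(shape=(K, N))

partials = [x[i * chunk:(i + 1) * chunk] @ w[i * chunk:(i + 1) * chunk] for i in range(splits)]
out = sum(partials)                      # same result as x @ w, up to accumulation order
print(mx.allclose(out, x @ w, atol=1e-3))
```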
Alex Barron
d15fa13daf
Batched Quantized Matmul + Fast Small QMV (#1503)
* add fast qmv for small dims

* fix test

* batched cpu

* add batched template param

* refactor metal quantized.cpp
2024-10-21 16:23:17 -07:00
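
A sketch of the user-visible effect, assuming activations with leading batch dimensions against a quantized weight (shapes invented):

```python
import mlx.core as mx

B, T, K, N = 2, 16, 512, 1024
x = mx.random.normal(shape=(B, T, K))
w = mx.random.normal(shape=(N, K))
w_q, scales, biases = mx.quantize(w, group_size=64, bits=4)

# quantized_matmul broadcasts over the leading batch dimensions of x;
# with transpose=True (the default) it computes x @ w.T on the quantized weight.
y = mx.quantized_matmul(x, w_q, scales, biases, transpose=True, group_size=64, bits=4)
print(y.shape)  # (2, 16, 1024)
```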
Alex Barron
c52d1600f0
Fused Affine Quantize/Dequantize ops (#1282)
* Add fast affine dequantize

* add full quantize kernel

* fused kernel with scale/bias computation

* fix docstring

* fix no jit error

* fix test

* test fix

* reduce fast api to only affine_quantize
2024-07-29 15:11:38 -07:00
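
For reference, an unfused sketch of the affine scheme that the commit fuses into one kernel (per-group scale/bias followed by rounding); this is illustrative only and skips details such as keeping zero exactly representable:

```python
import mlx.core as mx

def affine_quantize_reference(w, group_size=64, bits=4):
    # Split the last axis into groups and compute a scale and bias per group.
    g = w.reshape(*w.shape[:-1], -1, group_size)
    w_min = g.min(axis=-1, keepdims=True)
    w_max = g.max(axis=-1, keepdims=True)
    scales = mx.maximum((w_max - w_min) / (2**bits - 1), 1e-8)  # guard constant groups
    q = mx.round((g - w_min) / scales)        # integer levels in [0, 2**bits - 1]
    w_hat = q * scales + w_min                # affine dequantize
    return w_hat.reshape(w.shape)

w = mx.random.normal(shape=(8, 128))
print(mx.abs(w - affine_quantize_reference(w)).max())
```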
Awni Hannun
d568c7ee36
Rename block sparse (#1149)
* block_sparse_mm to gather_mm

* rename

* nit

* nit
2024-05-22 07:48:34 -07:00
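
A sketch of the renamed op, assuming mixture-of-experts style routing where index arrays pick which batch of each operand participates in each matmul:

```python
import mlx.core as mx

a = mx.random.normal(shape=(4, 32, 64))    # 4 "left" matrices
b = mx.random.normal(shape=(8, 64, 16))    # 8 "right" matrices, e.g. experts

lhs_indices = mx.array([0, 1, 2, 3], dtype=mx.uint32)
rhs_indices = mx.array([5, 0, 2, 7], dtype=mx.uint32)   # one expert per left matrix

# Roughly a take() along the batch dims followed by a batched matmul, fused.
y = mx.gather_mm(a, b, lhs_indices, rhs_indices)
print(y.shape)  # (4, 32, 16)
```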
Angelos Katharopoulos
e78a6518fa
Block sparse qmm (#1124) 2024-05-16 15:24:14 -07:00
Angelos Katharopoulos
17f57df797
Improvements in the quantizer and dequantization kernel (#1061) 2024-05-01 18:19:11 -07:00
Angelos Katharopoulos
8db7161c94
Bug fix in quantize (#1054) 2024-04-29 20:55:04 -07:00
Angelos Katharopoulos
ec8578d41a
Fix quantization of all 0s (#1028) 2024-04-24 00:40:42 -07:00
Angelos Katharopoulos
84d61d27aa
Make sure 0 is represented in the quantization (#1016) 2024-04-19 19:47:26 -07:00
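
A small check one might run for this guarantee; it assumes (per the commit title) that exact zeros, e.g. padded or pruned weights, survive the quantize/dequantize round trip unchanged:

```python
import mlx.core as mx

w = mx.random.normal(shape=(4, 64))
w = mx.where(mx.abs(w) < 0.5, 0.0, w)      # sprinkle exact zeros into the weight

w_q, scales, biases = mx.quantize(w, group_size=32, bits=4)
w_hat = mx.dequantize(w_q, scales, biases, group_size=32, bits=4)

# Every position that was exactly zero should still be exactly zero.
print(mx.all(mx.where(w == 0.0, w_hat, 0.0) == 0.0))
```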
Awni Hannun
039da779d1
No quant reshape (#957)
* precise option on cpu

* remove print

* remove reshape in quant matmul

* no quant reshape
2024-04-04 11:52:12 -07:00
Angelos Katharopoulos
5f9ba3019f
Fix qmm_t for unaligned cases (#923) 2024-03-28 15:34:57 -07:00
Angelos Katharopoulos
40c108766b
Quantized matmul fix (#677)
* Fix qmv for small or unaligned matrices

* Fix qmm
2024-02-12 18:54:21 -08:00
Awni Hannun
7a34e46677
Quantize with groups of 32 (#511)
* allow quantize with group sizes of 32

* missing cpu dispatch

* remove print

* Fix qvm for group_size 32

---------

Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
2024-01-21 06:19:05 -08:00
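
A sketch of the new group size, using an invented weight shape divisible by both sizes so the per-group scale shapes can be compared:

```python
import mlx.core as mx

w = mx.random.normal(shape=(64, 128))
_, s32, _ = mx.quantize(w, group_size=32, bits=4)
_, s64, _ = mx.quantize(w, group_size=64, bits=4)
print(s32.shape, s64.shape)   # (64, 4) vs (64, 2): one scale (and bias) per group
```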
Angelos Katharopoulos
c15fe3e61b
Allow arbitrary first dimension in quantization kernels. (#458)
* Allow arbitrary first dim on qmm_t and qmv
* Allow arbitrary first dim on qmm and qvm
* Specialized aligned vs unaligned case
* Add more checks for valid quantizations
2024-01-16 00:46:21 -08:00
Angelos Katharopoulos
e7f5059fe4
Support for quantized matmul with w and w^T (#349)
* Add the metal qvm implementation
* Add qmm_n
* Add gradient wrt to input for quantized_matmul
2024-01-03 14:22:36 -08:00
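
A sketch of both layouts through quantized_matmul, using the current keyword name transpose (shapes invented):

```python
import mlx.core as mx

x = mx.random.normal(shape=(8, 256))
w = mx.random.normal(shape=(512, 256))
w_q, scales, biases = mx.quantize(w, group_size=64, bits=4)

# transpose=True multiplies by the transposed quantized weight (x @ w.T, the qmv/qmm_t path);
# transpose=False uses the weight as stored (x @ w, the qvm/qmm_n path added here).
y_t = mx.quantized_matmul(x, w_q, scales, biases, transpose=True)    # (8, 512)

x2 = mx.random.normal(shape=(8, 512))
y_n = mx.quantized_matmul(x2, w_q, scales, biases, transpose=False)  # (8, 256)
print(y_t.shape, y_n.shape)
```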
Angelos Katharopoulos
447bc089b9
Fix tolerance in de-/quantization test (#295) 2023-12-26 19:21:05 -08:00
Angelos Katharopoulos
b3916cbf2b
Improve names of quantization arguments (#235)
* Change the default quantization group_size to 64
* Rename groups to group_size and width to bits
2023-12-20 16:53:53 -08:00
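
After the rename, calls look like the following (the second call relies on the new group_size=64 default):

```python
import mlx.core as mx

w = mx.random.normal(shape=(32, 128))
w_q, scales, biases = mx.quantize(w, group_size=64, bits=4)   # group_size was `groups`, bits was `width`
w_q2, _, _ = mx.quantize(w)                                   # defaults: group_size=64, bits=4
print(mx.array_equal(w_q, w_q2))
```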
Angelos Katharopoulos
57fe918cf8
Adds C++ and nn quantization utilities (#230)
* Add C++ de-/quantize ops
* Add quantize functions to the docs and tests
* Add a QuantizedLinear module
2023-12-20 14:17:38 -08:00
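
A sketch of the nn side; the constructor arguments shown follow the current mlx.nn API and may differ slightly from the version in this commit:

```python
import mlx.core as mx
import mlx.nn as nn

# QuantizedLinear stores its weight in quantized form and applies the quantized
# matmul in __call__, so it drops in where nn.Linear would be used.
layer = nn.QuantizedLinear(256, 512, bias=True, group_size=64, bits=4)
x = mx.random.normal(shape=(8, 256))
print(layer(x).shape)   # (8, 512)
```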
Angelos Katharopoulos
dfa9f4bc58
An initial quantized matmul implementation (#205)
* Add quantized matvec
* Add quantized matrix matrix with 2nd matrix transposed
* Add quantized matmul tests
* Add a slow cpu quantized matmul
* Add a slightly faster vectorized cpu version
2023-12-18 23:18:57 -08:00
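
An end-to-end sanity check in the spirit of the tests this commit adds: quantize a weight, multiply through the quantized op, and compare against a matmul with the dequantized weight (shapes and tolerance are illustrative):

```python
import mlx.core as mx

x = mx.random.normal(shape=(4, 256))
w = mx.random.normal(shape=(1024, 256))

w_q, scales, biases = mx.quantize(w, group_size=64, bits=4)
y_quant = mx.quantized_matmul(x, w_q, scales, biases, transpose=True)
y_ref = x @ mx.dequantize(w_q, scales, biases, group_size=64, bits=4).T

print(mx.allclose(y_quant, y_ref, atol=1e-3))
```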