zhangyiss/mlx - mlx - Gitea for Geophysics

mirror of https://github.com/ml-explore/mlx.git synced 2025-12-14 09:07:12 +08:00

Author	SHA1	Message	Date
nicolov	eaba55c9bf	Add matrix inversion primitive (#822)	2024-03-15 06:34:36 -07:00
Awni Hannun	19ec023256	vmap matmul and admm (#836)	2024-03-14 14:38:22 -07:00
Awni Hannun	63ab0ab580	version (#835) v0.7.0	2024-03-14 12:20:40 -07:00
Jagrit Digani	8dfc376c00	Strided reduce specialization for small reductions (#826) * Add small column / general reduction specialization	2024-03-14 09:16:53 -07:00
Angelos Katharopoulos	1efee9db09	Add types and order in kernel name (#831)	2024-03-13 20:34:06 -07:00
Awni Hannun	43abc402d8	route to fallback (#828)	2024-03-13 19:56:04 -07:00
Angelos Katharopoulos	3f8b1668c4	Make reshape faster for row_contiguous cases (#829)	2024-03-13 16:22:03 -07:00
Angelos Katharopoulos	76c919b4ec	NumberOfElements for shapeless compile and vmap fixes (#802)	2024-03-13 10:34:14 -07:00
Angelos Katharopoulos	29d0c10ee5	Reshape improvement (#818)	2024-03-12 17:54:31 -07:00
Jagrit Digani	5ad133f8bb	No copy gems (#801) * Enable collapsing batch dims in gemm * Update gemm to only make copies when neither of the last 2 axes are contiguous * Update addmm to support gemv shapes * Update addmm to support irregular batch strides * Update tests	2024-03-12 13:13:41 -07:00
nicolov	d0c544a868	Add SVD primitive (#809) Add SVD op using Accelerate's LAPACK following https://developer.apple.com/documentation/accelerate/compressing_an_image_using_linear_algebra Co-authored-by: Nicolo Valigi <nvaligi@apple.com>	2024-03-12 12:30:11 -07:00
Daniel Falbel	ffb19df3c0	Fix docstring for correctly rendering (#820)	2024-03-12 11:46:44 -07:00
Awni Hannun	8b7532b9ab	fix scatter (#821)	2024-03-12 11:42:07 -07:00
Awni Hannun	366478c560	fix modules with dict (#819)	2024-03-12 08:54:06 -07:00
Justin Deschenaux	8e5600022a	Implement RNN, GRU, LSTM (#268) * RNN base implementation * Address comments+format * nits in docs * add tests for prb * fix test * add a couple tests --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-03-11 21:14:44 -07:00
Awni Hannun	0e95b64942	Fix bug in tape order during simplify (#816) * fix bug in tape order during simplify * properly fix compile * last bug	2024-03-11 17:29:05 -07:00
nicolov	0ae22b915b	Remove code duplication in reduce ops (#793) * Remove code duplication in reduce ops * Remove the unnecessary lambda --------- Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>	2024-03-11 10:57:07 -07:00
Awni Hannun	7c441600fe	Compile stride bug (#812) * fix compile stride bug * revert sdpa fix * fix cpu * fix bug with simplifying outputs	2024-03-11 06:31:31 -07:00
Awni Hannun	a4d290adb9	Remove depth traversal (#813) * no depth traversal * counter outside loop	2024-03-09 20:21:32 -08:00
Awni Hannun	28301807c2	Version bump and os error (#807) v0.6.0	2024-03-07 13:57:58 -08:00
Awni Hannun	74ed0974b3	Support 13.0+ with xcode 14.3 (#806) * Support 13.0+ with xcode 14.3 * revert revert	2024-03-07 13:27:57 -08:00
Jagrit Digani	ec8a4864fa	Fix SDPA kernel bug on Mac OS 13.3 SDK (#805) * Move sdpa kernel to allocate tgp mem statically and allow macOS 13.3 SDK builds * Style	2024-03-07 10:18:09 -08:00
Awni Hannun	b7588fd5d7	fix inplace to not make a shallow copy (#804)	2024-03-07 09:34:11 -08:00
Awni Hannun	f512b905c7	Minimum xcode / sdk (#800) * minimum xcode /sdk * try multiple xcode versions in CI * update python * metal validation for python tests	2024-03-07 08:19:43 -08:00
Awni Hannun	afd5274049	route to fallback for bfloat (#794)	2024-03-06 15:39:12 -08:00
Awni Hannun	1074674e32	Add a maximum graph depth (#797) * add a maximum graph depth * remember how to use C++	2024-03-06 15:39:00 -08:00
AlexCheema	7762e07fde	Update function_transforms.rst (#796) Fix typo in function_transforms.rst	2024-03-06 12:03:37 -08:00
Luca Arnaboldi	cbefd9129e	Implementation of pickle, copy and deepcopy for Python arrays (#300 & #367). (#713) * Implemented pickling and copy for Python arrays(#300 & #367) * Fixing typos * Pickle with NumPy arrays * Pickle: workaround for bfloat16 * Revert "Pickle: workaround for bfloat16" This reverts commit 25afe6bc09. * Added an error when pickling bfloat16 * Update python/tests/test_array.py Co-authored-by: Awni Hannun <awni.hannun@gmail.com> * Update python/tests/test_array.py Co-authored-by: Awni Hannun <awni.hannun@gmail.com> * Update python/src/array.cpp Co-authored-by: Awni Hannun <awni.hannun@gmail.com> * Update python/src/array.cpp Co-authored-by: Awni Hannun <awni.hannun@gmail.com> * clang-format applied --------- Co-authored-by: Awni Hannun <awni.hannun@gmail.com>	2024-03-06 08:02:41 -08:00
Angelos Katharopoulos	e39bebe13e	Fix reshaping of empty arrays (#791)	2024-03-05 23:33:22 -08:00
Angelos Katharopoulos	14b4e51a7c	Improved quantized matrix vector product (#786)	2024-03-05 17:32:19 -08:00
Awni Hannun	cbcf44a4ca	Some fixes in cache / thread safety (#777) * some fixes in cache / thread safety * speed up no cache case * fix opt test * optimizer docs * otpimizer docs * fix adafactor * fix adafactor	2024-03-05 13:30:50 -08:00
Awni Hannun	859ae15a54	Fix test (#785)	2024-03-04 23:02:27 -08:00
Brian Keene	0787724c44	Fast Inference SDPA op (#735) * Fast Inference SDPA op Implements metal shaders for: o = mx.fast_inference_sdpa(queries, keys, values, scale, mask) Supports fp16, fp32 dtypes; assumes d_k = 128. Generic op support / prompt encoding supported via mlx primitives. Metal implementation is for the inference use case only. Majority of performance benefits appears to results from GQA & reduced bandwidth requirements; there is approximate performance parity for the MHA use case (from some measurements on M3 Max). * Flush shared memory to zero before unprotected reads for (scores @ values) * Move to fast:: namespace, address reviewer comments ... also attempt to revert formatter auto-change for files not relevant to this change * Shared memory flush to top of kernel * Resolve compiler warnings * Update python/src/fast.cpp Co-authored-by: Awni Hannun <awni.hannun@gmail.com> * Update python/src/fast.cpp Co-authored-by: Awni Hannun <awni.hannun@gmail.com> * Update python/src/fast.cpp Co-authored-by: Awni Hannun <awni.hannun@gmail.com> * Update python/src/fast.cpp Co-authored-by: Awni Hannun <awni.hannun@gmail.com> * Update docstring per PR feedback * Softmax in higher precision, ... * route to fallback for more use cases - batch size > 1, head_dim other than 128, etc. * Address linux build failure * Address other reviewer comments * Remove extraneous eval_cpu function per review --------- Co-authored-by: Atila Orhon <64497909+atiorh@users.noreply.github.com> Co-authored-by: Awni Hannun <awni.hannun@gmail.com> Co-authored-by: atila <atiorh@icloud.com>	2024-03-04 21:06:11 -08:00
Awni Hannun	7b463ffb07	Ios compile (#784) * try to fix build for ios * skip cpu compile * fix namespace * fix namespace * Use CMake for platform specific cpu compile --------- Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>	2024-03-04 20:02:26 -08:00
Jagrit Digani	6686e61ca4	Reduce update (#783) * Split reduction files to reduce compile times * Add small and medium axis size specializations for row reductions * Add non-row-reduction options for small and med kernels	2024-03-04 19:09:51 -08:00
Awni Hannun	c096a77b9b	revision bump (#778) v0.5.1	2024-03-04 13:41:53 -08:00
Awni Hannun	5121f028d9	nice tensordot for mlx c (#782)	2024-03-04 09:51:02 -08:00
Piotr Rybiec	6a665ea6ed	Dilation for convolutional layers (#766) * add dilation parameter to Conv1d layer * space here too * add conv1d dilation test * add dilation parameter for Conv2d layer * conv2d dilation test	2024-03-04 06:43:00 -08:00
Awni Hannun	bc06cb9ff6	Pickle + dtype fix for numpy conversion (#763) * pickle + dtype fix for numpy conversion * fix getattribute on Module base * remove unused function * fix tests * add topk to ops * fix doc	2024-03-02 06:09:29 -08:00
Angelos Katharopoulos	8e281c76c3	Fix the top-k op (#768)	2024-03-01 22:08:43 -08:00
Awni Hannun	d5964a2710	bindings for memory info (#761) * bindings for memory info * update api * keep cache low if requested * fix default * nit in ops error	2024-03-01 19:51:58 -08:00
Ikko Eltociear Ashimine	cf3eb87e52	Fix typo in transforms.cpp (#764) occuring -> occurring	2024-02-29 22:23:46 -08:00
Awni Hannun	ab3a466711	bump (#760) v0.5.0	2024-02-29 11:58:54 -08:00
Awni Hannun	4494970f47	avoid nested closures in module (#759)	2024-02-29 09:39:52 -08:00
Jagrit Digani	776c3d226d	Convolution update (#651) * Init steel conv and update Conv primitive * Update slow CPU implementation to support flipping and input dilation winograd conv routing Co-authored-by: Awni Hannun <awni@apple.com>	2024-02-28 20:11:16 -08:00
Awni Hannun	f5f18b704f	fix temporary bug (#752)	2024-02-27 17:44:39 -08:00
Awni Hannun	420ff2f331	Add back compiled function signatures and docstrings (#749) * try to add back compiled function signatures and docstrings * add indentation to docstring	2024-02-27 13:18:59 -08:00
Awni Hannun	56ba3ec40e	fix cpu compile on older OS (#747)	2024-02-26 22:20:53 -08:00
Noah Kasmanoff	de3d2467a3	Update: Fast GeLU Approximation (#744) * add: fast gelu approx * fix docs * Update gelu_fast_approx function documentation * Update python/mlx/nn/layers/activations.py Co-authored-by: Awni Hannun <awni.hannun@gmail.com> * fix: test gelu --------- Co-authored-by: Awni Hannun <awni.hannun@gmail.com>	2024-02-26 21:08:50 -08:00
Awni Hannun	fe1dabf272	Fix compile with non standard types (#745) * refactor tree utils * fix compile + tree code refactor * Add an extra test * add a few missing activations to docs * hash structure * Encode the full argument structure --------- Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>	2024-02-26 19:28:53 -08:00

392 Commits