zhangyiss/mlx - mlx - Gitea for Geophysics

mirror of https://github.com/ml-explore/mlx.git synced 2025-07-19 23:51:14 +08:00

Author	SHA1	Message	Date
Alex Barron	2e7c02d5cd	Metal FFT for powers of 2 up to 2048 (#915 ) * add Metal FFT for powers of 2 * skip GPU test on linux * fix contiguity bug * address comments * Update mlx/backend/metal/fft.cpp * Update mlx/backend/metal/fft.cpp * fix bug in synch --------- Co-authored-by: Alex Barron <abarron22@apple.com> Co-authored-by: Awni Hannun <awni.hannun@gmail.com> Co-authored-by: Awni Hannun <awni@apple.com>	2024-04-11 21:40:06 -07:00
Awni Hannun	ae18326533	No copy command encoder (#986 ) * no copy command encoder * up layer norm test tolerances	2024-04-11 21:15:36 -07:00
Angelos Katharopoulos	dce4bd74a4	Add ArrayDesc destructor to avoid possible stack overflow (#982 )	2024-04-11 11:37:02 -07:00
Nripesh Niketan	ffff671273	Update pre-commit hooks (#984 )	2024-04-11 07:27:53 -07:00
Awni Hannun	12d4507ee3	Explicit barriers with concurrent dispatch (#977 )	2024-04-10 21:45:31 -07:00
Awni Hannun	8580d997ff	Try a stack-based DFS for eval (#980 ) * rebase * nit * fix eval in vmap	2024-04-10 17:05:13 -07:00
Awni Hannun	99abb9eff4	Async eval (#972 )	2024-04-09 18:34:00 -07:00
Luca Arnaboldi	fffe072028	Implementation of mlx.random.multivariate_normal (#502 ) (#877 ) * Implementation of mlx.random.multivariate_normal (#502) * Update python/src/random.cpp Co-authored-by: Awni Hannun <awni.hannun@gmail.com> * Update python/src/random.cpp Co-authored-by: Awni Hannun <awni.hannun@gmail.com> * Update python/src/random.cpp Co-authored-by: Awni Hannun <awni.hannun@gmail.com> * Updated typo in docstring * Restricted multivariate_normal to float32 * Generic mean and variance shapes * Review edits * Update mlx/random.cpp Co-authored-by: Awni Hannun <awni.hannun@gmail.com> * Update python/src/random.cpp Co-authored-by: Awni Hannun <awni.hannun@gmail.com> * Update python/src/random.cpp Co-authored-by: Awni Hannun <awni.hannun@gmail.com> * Update python/src/random.cpp Co-authored-by: Awni Hannun <awni.hannun@gmail.com> * Test for ndim of mean and cov * nits * smaller size for test * fix broadcasted sampling --------- Co-authored-by: Awni Hannun <awni.hannun@gmail.com> Co-authored-by: Awni Hannun <awni@apple.com>	2024-04-09 13:50:12 -07:00
Abe Leininger	a1a31eed27	Add mx.meshgrid (#961 )	2024-04-09 11:43:08 -07:00
Awni Hannun	ae812350f9	use string (#976 )	2024-04-09 11:22:00 -07:00
Awni Hannun	42afe27e12	std and expm1 (#973 ) * std and expm1 * actually add expm1 * fix linux * fix vjp * relax tol for linux test * Add it to the compilable primitives --------- Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>	2024-04-08 14:26:01 -07:00
Awni Hannun	76e63212ff	Enable bfloat scan (#974 ) * enable bfloat scan * fix tests	2024-04-08 12:29:19 -07:00
Awni Hannun	aac2f9fb61	Improve profiling with gpu tracing (#969 ) * improve profiling with gpu tracing * fix for linux * nit * doc fix * fix example	2024-04-07 21:47:43 -07:00
Awni Hannun	039da779d1	No quant reshape (#957 ) * precise option on cpu * remove print * remove reshape in quant matmul * no quant reshape	2024-04-04 11:52:12 -07:00
Awni Hannun	d88d2124b5	segfault layer norm grad (#955 )	2024-04-04 10:59:15 -07:00
Awni Hannun	e142aaf8a1	Option for precise softmax (#953 ) * precise softmax * Add an equivalency check * Make the threadgroup memory definition fixed * precise cpu softmax * precise option on cpu * remove print --------- Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>	2024-04-04 08:32:35 -07:00
Awni Hannun	741eb28443	fix a couple bugs (#952 )	2024-04-02 12:07:41 -07:00
Angelos Katharopoulos	1a87dc5ea8	Fix compile fusion for multi-output edge cases (#950 ) * Fix compile fusion for multi-output edge cases * Add a test for multi-output compile	2024-04-02 08:42:31 -07:00
Awni Hannun	2427fa171e	Fix cpu compile (#934 ) * fix one cpu bug, test for another * format hooks * simplify contiguity check for cpu compile * fix * add back donation * comment	2024-04-01 17:37:12 -07:00
Angelos Katharopoulos	110d9b149d	Layer norm grad fix donation bug (#941 ) * add layer norm grad test * Fix donation bug in layernorm vjp --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-04-01 06:15:50 -07:00
Angelos Katharopoulos	9cbff5ec1d	Fix typo in qmm check (#940 )	2024-03-31 19:15:44 -07:00
Awni Hannun	8915901966	Donation bug (#933 ) * donation * buf * fix bug in softmax * comment * remove print	2024-03-30 10:08:54 -07:00
Cheng	913b19329c	Add missing && when forwarding args (#925 ) Without the && args would be copied and perfect forwarding won't work.	2024-03-29 06:48:29 -07:00
Angelos Katharopoulos	5f9ba3019f	Fix qmm_t for unaligned cases (#923 )	2024-03-28 15:34:57 -07:00
Cheng	46caf0bef0	Remove unnecessary string copies (#891 ) 1. Use string_view instead of string when there is no need for copy. 2. Otherwise move string when possible.	2024-03-28 13:14:59 -07:00
Jack Mousseau	45f636e759	Add Metal debug option and capture functions (#707 ) * Add Metal debug option and capture functions * Add brief Metal debugger documentation * doc nits --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-03-28 09:40:31 -07:00
Cheng	a7b404ff53	Use uintptr_t instead of size_t to store function id (#916 ) Also does some small cleanup of the compile cache code.	2024-03-28 06:37:59 -07:00
Cheng	bab5386306	Make ops aware of rvalues: astype/as_strided/copy/full (#895 ) When composing transforms, lots of temporary arrays are created and passed to the next primitive; by making ops accept args by value, we can avoid many copies of these temporary arrays.	2024-03-27 22:35:55 -07:00
Angelos Katharopoulos	aca7584635	Fix OOB read in qmv when non-divisible by blocksize (#917 )	2024-03-27 22:18:35 -07:00
Cheng	90dfa43ff1	Don't use make_unique to create shared_ptr (#902 ) The code compiled because shared_ptr's constructor actually accepts unique_ptr.	2024-03-27 06:13:29 -07:00
Awni Hannun	dc175f08d3	Fix race in multi-stream eval (#911 ) * maybe fix race * comment	2024-03-26 16:36:36 -07:00
Angelos Katharopoulos	29221fa238	Implement vjps for some primitives in the fast namespace (#883 ) * Implement rope vjp in terms of rope * RMSNormVJP primitive and kernel * Add LayerNormVJP primitive and kernel	2024-03-26 16:35:34 -07:00
Cheng	a789685c63	Remove duplicate defines of StreamOrDevice and is_big_endian (#892 )	2024-03-26 15:15:11 -07:00
Jagrit Digani	240d10699c	Implement negative padding in conv with slicing (#907 ) * Implement negative padding with slicing * Update mlx/ops.cpp Co-authored-by: Awni Hannun <awni@apple.com> --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-03-26 14:59:19 -07:00
Jagrit Digani	925014b661	Fix multiblock sort limits (#906 ) * Fix multiblock sort limits * Fix metal validation error	2024-03-26 14:00:00 -07:00
Angelos Katharopoulos	9948eddf11	Fix nan and improve speed for qvm (#903 )	2024-03-26 10:41:45 -07:00
Luca Arnaboldi	a3ee03da01	Fixing random.normal for half-precision dtype #642 (#904 ) * Fixing random.normal for half-precision dtype #642 * Update python/tests/test_random.py Co-authored-by: Awni Hannun <awni.hannun@gmail.com> --------- Co-authored-by: Awni Hannun <awni.hannun@gmail.com>	2024-03-26 09:58:27 -07:00
Cheng	28fcd2b519	Add missing && when forwarding args (#894 ) Without the && args would be copied and perfect forwarding won't work. Also add template utils to make sure the function only forwards array and not vector<array>.	2024-03-25 14:55:54 -07:00
Jack Mousseau	8e686764ac	Ensure shape dimensions are within supported integer range (#566 ) (#704 ) * Ensure shape dimensions are within supported integer range (#566) * fix build * fix rebase bug --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-03-25 13:29:45 -07:00
Daniel Strobusch	479051ce1c	add numeric type hierarchy and issubdtype as well as a set_dtype meth… (#427 ) * add numeric type hierarchy and issubdtype as well as a set_dtype method to nn.Module with predicate numeric type hierarchy and issubtype is compatible to the [numpy hierarchy](`220f0ab2c5/numpy/_core/numerictypes.py (L42)`). Closes #285. * nits in docs * unify type category checking * nits in docs * nits in docs * more docs nits * fix callable type --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-03-25 12:32:59 -07:00
Awni Hannun	be98f4ab6b	Reduce a little overhead (#871 ) * some small overhead improvements * use result_type in rms_norm * remove release force * fix + use non-vector version * revert compile change * fix ops * a little more overhead * a little more cleanup and overhead	2024-03-22 17:29:36 -07:00
Angelos Katharopoulos	6ee1112f30	Fix copy donation and add partial rope (#881 )	2024-03-22 17:28:26 -07:00
Cheng	9663c22fe9	Do not store iostream in shared_ptr (#872 ) There is no need to store iostream in shared_ptr, doing so adds the cost of a heap allocation.	2024-03-22 06:54:45 -07:00
Cheng	f0ae00da12	Reduce implicit copies in make_array (#874 ) 1. Move shapes into outputs instead of copying them. 2. Pass primitive by const ref as it is always copied into outputs, which removes a copy when calling make_array.	2024-03-22 06:29:16 -07:00
Angelos Katharopoulos	2225374060	Adds mx.fast.layer_norm (#870 )	2024-03-21 13:55:51 -07:00
nicolov	105d236889	Add vmap for SVD and inverse (#849 )	2024-03-21 13:18:27 -07:00
Awni Hannun	a54f06b16f	Fast RMS Norm (#862 ) * fast rmsnorm * no rms gpu * kernel * fix shared mem * looped rms and donation in softmax * Make the squaring in float32 to avoid underflow * Fix the default StreamOrDevice for rope and rms_norm in fast * nits --------- Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>	2024-03-21 07:20:54 -07:00
Cheng	4650d94d98	Add missing && in eval (#864 ) Without the && args would be copied and perfect forwarding won't work. To avoid eval calling itself recursively, the vector version of eval is changed to take by value instead, which will save a copy of the array when an rvalue is passed.	2024-03-21 06:15:48 -07:00
Jagrit Digani	a5681ebc52	Update set item (#861 ) * Update mlx_set_item to handle regular slices without expanding * Refactor ellipsis handling * Route mlx_set_item to slice_update where possible * Update mlx_scatter_args_slice * Don't route to gather if no array indices	2024-03-21 02:48:13 -07:00
Cheng	e849b3424a	Do not use static constexpr in header (#863 ) Doing so results in each compilation unit (.cpp file) having its own copy of the variable, while inline constexpr makes sure there is only one copy.	2024-03-20 21:28:05 -07:00
Jagrit Digani	b219d12a6b	Check edge case handling in row reduce med kernel (#858 )	2024-03-20 11:37:58 -07:00
Jagrit Digani	cec8661113	Add a SliceUpdate op and primitive (#850 ) * Enable copy to work with int64 strides * Fix uniform buffer indices or copy kernel arguments * Update utils.h * Remove manual unrolling of elem to loc loop * GPU copy updated to handle negative strides * Add slice update primitive	2024-03-20 10:39:25 -07:00
Cheng	73a8c090e0	Pass shape and inputs by value in array's constructor (#853 ) Since the shape and inputs are always saved as copies in ArrayDesc, we can unify array's constructors to just take the arguments by value. There are 2 cases: 1. When shape is an lvalue, it will be copied into array's constructor and then moved into ArrayDesc's member. So only 1 copy happens. 2. When shape is an rvalue, it will be moved into array's constructor and then moved into ArrayDesc's member. So no copy happens. So having 1 constructor that takes by value is equivalent to having 2 constructors that take a const reference and an rvalue reference separately.	2024-03-20 07:54:30 -07:00
Awni Hannun	9a8ee00246	Switch to nanobind (#839 ) * mostly builds * most tests pass * fix circle build * add back buffer protocol * includes * fix for py38 * limit to cpu device * include * fix stubs * move signatures for docs * stubgen + docs fix * doc for compiled function, comments	2024-03-18 20:12:25 -07:00
Cheng	d39ed54f8e	Some C++ code is not needed (#841 ) 1. An anonymous namespace already gives internal linkage, so the static keyword is not needed. 2. The default constructor of std::shared_ptr initializes the pointer to nullptr, so there is no need to set it explicitly.	2024-03-18 17:04:10 -07:00
Awni Hannun	16546c70d8	No reshape rope (#838 ) * no reshape rope * no reshape rope	2024-03-18 17:03:07 -07:00
nicolov	eaba55c9bf	Add matrix inversion primitive (#822 )	2024-03-15 06:34:36 -07:00
Awni Hannun	19ec023256	vmap matmul and admm (#836 )	2024-03-14 14:38:22 -07:00
Jagrit Digani	8dfc376c00	Strided reduce specialization for small reductions (#826 ) * Add small column / general reduction specialization	2024-03-14 09:16:53 -07:00
Angelos Katharopoulos	1efee9db09	Add types and order in kernel name (#831 )	2024-03-13 20:34:06 -07:00
Awni Hannun	43abc402d8	route to fallback (#828 )	2024-03-13 19:56:04 -07:00
Angelos Katharopoulos	3f8b1668c4	Make reshape faster for row_contiguous cases (#829 )	2024-03-13 16:22:03 -07:00
Angelos Katharopoulos	76c919b4ec	NumberOfElements for shapeless compile and vmap fixes (#802 )	2024-03-13 10:34:14 -07:00
Angelos Katharopoulos	29d0c10ee5	Reshape improvement (#818 )	2024-03-12 17:54:31 -07:00
Jagrit Digani	5ad133f8bb	No copy gems (#801 ) * Enable collapsing batch dims in gemm * Update gemm to only make copies when neither of the last 2 axes are contiguous * Update addmm to support gemv shapes * Update addmm to support irregular batch strides * Update tests	2024-03-12 13:13:41 -07:00
nicolov	d0c544a868	Add SVD primitive (#809 ) Add SVD op using Accelerate's LAPACK following https://developer.apple.com/documentation/accelerate/ compressing_an_image_using_linear_algebra Co-authored-by: Nicolo Valigi <nvaligi@apple.com>	2024-03-12 12:30:11 -07:00
Awni Hannun	8b7532b9ab	fix scatter (#821 )	2024-03-12 11:42:07 -07:00
Awni Hannun	0e95b64942	Fix bug in tape order during simplify (#816 ) * fix bug in tape order during simplify * properly fix compile * last bug	2024-03-11 17:29:05 -07:00
nicolov	0ae22b915b	Remove code duplication in reduce ops (#793 ) * Remove code duplication in reduce ops * Remove the unnecessary lambda --------- Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>	2024-03-11 10:57:07 -07:00
Awni Hannun	7c441600fe	Compile stride bug (#812 ) * fix compile stride bug * revert sdpa fix * fix cpu * fix bug with simplifying outputs	2024-03-11 06:31:31 -07:00
Awni Hannun	a4d290adb9	Remove depth traversal (#813 ) * no depth traversal * counter outside loop	2024-03-09 20:21:32 -08:00
Jagrit Digani	ec8a4864fa	Fix SDPA kernel bug on Mac OS 13.3 SDK (#805 ) * Move sdpa kernel to allocate tgp mem statically and allow macOS 13.3 SDK builds * Style	2024-03-07 10:18:09 -08:00
Awni Hannun	f512b905c7	Minimum xcode / sdk (#800 ) * minimum xcode /sdk * try multiple xcode versions in CI * update python * metal validation for python tests	2024-03-07 08:19:43 -08:00
Awni Hannun	afd5274049	route to fallback for bfloat (#794 )	2024-03-06 15:39:12 -08:00
Awni Hannun	1074674e32	Add a maximum graph depth (#797 ) * add a maximum graph depth * remember how to use C++	2024-03-06 15:39:00 -08:00
Angelos Katharopoulos	e39bebe13e	Fix reshaping of empty arrays (#791 )	2024-03-05 23:33:22 -08:00
Angelos Katharopoulos	14b4e51a7c	Improved quantized matrix vector product (#786 )	2024-03-05 17:32:19 -08:00
Awni Hannun	cbcf44a4ca	Some fixes in cache / thread safety (#777 ) * some fixes in cache / thread safety * speed up no cache case * fix opt test * optimizer docs * optimizer docs * fix adafactor * fix adafactor	2024-03-05 13:30:50 -08:00
Brian Keene	0787724c44	Fast Inference SDPA op (#735 ) * Fast Inference SDPA op Implements metal shaders for: o = mx.fast_inference_sdpa(queries, keys, values, scale, mask) Supports fp16, fp32 dtypes; assumes d_k = 128. Generic op support / prompt encoding supported via mlx primitives. Metal implementation is for the inference use case only. The majority of the performance benefit appears to result from GQA & reduced bandwidth requirements; there is approximate performance parity for the MHA use case (from some measurements on M3 Max). * Flush shared memory to zero before unprotected reads for (scores @ values) * Move to fast:: namespace, address reviewer comments ... also attempt to revert formatter auto-change for files not relevant to this change * Shared memory flush to top of kernel * Resolve compiler warnings * Update python/src/fast.cpp Co-authored-by: Awni Hannun <awni.hannun@gmail.com> * Update python/src/fast.cpp Co-authored-by: Awni Hannun <awni.hannun@gmail.com> * Update python/src/fast.cpp Co-authored-by: Awni Hannun <awni.hannun@gmail.com> * Update python/src/fast.cpp Co-authored-by: Awni Hannun <awni.hannun@gmail.com> * Update docstring per PR feedback * Softmax in higher precision, ... * route to fallback for more use cases - batch size > 1, head_dim other than 128, etc. * Address linux build failure * Address other reviewer comments * Remove extraneous eval_cpu function per review --------- Co-authored-by: Atila Orhon <64497909+atiorh@users.noreply.github.com> Co-authored-by: Awni Hannun <awni.hannun@gmail.com> Co-authored-by: atila <atiorh@icloud.com>	2024-03-04 21:06:11 -08:00
Awni Hannun	7b463ffb07	Ios compile (#784 ) * try to fix build for ios * skip cpu compile * fix namespace * fix namespace * Use CMake for platform specific cpu compile --------- Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>	2024-03-04 20:02:26 -08:00
Jagrit Digani	6686e61ca4	Reduce update (#783 ) * Split reduction files to reduce compile times * Add small and medium axis size specializations for row reductions * Add non-row-reduction options for small and med kernels	2024-03-04 19:09:51 -08:00
Awni Hannun	5121f028d9	nice tensordot for mlx c (#782 )	2024-03-04 09:51:02 -08:00
Awni Hannun	bc06cb9ff6	Pickle + dtype fix for numpy conversion (#763 ) * pickle + dtype fix for numpy conversion * fix getattribute on Module base * remove unused function * fix tests * add topk to ops * fix doc	2024-03-02 06:09:29 -08:00
Angelos Katharopoulos	8e281c76c3	Fix the top-k op (#768 )	2024-03-01 22:08:43 -08:00
Awni Hannun	d5964a2710	bindings for memory info (#761 ) * bindings for memory info * update api * keep cache low if requested * fix default * nit in ops error	2024-03-01 19:51:58 -08:00
Jagrit Digani	776c3d226d	Convolution update (#651 ) * Init steel conv and update Conv primitive * Update slow CPU implementation to support flipping and input dilation winograd conv routing Co-authored-by: Awni Hannun <awni@apple.com>	2024-02-28 20:11:16 -08:00
Awni Hannun	f5f18b704f	fix temporary bug (#752 )	2024-02-27 17:44:39 -08:00
Awni Hannun	56ba3ec40e	fix cpu compile on older OS (#747 )	2024-02-26 22:20:53 -08:00
Hinrik Snær Guðmundsson	08226ab491	added atleast args input support (#710 ) added atleast list(array) input support * function overloading implemented * Refactoring * fixed formatting * removed pos_only	2024-02-26 11:17:59 -08:00
Awni Hannun	e6418781ab	Fix logsumexp edge case (#740 ) * fix logsumexp * fix inf constant * also fix power grad * fix ternary dispatch	2024-02-25 08:39:55 -08:00
Awni Hannun	ac02cf33bd	Fix some issues using MLX in C++ (#739 ) * fix preamble build * fix some issues with using MLX as a dep in C++	2024-02-24 22:20:57 -08:00
Noah Farr	d729a1991b	Fix arange with inf step (#686 ) * Fix case for step=inf in arange and add inf check for start/stop * Add test cases for arange * Update ops.cpp to include climits header * Fix arange * Fix formatting * Refactor * Add missing include	2024-02-23 06:18:15 -08:00
Rifur13	126c9869c8	Implement the 'where' primitive for conditional selection (#664 )	2024-02-22 15:10:48 -08:00
Jagrit Digani	884b4ed43b	Fix threadgroup memory in arg reduce (#723 )	2024-02-21 19:42:16 -08:00
Vijay Krish	972d9a3aea	Up to 10x faster scatter. (#709 ) * Faster scatter. Add specialization for 1-d index tensors. * Address review comments. - Check for row contiguity of index, update tensors instead of checking strides. - Add support for 1d specialization with col contiguous update tensor, along with a test. * Nit1 Co-authored-by: Awni Hannun <awni.hannun@gmail.com> * Nit2 Co-authored-by: Awni Hannun <awni.hannun@gmail.com> --------- Co-authored-by: Awni Hannun <awni.hannun@gmail.com>	2024-02-21 11:09:30 -08:00
Awni Hannun	5798256fcf	Shapeless compilation for some graphs (#687 ) * shapeless compilation for some graphs * update compile benchmark * default compile a few activations * buffer donation * bugfix * shapeless fix * update tests to work for cpu and gpu fusion * test kwargs * add kwargs to compile * Recompile when python arguments change * no compile for tanh * some constant tests --------- Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>	2024-02-19 21:43:54 -08:00
Hinrik Snær Guðmundsson	f883fcede0	Added support for atleast_1d, atleast_2d, atleast_3d (#694 )	2024-02-19 09:40:52 -08:00
Awni Hannun	1a4f4c5ea6	Refactor CPU compile preamble (#708 ) * refactor cpu preamble * fix include order * fix some issues * fixes for linux * try to fix includes * add back warning suppression * more linux fixes	2024-02-19 06:12:53 -08:00
Jack Mousseau	0925af43b0	Remove unused variables (#706 )	2024-02-18 12:50:10 -08:00
Awni Hannun	dc937b8ed3	CPU compile (#691 ) * build and load shared object for cpu compile * nits * cpu compile tests pass * cpu compile tests pass * fix preamble for g++ * donation * fix gpu buffer donation * reuse prebuilt libraries * faster contiguity conditions * fix test * rid compiler warning * fast erf * Fix float16 for compile and add more types to cpu compile * Remove a forgotten comment * use cached libs * nits --------- Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>	2024-02-17 06:54:32 -08:00
Awni Hannun	c3965fc5ee	Separate fast ops and primitives (#699 )	2024-02-16 19:16:39 -08:00
toji	85143fecdd	improved error msg for invalid axis(`mx.split`) (#685 ) * improved error msg for invalid axis(`mx.split`) * Apply suggestions from code review Co-authored-by: Awni Hannun <awni.hannun@gmail.com> * fixed formatting issue --------- Co-authored-by: Awni Hannun <awni.hannun@gmail.com>	2024-02-15 07:25:38 -08:00
Diogo	35431a4ac8	Adds device context manager (#679 )	2024-02-14 14:14:58 -08:00
Awni Hannun	ccf1645995	Custom primitive + RoPE fat op (#676 ) * extensions start * rope custom op * fix build * docs + rope benchmark * fix test * Add a Metal kernel for RoPE * Fix position of traditional * transform tests * Move rope computation to float and fix tests * Fix the test and a typo * change to fast * fix no metal build --------- Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>	2024-02-14 14:04:25 -08:00
Jagrit Digani	1a48713d32	Update gather and scatter to not use Argument Encoder (#683 ) * Replace argument encoder usage for gather and scatter * Use constant address space for shapes and strides * Split gather and scatter to improve compile times * Enable the GPU tests * Update the CI config * Fix scatter dispatch for scalar indices * Remove arg encoder utils --------- Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>	2024-02-14 13:42:13 -08:00
Awni Hannun	1eb04aa23f	Fix empty array construction in cpp (#684 )	2024-02-13 23:34:17 -08:00
Noah Farr	0c65517e91	Return empty array when repeats is 0 in mx.repeat (#681 ) * Return empty array when repeats is 0 * Add test case for repeats = 0	2024-02-13 17:49:31 -08:00
Vijay Krish	2fdc2462c3	Faster gather and scatter. (#682 ) Reduce unnecessary integer ops, especially since these kernels are integer-bound. Increase number of iterations for benchmarks for better smoothing. Github Issue #506 Co-authored-by: Vijay Krishnamoorthy <vijay_krish@apple.com>	2024-02-13 17:47:41 -08:00
Angelos Katharopoulos	40c108766b	Quantized matmul fix (#677 ) * Fix qmv for small or unaligned matrices * Fix qmm	2024-02-12 18:54:21 -08:00
Awni Hannun	3756381358	Faster bfloat quantized mat-vec and vec-mat (#663 )	2024-02-11 21:53:16 -08:00
Awni Hannun	d12573daa6	quote file name (#670 )	2024-02-11 10:33:30 -08:00
Vijay Krish	06072601ce	Scatter optimization : Eliminate 64b integer divide. (#662 ) Launch 2D grid to eliminate divide and mod in device code, since 64b integer division is very expensive. Github Issue #506 Co-authored-by: Vijay Krishnamoorthy <vijay_krish@apple.com>	2024-02-10 08:49:51 -08:00
Awni Hannun	7f3f8d8f8d	Fix the softmax fix (#661 )	2024-02-09 17:02:13 -08:00
Awni Hannun	b96be943dc	bug fix (#658 )	2024-02-09 16:50:45 -08:00
Abdussamet Türker	b670485185	Remainder negative numerator bug fixed (#641 ) Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>	2024-02-09 16:49:14 -08:00
Diogo	b57bd0488d	Metadata support for safetensors (#639 ) * metadata support for safetensors * aliases making it a little more readable * addressing comments * python binding tests	2024-02-08 19:33:15 -08:00
Awni Hannun	1b97b2958b	Compile with capture (#629 ) * Simple kernel generation * Remove the generate kernel from graph_utils * fix multi-output with compile * fuse with stopgrad * v1 input, output capture in compile * cleanup tree update with visitor update * nit * remove todo * state for model, optional explicit init and more pure optimizer steps * move learning rate to state * add lr to opt state, some fixes in capture * fix optim * update tuple of containers as well * fix stream for compiled output * rng state for compile * nit * updates and comments --------- Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>	2024-02-07 17:29:22 -08:00
Angelos Katharopoulos	28eac18571	Kernel generation (#614 ) Generate reusable element-wise kernels given a computation graph.	2024-02-07 13:15:59 -08:00
Noah Farr	5fd11c347d	Add loc and scale to random.normal (#638 ) * Add loc and scale to random.normal * Add tests for loc and scale for random.normal * Run pre-commit hooks * Fix code review	2024-02-07 11:49:59 -08:00
Awni Hannun	146bd69470	Skip compile when transforming (#635 ) * skip compile when transforming * simplify message	2024-02-05 21:28:37 -08:00
Jagrit Digani	316ff490b3	Remove masks from BlockLoader and clear out load case for invalid thread (#634 )	2024-02-05 16:00:17 -08:00
Awni Hannun	d40a04f8dc	minor fixes (#631 ) * minor fixes * var with ddof >= nelements	2024-02-05 13:27:49 -08:00
Awni Hannun	d75ae52ecd	Compile primitive (#571 ) * Compiled primitive with basic binary, unary graph-level fusion	2024-02-05 06:51:22 -08:00
Avikant Srivastava	31fea3758e	feat: enhancement of the error message for mlx.core.mean (#608 ) * add error message	2024-02-05 01:21:49 -08:00
Awni Hannun	e319383ef9	Faster gather (#626 ) * faster gather * update copyright	2024-02-04 17:25:44 -08:00
David Koski	ebfd3618b0	fixes for building and running on iOS (#619 ) * fixes for building and running on iOS * per suggestion just use Accelerate	2024-02-04 12:29:17 -08:00
Avikant Srivastava	11a9fd40f0	fix: handle linspace function when num is 1 (#602 ) * fix: handle linspace function when num is 1 * add comment * fix test case * remove breakpoint	2024-02-04 11:03:49 -08:00
Awni Hannun	95b5fb8245	minor changes (#613 )	2024-02-02 11:48:35 -08:00
Awni Hannun	cb6156d35d	Fix eval in trace bugs (#612 ) * Fix eval in trace bugs * comment nit	2024-02-02 09:57:12 -08:00
Piotr Rybiec	506d43035c	typo fix (#607 )	2024-02-01 17:39:55 -08:00
Awni Hannun	e88e474fd1	Reduce vmap + some fixes (#601 )	2024-02-01 11:30:28 -08:00
Vijay Krish	fcc5ac1c64	Add GPU support for uint64/int64 reductions (#569 )	2024-01-31 11:18:04 -08:00
Angelos Katharopoulos	199aebcf77	Change the variance computation (#319 )	2024-01-30 19:28:56 -08:00
Angelos Katharopoulos	0de5988f92	Custom VJP and checkpointing (#541 ) * Implement custom_vjp and checkpointing * Add a dependency management primitive * Change the eval order to deep branches first * Add graph depth tracking to the array	2024-01-30 16:04:45 -08:00
Jagrit Digani	375446453e	Update Compute Pipeline Creation API (#581 ) * Add option to specialize metal functions on function constants * Update Compute Pipeline Creation API * Add options to make libraries from source and stitching * Update function specialization name options	2024-01-30 15:42:36 -08:00
Angelos Katharopoulos	1895d34c20	Fix log1p with inf inputs (#592 )	2024-01-30 14:02:50 -08:00
Jacket	3f7aba8498	Implement diagonal operator (#562 ) * Implement diagonal operator This implements mx.diagonal in operator level, inspired by @ManishAradwad. * added `mx.diag` with tests * corrected few things * nits in bindings * updates to diag --------- Co-authored-by: ManishAradwad <manisharadwad@gmail.com> Co-authored-by: Awni Hannun <awni@apple.com>	2024-01-30 09:45:48 -08:00
Angelos Katharopoulos	65d0b8df9f	Fix binary op dispatch (#584 )	2024-01-29 19:36:17 -08:00
Awni Hannun	3c2f192345	Propagate nans in binary ops (#579 ) * propagate nans in binary ops * handle empty matmul * cpu minimum/maximum propagate nan * benchmark maximum * add min as well * throw on negative indices with full * verbose on linux * fix matmul for zero K	2024-01-29 11:19:38 -08:00
Awni Hannun	8993382aaa	Buffer Donation (#519 ) * buffer donation * fix to move shared pointer * format * gpu in place for copy and binary * revert ops test * cpu in place * a little cleanup * remove useless bench	2024-01-26 16:30:33 -08:00
Awni Hannun	07f35c9d8a	Fix a few issues: docs for flatten, erf, dequantize validation (#560 ) * doc flatten * erf doc * check values for dequantize * format	2024-01-26 15:16:46 -08:00
Jagrit Digani	bf17ab5002	Add more checks and clearer error messages to conv operations (#563 ) * Add more checks and clearer error messages to conv operations	2024-01-26 15:13:26 -08:00
Awni Hannun	8fa6b322b9	Compile front-end (#476 ) * fix tests for linux * make a move on compile * basic compile scaffold works * compile binding * clean * fix * fix grad, more tests * basic python tests * fix segfault on python exit * compile works with python closures * fix test * fix python globals bug, and erase * simplify * more cpp tests * bug fix with move function and compile at exit * simplify inputs also * enable and disable compiler * remove simplify * simplify tests use compile now * fix multi-output with compile * clear output tree from cache when function goes out of scope * ../python/src/transforms.cpp * remove closure capture * comments	2024-01-26 13:45:30 -08:00
taher	077c1ee64a	QR factorization (#310 ) * add qr factorization --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-01-26 09:27:31 -08:00
Rifur13	2463496471	[Fix] mx.allclose bug with infinite values (#539 ) * Added isclose op and fixed comparison with inf values * Added 'equal_nan' to match numpy * format * Add test * Update python/src/ops.cpp Co-authored-by: Awni Hannun <awni.hannun@gmail.com> * Update python/src/ops.cpp Co-authored-by: Awni Hannun <awni.hannun@gmail.com> * Addressed CR comments * Update python/src/ops.cpp Co-authored-by: Awni Hannun <awni.hannun@gmail.com> * nits --------- Co-authored-by: Awni Hannun <awni.hannun@gmail.com> Co-authored-by: Awni Hannun <awni@apple.com>	2024-01-25 20:47:06 -08:00
Awni Hannun	f27ec5e097	More helpful error message in vjp transform + concate bug (#543 ) * more helpful message in vjp transform * fix concatenate on mismatch dims * typo * typo	2024-01-24 09:58:33 -08:00
Awni Hannun	f30e63353a	Minor updates to address a few issues (#537 ) * docs on arg indices return type * arange with nan * undo isort	2024-01-23 22:24:41 -08:00
Juarez Bochi	4fe2fa2a64	GGUF: Avoid dequantization when format is compatible (#426 ) * GGUF: Don't dequantize q4_1 * Fix weight order. First in low bits * Add unpacking for q4_0 * Don't dequantize q8_0 * rebase quants and split file * don't quantize every weight * reapply patch * error handling --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-01-23 15:43:57 -08:00
Jagrit Digani	6d3bee3364	Fix oob reads in gemv kernel (#523 )	2024-01-22 12:06:04 -08:00
Awni Hannun	7a34e46677	Quantize with groups of 32 (#511 ) * allow quantize with group sizes of 32 * missing cpu dispatch * remove print * Fix qvm for group_size 32 --------- Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>	2024-01-21 06:19:05 -08:00
Awni Hannun	b207c2c86b	Power VJP fix for 0 (#505 )	2024-01-20 01:17:40 -08:00
Juarez Bochi	ddf50113c5	GGUF: Load and save metadata (#446 ) * gguf metadata --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-01-19 14:06:05 -08:00
Awni Hannun	c4ec836523	fix isinf for integer types (#494 )	2024-01-19 05:31:10 -08:00
Awni Hannun	3d99a8d31d	Fix format / build (#489 )	2024-01-18 10:01:59 -08:00
Ethan	a749a91c75	Support disable metal buffer cache to prevent performance degradation caused by large memory caching (#390 ) * support disable metal buffer cache, due to large unused memory buffered when llm generated long context tokens * Run format and add "cache_enabled" feature tests	2024-01-18 08:33:34 -08:00
toji	49a52610b7	Added formatter structure and a boolean value formatter (#354 ) * added formatter structure and a boolean value formatter --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-01-18 07:49:41 -08:00
Angelos Katharopoulos	9c111f176d	Fix split optimization for array iterator (#484 )	2024-01-18 05:50:25 -08:00
Angelos Katharopoulos	90c234b7ac	Fix round to round half-cases to even (#482 )	2024-01-17 15:27:23 -08:00
Angelos Katharopoulos	135fd796d2	Fix detach for multi-output primitives (#480 )	2024-01-17 14:08:07 -08:00
Jagrit Digani	78102a47ad	Update GEMM (#424 ) * Organize and collect metal subroutine templates and elements in `metal/kernels/steel/` * Update gemm elements for better performance * Add split-K specialization for gemm * Add `addmm` primitive, op and bindings for fused matmul and bias addition * Update tests and benchmarks as needed	2024-01-17 12:42:39 -08:00
Diogo	556cdf0e06	Resolves build issues with the extension example (#419 ) * resolved extension build issues and added test to ci * missing gguflib * rebased * force mlx install from fix branch * linux build issue * point to git install and comment out ci tests	2024-01-17 12:07:05 -08:00
Awni Hannun	275db7221a	Command buffer reports errors (#479 ) * command buffer reports errors * typo * simplify	2024-01-17 11:53:30 -08:00
Awni Hannun	a2bf7693dd	Primitive's VJP takes outputs as input (#475 ) Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>	2024-01-16 19:03:53 -08:00
Angelos Katharopoulos	d8fabaa12b	Split multi output (#461 ) * Multi-output split primitive * Add the multi-output split to the ArrayIterator * Add some grad tests for split	2024-01-16 13:33:55 -08:00
Avikant Srivastava	4e290d282f	feat: add time based seed to random.h (#457 ) * random seed from time * fix: chrono * refactor: snake case	2024-01-16 07:32:28 -08:00
Yashraj Singh	e72458a3fa	implemented isposinf and isneginf in one PR (#470 ) * ran precommit * updated docs	2024-01-16 06:48:07 -08:00
Awni Hannun	a2ffea683a	Fix eye for larger matrices (#463 ) * fix eye * fix scatter for <32bit (non native atomic) types * fix int overflow	2024-01-16 00:51:24 -08:00
Angelos Katharopoulos	c15fe3e61b	Allow arbitrary first dimension in quantization kernels. (#458 ) * Allow arbitrary first dim on qmm_t and qmv * Allow arbitrary first dim on qmm and qvm * Specialized aligned vs unaligned case * Add more checks for valid quantizations	2024-01-16 00:46:21 -08:00
Tristan Bilot	f44c132f4a	Add scatter_min VJP (#462 )	2024-01-16 00:37:40 -08:00
Matthew Ernst	92a2fdd577	Adds isinf (#445 ) * adds isinf Signed-off-by: matthewfernst <matthew.f.ernst@gmail.com> * use stream + nits * typo --------- Signed-off-by: matthewfernst <matthew.f.ernst@gmail.com> Co-authored-by: Awni Hannun <awni@apple.com>	2024-01-15 19:50:44 -08:00
Tristan Bilot	6022d4129e	scatter_max vjp + bindings + tests (#431 ) Co-authored-by: DjamelMesbah <djamel.mesbah@adservio.fr>	2024-01-14 14:12:15 -08:00
Awni Hannun	4bc446be08	Use a dummy primitive to only sync with one output (#453 ) * Use a dummy primitive to only sync with one output * Fix test and choose stream with slight care	2024-01-14 14:09:40 -08:00
Awni Hannun	41cc7bdfdb	Fix stub generation, change graph exporting for arrows to go to outputs (#455 )	2024-01-14 14:06:16 -08:00
Awni Hannun	6e81c3e164	Sync only with outputs we need to sync with (#447 )	2024-01-13 01:47:25 -08:00
Diogo	2e29d0815b	Add tile op (#438 )	2024-01-12 23:03:16 -08:00
Ayush Shridhar	1416e7b664	Add isnan (#423 )	2024-01-12 11:16:48 -08:00
Angelos Katharopoulos	006d01ba42	Fix packaging of gguflib (#435 )	2024-01-11 13:56:03 -08:00
Awni Hannun	c9934fe8a4	Metal validation (#432 ) * tests clear metal validation * add cpp test with metal validation to circleci * nit	2024-01-11 11:57:24 -08:00
Awni Hannun	3b4f066dac	Correct types for vjp + tests (#418 ) * correct types for vjp + tests * fix build + comment	2024-01-10 13:32:37 -08:00
Juarez Bochi	b7f905787e	GGUF support (#350 ) * Initial GGUF support for tensor fields. --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-01-10 13:22:48 -08:00
Angelos Katharopoulos	961435a243	Scatter vjp (#394 ) * Add a first scatter vjp * Implement the scatter_add vjp * Add array.at to implement user friendly scatters	2024-01-09 13:36:51 -08:00
Awni Hannun	f099ebe535	Multi output primitives (#330 ) * Multi-output primitives --------- Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>	2024-01-08 16:39:08 -08:00
Nripesh Niketan	73321b8097	feat: add logicalAnd and logicalOR (#386 ) * feat: add logicalAnd and logicalOR * run pre-commit * Refactor logical_and and logical_or functions * Add acknowledgement * Add logical AND and logical OR operators * Refactor logical_and and logical_or functions * Add support for logical operators on bool arrays * Update mlx/ops.cpp Co-authored-by: Awni Hannun <awni.hannun@gmail.com> * Update mlx/ops.cpp Co-authored-by: Awni Hannun <awni.hannun@gmail.com> * Add logical AND and OR operators for arrays and scalars * Refactor vjp and jvp methods in primitives.cpp * Add overloaded operators for logical AND and OR * format --------- Co-authored-by: Awni Hannun <awni.hannun@gmail.com> Co-authored-by: Awni Hannun <awni@apple.com>	2024-01-08 07:00:05 -08:00
Angelos Katharopoulos	a611b0bc82	Removes the `retain_graph` flag (#385 ) * Adds global tracing flag * Removes retain_graph in favor of is_tracer	2024-01-07 15:16:51 -08:00
Diogo	449b43762e	Add inner / outer op (#348 ) * inner / outer impl * python tests * ops list and ack * updated descriptions * use test helper * removed dtype check and flatten outer to 1-D * updated docs * just use the reshape to flatten	2024-01-07 09:01:09 -08:00
Awni Hannun	c6d2878c1a	safely divide for 0 size inputs (#388 )	2024-01-07 00:19:54 -08:00
Awni Hannun	b34bf5d52b	fix saving for non-contiguous arrays (#389 )	2024-01-06 12:44:02 -08:00
Angelos Katharopoulos	608bd43604	Move the matmul type check in the op (#384 )	2024-01-05 19:10:13 -08:00
mutexuan	d8f41a5c0f	support python mlx.array creation from list of mlx.array's (#325 ) * support python mlx.array creation from list of mlx.array's * include bfloat16 in UT * refactor so that sub array made of all python primitive types gets initialized by fill_vector * address PR comment: arr.shape().size() -> arr.ndim() * address PR comment: get back Dtype constness and let stack to handle type promotions automatically	2024-01-04 18:53:33 -08:00
Awni Hannun	b9e415d19c	bump pre commit and fix format (#373 )	2024-01-04 16:28:52 -08:00
davidkoski	c82a8cc526	move all ObjC (via metal-cpp) interaction until post static initializers (#370 ) * move all ObjC (via metal-cpp) interaction until post static initializers - metal-cpp relies on static initializers to cache class and selector pointers - code in mlx was using metal-cpp to set up NSAutoreleasePools during its own static init time - but this code was silently failing as the class and selector pointers from metal-cpp were still nil - defer the creation of NSAutoreleasePools until after static init time - ensure that we have coverage where autorelease pools are needed * Update device.cpp remove commented code * Update device.cpp remove commented out code * Update scheduler.h update comment * per discussion use the pool inside the task() -- this will be metal only, not needed for cpu * Update allocator.cpp move pool to release/alloc area	2024-01-04 16:12:00 -08:00
Angelos Katharopoulos	e7f5059fe4	Support for quantized matmul with w and w^T (#349 ) * Add the metal qvm implementation * Add qmm_n * Add gradient wrt to input for quantized_matmul	2024-01-03 14:22:36 -08:00
Diogo	0782a4573a	Add Tensordot op (#344 )	2024-01-02 17:15:00 -08:00
Awni Hannun	99c80a2c8b	Memory allocation (#292 ) * try alternative gc * try no cache * add forced swap * remove cache for now * add cache back * change fit crtieria * remove unused function * nit in comment * tune / fix allocation * increase block limit to original	2024-01-02 11:59:19 -08:00
Josh Soref	44c1ce5e6a	Spelling (#342 ) * spelling: accumulates Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: across Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: additional Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: against Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: among Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: array Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: at least Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: available Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: axes Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: basically Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: bfloat Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: bounds Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: broadcast Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: buffer Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: class Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: coefficients Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: collision Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: combinations Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: committing Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: computation Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: consider Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: constructing Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: conversions Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: correctly Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: corresponding Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: declaration Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: default Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: dependency Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: destination Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: destructor Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: dimensions Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: divided Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: element-wise Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: elements Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: endianness Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: equivalent Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: explicitly Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: github Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: indices Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: irregularly Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: memory Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: metallib Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: negative Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: notable Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: optional Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: otherwise Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: overridden Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: partially Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: partition Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: perform Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: perturbations Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: positively Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: primitive Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: repeat Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: repeats Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: respect Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: respectively Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: result Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: rounding Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: separate Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: skipping Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: structure Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: the Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: transpose Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: unnecessary Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: unneeded Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: unsupported Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> --------- Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com>	2024-01-01 21:08:17 -08:00
Angelos Katharopoulos	a020a2d49d	Improve repeat using broadcasting and reshape (#318 )	2023-12-29 21:40:20 -08:00
Bahaa	ff2b58e299	Add support for repeat (#278 ) * add repeat function * fix styling * optimizing repeat * fixed minor issues * not sure why that folder is there xD * fixed now for sure * test repeat not repeat test * Fixed --------- Co-authored-by: Bahaa Eddin tabbakha <bahaa@Bahaas-MacBook-Pro.local>	2023-12-27 13:11:38 -08:00
Diogo	1f6ab6a556	Safetensor support (#215 ) Co-authored-by: Awni Hannun <awni@apple.com>	2023-12-27 02:06:55 -08:00
Gabrijel Boduljak	6b0d30bb85	linalg.norm (#187 ) * implemented vector_norm in cpp added linalg to mlx * implemented vector_norm python binding * renamed vector_norm to norm, implemented norm without provided ord * completed the implementation of the norm * added tests * removed unused import in linalg.cpp * updated python bindings * added some tests for python bindings * handling inf, -inf as numpy does, more extensive tests of compatibility with numpy * added better docs and examples * refactored mlx.linalg.norm bindings * reused existing util for implementation of linalg.norm * more tests * fixed a bug with no ord and axis provided * removed unused imports * some style and API consistency updates to linalg norm * remove unused includes * fix python tests * fixed a bug with frobenius norm of a complex-valued matrix * complex for vector too --------- Co-authored-by: Awni Hannun <awni@apple.com>	2023-12-26 19:42:04 -08:00
Angelos Katharopoulos	9e6b8c9f48	Refactor the reduction kernels (#277 )	2023-12-24 14:47:57 -08:00

389 Commits