zhangyiss/mlx - mlx - Gitea for Geophysics

mirror of https://github.com/ml-explore/mlx.git synced 2025-08-13 04:36:46 +08:00

Author	SHA1	Message	Date
Awni Hannun	5f7d19d1f5	MPI ops in GPU stream for faster comms (#1356 )	2024-08-26 15:12:50 -07:00
Alex Barron	d1183821a7	int() and float() for mx.array (#1360 )	2024-08-25 20:41:44 -07:00
Angelos Katharopoulos	8081df79be	Fix boolean all reduce bug (#1355 )	2024-08-24 10:09:32 -07:00
Alex Barron	b96e105244	Add `grid_sample` example to `metal_kernel` docs (#1352 ) * Add `zero_outputs` and `atomic_outputs` options to `metal_kernel` * add grid sample to docs * zero_outputs -> init_value * add missing header for linux	2024-08-23 18:24:16 -07:00
Awni Hannun	3b4d5484c7	Bump extension MLX version (#1350 ) * Bump extension MLX version * fix some docs nits	2024-08-23 12:38:34 -07:00
Angelos Katharopoulos	b57a52813b	Further reduction tuning (#1349 ) * More reduction tuning * Forgotten pdb * Small column long row specialization	2024-08-23 10:35:25 -07:00
Alex Barron	da8deb2b62	fix bug with multiple attributes (#1348 ) Co-authored-by: Alex Barron <abarron22@apple.com>	2024-08-23 10:06:15 -07:00
Awni Hannun	98b6ce3460	Refactor reductions and fix scatter atomics for large sizes (#1300 ) Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>	2024-08-22 16:03:31 -07:00
Alex Barron	0fd2a1f4b0	Custom Metal Kernels from Python (#1325 ) * start * simple kernels working * restructure * inverse example working * docs + fixes * missing file * fix imports * address comments * add docs + fix test * Review comments + refactor to a single function * update docs * remove hashing * fix contig bug in test * back to a class * trailing whitespace * fix tests * match c++ and python apis * add link + make args kw_only	2024-08-22 13:46:29 -07:00
Awni Hannun	d40e76809f	Fix rope (#1340 ) * add test * fix rope * fix test	2024-08-20 17:37:52 -07:00
Awni Hannun	bb1b76d9dc	RoPE with frequencies as optional input (#1337 ) * start rope with freq input * rope with frequencies * nits * fix bug * fix bug + test * cleanup * optional base	2024-08-19 18:30:50 -07:00
Awni Hannun	ae5b5cabfd	Fix optimizer reloading from checkpoint (#1329 ) * fix optimizer reloading from checkpoint * comment	2024-08-15 07:33:23 -07:00
Alex Barron	99bb7d3a58	GPU mx.sign for complex64 (#1326 )	2024-08-14 07:54:53 -07:00
Awni Hannun	63ae767232	fix transformer (#1327 )	2024-08-13 16:04:26 -07:00
Awni Hannun	eaaea02010	Add `isfinite` (#1318 ) * isfinite * remove reduce test since fix is not complete	2024-08-13 14:49:28 -07:00
Bhargav Yagnik	a098bc92e0	Fix: Preserve input dtype in Dropout layer output (#1323 ) * Fix: Preserve input dtype in Dropout layer output - Modified Dropout implementation to ensure that the output dtype matches the input dtype. - This resolves the issue #1321 * Update test cases in test_nn.py - Revised test cases to align with updated dropout code - Fixed assertion method: replaced self.assertTrue with self.assertEqual for accurate comparisons in test_nn.py -> test_rope, test_alibi and test_dropout, * updated dropout.py	2024-08-13 11:54:21 -07:00
Brian Keene	19fb69e2ed	Add memory_efficient_threshold kwarg to sdpa kernel (#1319 ) Allows opt-in to memory efficient GPU shader at proscribed sequence length. Otherwise, utilizes aggregate MLX primitives for best latency.	2024-08-12 12:57:09 -07:00
Awni Hannun	9231617eb3	Move to nanobind v2 (#1316 )	2024-08-08 17:17:46 -07:00
Alex Barron	32668a7317	CPU mx.linalg.cholesky_inverse and mx.linalg.tri_inv (#1307 ) * add cholesky inv + tri inv * always run tri_inv on cpu * consistent naming	2024-08-08 15:18:02 -07:00
Angelos Katharopoulos	780c197f95	Fix test tolerance and patch bump (#1315 )	2024-08-08 14:51:09 -07:00
Alex Barron	635ccd9e25	Add "edge" mode to mx.pad (#1309 ) * Add edge padding mode * fix pad in pooling * string arg instead of enum	2024-08-06 11:23:10 -07:00
nicolov	8c9f0278b9	Add vmap to scatter (#1200 ) * Add vmap to scatter * updates * vmap updates + a few more tests * bug fix --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-08-05 20:12:27 -07:00
Awni Hannun	58d0e199e1	add bfloat conv for windograd (#1306 ) * add bfloat conv for windograd * accumulate in fp32 * accumulate in fp32 * accumulate in bf16	2024-08-05 15:51:13 -07:00
Awni Hannun	10b5835501	fix creating array from bf16 tensors in jax / torch (#1305 )	2024-08-01 16:20:51 -07:00
Awni Hannun	6c8dd307eb	faster group norm (#1304 )	2024-08-01 12:49:23 -07:00
Awni Hannun	40b6d67333	Fixes for large arrays with a few ops (#1299 ) * fixes for large arrays with a few ops * fix bug * fix all of copy	2024-07-30 17:18:39 -07:00
Alex Barron	c52d1600f0	Fused Affine Quantize/Dequantize ops (#1282 ) * Add fast affine dequantize * add full quantize kernel * fused kernel with scale/bias computation * fix docstring * fix no jit error * fix test * test fix * reduce fast api to only affine_quantize	2024-07-29 15:11:38 -07:00
Awni Hannun	aa1d6cadad	Fix docs latex build and nits (#1297 ) * fix docs latex build and nits * fix stub gen and try to clean up building	2024-07-29 11:44:06 -07:00
Atakan Tekparmak	6e06e3a904	feat: Added "tanh" option to GELU approximation (#1268 )	2024-07-28 09:07:56 +02:00
Awni Hannun	7b456fd2c0	Array api (#1289 ) * some updates for numpy 2.0 and array api * some updates for numpy 2.0 and array api * fix array api doc	2024-07-26 10:40:49 -07:00
Anton Belov	5029894662	[Issue #1187 ] Add nan_to_num function initial attempt (#1247 ) * initial attempt, working with wrong types * not compiling; mx.float16 and mx.bfloat16 tests added * fix nan to num * nit --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-07-25 09:57:37 -07:00
Awni Hannun	baf9fa5f42	Einsum (#1269 ) * einsum initial * fix comma break * sum axis was wrong * small cleanups * python binding * changed bindings to resemble numpy * remove todo comment * comment changes * add count of operands/inputs * fail fast if operands list is empty * ignore comma if no output * einsum path matching numpy * getting somewhere with path * remove print * it passes the first test * moved einsum tests to seperate file * seperated einsum path * moved einsum naive * remove space from equation * fast fail if no operands passed * update tests and remove printf * small cleanup * some more cleanups * removed python helper file * ack * utilize std for finding min in vector * duplicate def * remove the tuple as it was unreadable * moved einsum_naive back to ops * remaining isn't needed * avoid creating another set * cleanup * greedy path, start of naive einsum * more einsum * fix some bugs * some more fixes, tests pass * benchmark * some simplify * fix einsum and test Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com> * add a bunch more tests and fix a bunch more bugs * some docs nits --------- Co-authored-by: dc-dc-dc <dgcruz983@gmail.com> Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>	2024-07-25 09:36:44 -07:00
Jagrit Digani	7f914365fd	Fix GPU sort for large arrays (#1285 ) * Fix GPU sort for large arrays	2024-07-24 14:37:10 -07:00
Paul Paczuski	ebd7135b50	Improve stability of BCE loss calculation for input probabilities close to or exactly 0 or 1 (#1280 ) * Improve stability of BCE loss calculation * Standardize comment * Apply formatting with black via pre-commit * Add usage recommendation to docstring * Update python/mlx/nn/losses.py --------- Co-authored-by: Awni Hannun <awni.hannun@gmail.com>	2024-07-24 08:38:22 -07:00
fgranqvist	50eff6a10a	Implement sampling from laplace distribution. (#1279 )	2024-07-24 15:15:37 +02:00
Alex Barron	c34a5ae7f7	Fix bfloat16 Hadamard (#1283 ) * fix bfloat16 hadamard * add scale * review comments --------- Co-authored-by: Alex Barron <abarron22@apple.com>	2024-07-23 14:54:43 -07:00
Awni Hannun	e2aa6ec8ae	some fixes (#1281 )	2024-07-23 11:49:05 -07:00
toji	6768c6a54a	Adding missing type hints (#1243 ) * added type hints for `run`, `tree_map` and `tree_map_with_path` * fix lint --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-07-23 07:29:38 -07:00
Tim Gymnich	6307d166eb	Fix overflow / underflow handling for expm1f (#1278 ) * Fix overflow / underflow handling for expm1f * update tests	2024-07-23 07:29:06 -07:00
Awni Hannun	1fba87b0df	Fix leak with multi-output primitives (#1274 ) * fix leak with multi-output primitives * hopefully an actual fix	2024-07-23 06:34:18 -07:00
Awni Hannun	8c01a7893b	minor fix in optimizer + docs (#1264 )	2024-07-12 12:18:02 -07:00
Awni Hannun	218047c75a	docs fixes (#1263 )	2024-07-11 15:59:07 -07:00
Angelos Katharopoulos	5c1fa64fb0	Custom transforms (#1246 )	2024-07-10 18:00:01 -07:00
Alex Barron	a3c287354f	Fast Hadamard Transform (#1249 ) * Working hadamard for powers of 2 * working for m2^k add scale and check contiguity * add size check * clean up * fix test * add grads + vmap * gpu only * skip on linux * test typo * add cpu impl * remove gpu only tests * fix linux build + add is_equivalent	2024-07-09 20:39:01 -07:00
Alex Barron	bdb36c9a63	add zero vjps for bitwise ops and gather w.r.t. index (#1256 )	2024-07-07 21:34:59 -07:00
Awni Hannun	20bb301195	CPU binary reduction + Nits (#1242 ) * very minor nits * reduce binary * fix test	2024-06-28 13:50:42 -07:00
Angelos Katharopoulos	b05bcfd27f	Fixes segfault when compiling checkpointed functions (#1235 )	2024-06-26 16:14:45 -07:00
Alex Barron	2615660e62	Fix strided sort bug (#1236 ) * Use output strides in sort kernel * fix zero strides bug	2024-06-26 14:32:11 -07:00
Awni Hannun	5b0af4cdb1	fix donation condition for compilation (#1237 )	2024-06-26 09:04:05 -07:00
David Koski	4eef1e8a3e	fix typo (#1215 )	2024-06-24 13:36:35 -07:00
Alex Barron	95d11bda06	Fix NumPy 2.0 pickle test (#1221 ) * fix numpy version <2 temporarily * typo * better fix * Fix just for bfloat16 --------- Co-authored-by: Alex Barron <abarron22@apple.com>	2024-06-23 05:47:22 -07:00
Jagrit Digani	2d6cd47713	Masked gemv (#1211 )	2024-06-14 09:52:26 -07:00
Awni Hannun	df964132fb	fix scatter + test (#1202 ) * fix scatter + test * fix test warnings * fix metal validation	2024-06-11 14:35:12 -07:00
Alex Barron	27d70c7d9d	Feature complete Metal FFT (#1102 ) * feature complete metal fft * fix contiguity bug * jit fft * simplify rader/bluestein constant computation * remove kernel/utils.h dep * remove bf16.h dep * format --------- Co-authored-by: Alex Barron <abarron22@apple.com>	2024-06-06 12:57:25 -07:00
Angelos Katharopoulos	0163a8e57a	Add docs for the distributed namespace (#1184 )	2024-06-06 11:37:00 -07:00
Awni Hannun	496315fe1d	Fix scan (#1188 ) * fix scan * improve grid size * fix cpu cummax	2024-06-05 14:21:58 -07:00
Angelos Katharopoulos	0fe6895893	Fix the hard-shrink test (#1185 )	2024-06-04 16:22:56 -07:00
Nikhil Mehta	0b7d71fd2f	Add softmin, hardshrink, hardtanh (#1180 ) --------- Co-authored-by: Nikhil Mehta <nikmehta@tesla.com>	2024-06-04 15:48:18 -07:00
Awni Hannun	83b11bc58d	Fix Metal API validation for empty concat (#1183 )	2024-06-04 13:17:08 -07:00
Awni Hannun	ea9090bbc4	Add view op (#1179 ) * add view primitive * nit * fix view	2024-06-04 08:05:27 -07:00
Angelos Katharopoulos	3de8ce3f3c	In place all-reduce and forgiving init (#1178 )	2024-06-03 16:47:47 -07:00
Brian Keene	1865299a30	Metal shaders for memory efficient self attention on large sequences (#964 ) * Metal shaders for efficient self attention on large sequences Updated fast attention: GEMM-ified with Steel primitives Uses flash attention 1 for scale correction * more compiler silencing * Address rebase issues * Templatize kernel instantiation, revise cpu bindings * Safer writes to output * Permit batch size > 1 * Numerical fixes for sdpa self attention * Re-enable test, remove unused variable * add benchmarking script * Disable sdpa prior to perf tuning, and simplify tests for per-patch CI	2024-06-03 09:16:19 -07:00
Dominik Schlösser	3576b547c5	Doc error for default for scale in SinusoidalPositionalEncoding (#1174 )	2024-06-02 13:42:45 -07:00
K Venkat Ramnan	ab977109db	feat: Added dlpack device (#1165 ) * feat: Added dlpack device * feat: Added device_id to dlpack device * feat: Added device_id to dlpack device * doc: updated conversion docs * doc: updated numpy.rst dlpack information * doc: updated numpy.rst dlpack information * Update docs/src/usage/numpy.rst * Update docs/src/usage/numpy.rst --------- Co-authored-by: Venkat Ramnan Kalyanakumar <venkatramnankalyanakumar@Venkats-MacBook-Air.local> Co-authored-by: Awni Hannun <awni.hannun@gmail.com>	2024-05-31 12:29:01 -07:00
Awni Hannun	fd1c08137b	stable cumprod grad at 0 (#1167 )	2024-05-31 12:28:42 -07:00
Jagrit Digani	76b6cece46	Fix multi-block sort stride management (#1169 ) * Fix multi-block sort stride management * Add seed to tests	2024-05-31 11:10:54 -07:00
Jagrit Digani	9f0df51f8d	Fix matvec vector stride bug (#1168 )	2024-05-29 12:18:28 -07:00
Awni Hannun	e7a2a3dcd1	Fix a couple bugs (#1161 ) * fix jit reduce for RMS norm * make strides a single buffer * better eval error message * fix compiling with inf and bf16 * fix cpu compile with bf16	2024-05-28 15:18:18 -07:00
Awni Hannun	a87ef5bfc1	fix broadcast bug in bitwise ops (#1157 )	2024-05-24 11:44:40 -07:00
Awni Hannun	7e26fd8032	Option to JIT steel gemm / conv (#1139 )	2024-05-23 18:07:34 -07:00
Jagrit Digani	eab2685c67	Float mask update (#1152 ) * Float mask update * Update CPU impl	2024-05-23 17:20:44 -07:00
Angelos Katharopoulos	50dfb664db	Comms (#1097 ) * Start the communications branch using MPI * Add ops and primitives * Add python bindings for distributed	2024-05-23 17:04:02 -07:00
Rifur13	9401507336	Add groups to 2-D convolutions (#1129 ) * Added groups to 2-D convolutions. Only implemented for some specializations. Also fixed 1D grouped convs with different kernel strides and added more tests. * fix channels condition	2024-05-22 20:01:44 -07:00
Awni Hannun	eb8321d863	list based indexing (#1150 )	2024-05-22 15:52:05 -07:00
Abe Leininger	79ef49b2c2	add mx.trace (#1143 ) (#1147 ) * working c++ trace implementation * updated throw + added overloads * added python binding for trace function * pre-commit reformatting * add trace to docs * resolve comments * remove to_stream call	2024-05-22 15:50:27 -07:00
Awni Hannun	d568c7ee36	Rename block sparse (#1149 ) * block_sparse_mm to gather_mm * rename * nit * nit	2024-05-22 07:48:34 -07:00
Awni Hannun	e6fecbb3e1	Some fixes in docs (#1141 ) * fixes in docs * nit	2024-05-20 11:51:47 -07:00
jlwitthuhn	7e5674d8be	Treate 'minimum' differently in cosine decay (#1138 )	2024-05-20 08:00:48 -07:00
Awni Hannun	fb71a82ada	Fix copy bug with many dims (#1137 )	2024-05-17 21:10:03 -07:00
Luca Arnaboldi	b3ec792380	Implemented Cholesky on CPU (#1119 )	2024-05-17 12:31:59 -07:00
Awni Hannun	81dd33af66	allow conversion to dlpack (#1120 )	2024-05-16 16:11:37 -07:00
Angelos Katharopoulos	e78a6518fa	Block sparse qmm (#1124 )	2024-05-16 15:24:14 -07:00
Jacket	c417e42116	[Fix] minor typo in default argument for argpartition's "axis" parameter (#1125 ) According to the document, argpartition's axis parameter can be None, but due to a previous typo it can't really accepts a None value.	2024-05-15 15:25:25 -07:00
Awni Hannun	631dfbe673	fix scatter index bug (#1122 )	2024-05-14 15:04:58 -07:00
Cheng	56a4eaed72	Pass missing stream arg in array.flatten (#1111 )	2024-05-14 06:50:16 -07:00
Cheng	bf925d9dc7	Move args in conv_general (#1118 ) Also fix a typo that padding_lo is passed as padding_hi.	2024-05-14 06:50:09 -07:00
Cheng	1a7ed5dcb6	Fill vector with constructor instead of fill_n (#1113 )	2024-05-14 06:28:55 -07:00
Cheng	5be5daa6ef	Use compiled function in Sigmoid module (#1116 )	2024-05-14 06:25:57 -07:00
Cheng	60cb11764e	Use correct module type in quantized.py (#1115 )	2024-05-14 06:25:42 -07:00
Cheng	cbd5445ea7	The tile op does not accept None as reps (#1117 )	2024-05-14 06:25:25 -07:00
Max-Heinrich Laves	ff4223904d	Conv3d (#993 ) * added conv3d added conv3d implemented explicit_gemm_conv_ND_cpu and bounds checks for slow_conv_3D * incorporated reviewer comments * fixed test * reduced tensor shapes in test for conv3d * Reviewer suggestion Co-authored-by: Awni Hannun <awni.hannun@gmail.com> Reviewer suggestion Co-authored-by: Awni Hannun <awni.hannun@gmail.com> Reviewer suggestion Co-authored-by: Awni Hannun <awni.hannun@gmail.com> Reviewer suggestion	2024-05-11 06:15:02 -07:00
Alex Barron	2e158cf6d0	Add conjugate operator (#1100 ) * cpu and gpu impl * add mx.conj and array.conj() --------- Co-authored-by: Alex Barron <abarron22@apple.com>	2024-05-10 07:22:20 -07:00
Awni Hannun	b21242faf1	Allow unary ops to accept array like (#1093 )	2024-05-09 09:36:02 -07:00
Rahul Yedida	cc05a281c4	Added ArcTan2 operation (#1079 ) * Added ArcTan2 operation * Cleanup, bug fixes from code review * Minor cleanup, fixed Linux tests	2024-05-08 08:35:15 -07:00
Awni Hannun	9814a2ae12	fix conversion to array (#1070 )	2024-05-06 16:02:49 -07:00
Shubham	6992498e7a	add keyword positonal (#1081 )	2024-05-06 07:18:49 -07:00
Awni Hannun	21623156a3	Reset peak memory (#1074 ) * reset peak memory * fix linux * nits in docs	2024-05-03 17:12:51 -07:00
Nripesh Niketan	79c859e2e0	feat: implement `clip_grad_norm` (#1043 ) * feat: implement `clip_grad_norm` * pre-commit * Add test for clip_grad_norm function in test_optimizers.py * small fixes * fix * lint * Update tree_reduce * Update python/mlx/utils.py Co-authored-by: Awni Hannun <awni.hannun@gmail.com> * Update python/mlx/utils.py Co-authored-by: Awni Hannun <awni.hannun@gmail.com> * Update python/mlx/utils.py Co-authored-by: Awni Hannun <awni.hannun@gmail.com> * Update python/mlx/utils.py Co-authored-by: Awni Hannun <awni.hannun@gmail.com> * Update python/mlx/utils.py Co-authored-by: Awni Hannun <awni.hannun@gmail.com> * Update python/mlx/utils.py Co-authored-by: Awni Hannun <awni.hannun@gmail.com> * Refactor clip_grad_norm function to include documentation and improve readability * format docstring * Add acknowlegements * text wrap * pre-commit * nits in docs --------- Co-authored-by: Awni Hannun <awni.hannun@gmail.com> Co-authored-by: Awni Hannun <awni@apple.com>	2024-05-03 09:07:02 -07:00
Awni Hannun	b00ac960b4	change initial memory limits and add memory size to device info (#1064 )	2024-05-03 06:50:15 -07:00
Jagrit Digani	f390957685	Block sparse mm (#1058 )	2024-05-02 14:03:58 -07:00
Angelos Katharopoulos	17f57df797	Improvements in the quantizer and dequantization kernel (#1061 )	2024-05-01 18:19:11 -07:00
Awni Hannun	7f7b9662ea	Fix leak for multi-output primitives which are never detached (#1059 ) * fix multi output leak * ignore arrays that will be detached * add some comments * stray print	2024-05-01 07:31:45 -07:00
Awni Hannun	19bef39f5c	Add a `mx.metal.device_info` (#1060 ) * device inof * add variant * fix linux * fix doc	2024-04-30 15:47:27 -07:00
Angelos Katharopoulos	8db7161c94	Bug fix in quantize (#1054 )	2024-04-29 20:55:04 -07:00
Awni Hannun	09f1777896	fix slice update indexing (#1053 )	2024-04-29 12:17:40 -07:00
Jacket	490c0c4fdc	[Fix] expand axes for dimension with integer indices in mlx_slice_update (#1035 ) * Not sure if this is correct * Format * Edit tests * Add negative test * Format * add one more test --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-04-29 07:57:28 -07:00
Rifur13	c4a471c99d	Add groups to Conv1d (#948 ) * Add conv1d grouped convs on CPU * Add GPU support * Parallelize inside metal kernel * clenaup * Update mlx/ops.cpp Co-authored-by: Awni Hannun <awni.hannun@gmail.com> * New unfold kernel + remove unused code * Remove copy and refactor * Update vjp and reuse steel gemm * Fixed groups on cpu * Fix metal validation --------- Co-authored-by: Awni Hannun <awni.hannun@gmail.com>	2024-04-27 06:24:57 -07:00
Awni Hannun	86f495985b	Add bitwise ops (#1037 ) * bitwise ops * fix tests	2024-04-26 22:03:42 -07:00
Awni Hannun	5bfe89bdb1	Cpp docs (#1036 ) * start of C++ docs * fix stream doc * only include ops for now	2024-04-26 12:56:05 -07:00
Awni Hannun	771575d27b	Expose function to clear memory cache (#1032 ) * expose function to clear memory cache * fix linux build * fix metal tests	2024-04-24 16:48:51 -07:00
Angelos Katharopoulos	ec8578d41a	Fix quantization of all 0s (#1028 )	2024-04-24 00:40:42 -07:00
Aneesh Shetty	d0dbfe0b97	Adds radians and degrees (#1011 )	2024-04-22 11:17:49 -07:00
Awni Hannun	3d405fb3b1	Add synchronize function (#1006 ) * add synchronize function * fix linux * fix linux * fix and fix docs * fix test * try synchronize in stream destroy * synchronize works for both cpu and gpu	2024-04-22 08:25:46 -07:00
Angelos Katharopoulos	84d61d27aa	Make sure 0 is represented in the quantization (#1016 )	2024-04-19 19:47:26 -07:00
Angelos Katharopoulos	ef5f7d1aea	Fix buffer protocol buffer size designation (#1010 )	2024-04-19 06:06:13 -07:00
Jagrit Digani	85c8a91a27	Fix mask broadcasting bug and add relevant test (#1003 )	2024-04-17 17:33:48 -07:00
Piotr Rybiec	581b699ac9	avgpool, not maxpool (#1002 )	2024-04-17 08:26:22 -07:00
Awni Hannun	8a0677d56d	Shared events for synchronization + async eval (#998 ) * more async eval * fix rebase * try correct async eval * fix async * more tests for async eval * use shared events for synchronization * comment + cleanup * with autorelease pool * fix no metal build * fix compile * fix patch * don't eval if asyn evale'd * don't use is_evaled * comments * more multi stream tests * try and cleanup use of is_evaled * use a status flag	2024-04-17 06:16:02 -07:00
Jagrit Digani	b18468bf81	Masked mm (#978 ) * Add block masked matmul op and primitive	2024-04-16 14:45:39 -07:00
Shiyu	107ba2891a	gelu tanh approx (#989 ) * gelu tanh approx * gelu tanh approx * replace gelu approx with tanh approach * fix comments * fix comment	2024-04-15 19:49:00 -07:00
Awni Hannun	cd9e184529	Quantize embedding (#994 ) * quantize embedding * rename as_linear + comment * consistency in docs * fix test	2024-04-15 16:42:10 -07:00
Alex Barron	2e7c02d5cd	Metal FFT for powers of 2 up to 2048 (#915 ) * add Metal FFT for powers of 2 * skip GPU test on linux * fix contiguity bug * address comments * Update mlx/backend/metal/fft.cpp * Update mlx/backend/metal/fft.cpp * fix bug in synch --------- Co-authored-by: Alex Barron <abarron22@apple.com> Co-authored-by: Awni Hannun <awni.hannun@gmail.com> Co-authored-by: Awni Hannun <awni@apple.com>	2024-04-11 21:40:06 -07:00
Awni Hannun	ae18326533	No copy command encoder (#986 ) * no copy command encoder * up layer norm test tolerances	2024-04-11 21:15:36 -07:00
Awni Hannun	12d4507ee3	Explicit barriers with concurrent dispatch (#977 )	2024-04-10 21:45:31 -07:00
Shiyu	061cf9a4ce	Upsample with bicubic interpolation (#967 )	2024-04-10 15:47:22 -07:00
Awni Hannun	99abb9eff4	Async eval (#972 )	2024-04-09 18:34:00 -07:00
Luca Arnaboldi	fffe072028	Implementation of mlx.random.multivariate_normal (#502 ) (#877 ) * Implementation of mlx.random.multivariate_normal (#502) * Update python/src/random.cpp Co-authored-by: Awni Hannun <awni.hannun@gmail.com> * Update python/src/random.cpp Co-authored-by: Awni Hannun <awni.hannun@gmail.com> * Update python/src/random.cpp Co-authored-by: Awni Hannun <awni.hannun@gmail.com> * Updated typo in docstring * Restricted multivariate_normal to float32 * Generic mean and variance shapes * Review edits * Update mlx/random.cpp Co-authored-by: Awni Hannun <awni.hannun@gmail.com> * Update python/src/random.cpp Co-authored-by: Awni Hannun <awni.hannun@gmail.com> * Update python/src/random.cpp Co-authored-by: Awni Hannun <awni.hannun@gmail.com> * Update python/src/random.cpp Co-authored-by: Awni Hannun <awni.hannun@gmail.com> * Test for ndim of mean and cov * nits * smaller size for test * fix broadcasted sampling --------- Co-authored-by: Awni Hannun <awni.hannun@gmail.com> Co-authored-by: Awni Hannun <awni@apple.com>	2024-04-09 13:50:12 -07:00
Abe Leininger	a1a31eed27	Add mx.meshgrid (#961 )	2024-04-09 11:43:08 -07:00
Awni Hannun	42afe27e12	std and expm1 (#973 ) * std and expm1 * actually add expm1 * fix linux * fix vjp * relax tol for linux test * Add it to the compilable primitives --------- Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>	2024-04-08 14:26:01 -07:00
Awni Hannun	76e63212ff	Enable bfloat scan (#974 ) * enable bfloat scan * fix tests	2024-04-08 12:29:19 -07:00
Awni Hannun	aac2f9fb61	Improve profiling with gpu tracing (#969 ) * improve profiling with gpu tracing * fix for linux * nit * doc fix * fix example	2024-04-07 21:47:43 -07:00
Awni Hannun	039da779d1	No quant reshape (#957 ) * precise option on cpu * remove print * remove reshape in quant matmul * no quant reshape	2024-04-04 11:52:12 -07:00
Awni Hannun	d88d2124b5	segfaut layer norm grad (#955 )	2024-04-04 10:59:15 -07:00
Awni Hannun	e142aaf8a1	Option for precise softmax (#953 ) * precise softmax * Add an equivalency check * Make the threadgroup memory definition fixed * precise cpu softmax * precise option on cpu * remove print --------- Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>	2024-04-04 08:32:35 -07:00
AmirHossein_Razlighi	0caf35f4b8	Better exceptions in case of invalid operations on `mlx.core.array` (#910 ) (#926 ) * Nicer exceptions for ops on non-arrays	2024-04-02 21:11:24 -07:00
Angelos Katharopoulos	3fc993f82d	Properly handle negative axes in python vmap (#944 )	2024-04-02 18:07:23 -07:00
Awni Hannun	741eb28443	fix a couple bugs (#952 )	2024-04-02 12:07:41 -07:00
Angelos Katharopoulos	1a87dc5ea8	Fix compile fusion for multi-output edge cases (#950 ) * Fix compile fusion for multi-output edge cases * Add a test for multi-output compile	2024-04-02 08:42:31 -07:00
Awni Hannun	2427fa171e	Fix cpu compile (#934 ) * fix one cpu bug, test for another * format hooks * simplify contiguity check for cpu compile * fix * add back donation * comment	2024-04-01 17:37:12 -07:00
Jagrit Digani	639e06e1f3	Indexing bug fix (#947 ) * Fix axes accounting * Add tests	2024-04-01 12:18:50 -07:00
Angelos Katharopoulos	02fedbf1da	Fix array initialization from list (#942 ) * Fix array initialization from list * Change the error message in the test	2024-04-01 06:27:52 -07:00
Angelos Katharopoulos	110d9b149d	Layer norm grad fix donation bug (#941 ) * add layer norm grad test * Fix donation bug in layernorm vjp --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-04-01 06:15:50 -07:00
AmirHossein_Razlighi	f48bc496c7	Comparing python objects (such as list/tuple) with `mlx.core.array` (#920 ) * add implicit conversion of list to array for equality constraint * add tests for array equality * add test for tuple and array equality * return False if __eq__ arg is list or tuple * write tests for equality * update the rule of comparison for __ge__/__gt__/__lt__/__le__ * add a helper function for detecting mlx.core.array * return true in case fo inequality * debug minor issue regarding detecting mlx array * add tests for inequality comparisons * add name for contribution * reformat files using pre-commit * update tests for float * update tests for inequality * raise exception in case of invalid comparisons * use isinstance instead of string comparison * replace "is_convirtable_to_array" with previous logic * remove throwing exceptions for other operations * just a comment * minor changes for efficiency * optimize a utils function * change the function name * Update ACKNOWLEDGMENTS.md --------- Co-authored-by: Awni Hannun <awni.hannun@gmail.com>	2024-03-29 06:52:30 -07:00
Angelos Katharopoulos	5f9ba3019f	Fix qmm_t for unaligned cases (#923 )	2024-03-28 15:34:57 -07:00
Cheng	46caf0bef0	Remove unnecessary string copies (#891 ) 1. Use string_view instead of string when there is no need for copy. 2. Otherwise move string when possible.	2024-03-28 13:14:59 -07:00
Cheng	a7b404ff53	Use uintptr_t instead of size_t to store funtion id (#916 ) Also does some small cleanup of the compile cache code.	2024-03-28 06:37:59 -07:00
AmirHossein_Razlighi	d611251502	Support Chaining for some of functionalities of `nn.Module` (#885 ) (#897 ) * add chaining support for some of the functionalities of "nn.Module" * reformat * change the return types * remove return types * add return type with forward referencing * add tests for chaining * add name to contributors * Update python/mlx/nn/layers/base.py Co-authored-by: Awni Hannun <awni.hannun@gmail.com> * Update python/mlx/nn/layers/base.py Co-authored-by: Awni Hannun <awni.hannun@gmail.com> * update docstring * update docstrings --------- Co-authored-by: Awni Hannun <awni.hannun@gmail.com>	2024-03-27 19:58:29 -07:00
Cheng	f30b659291	Make MLX build on x64 macOS (#901 ) The arm64 macbook pros are heavy and I usually care my intel one for mobile, it would be nice if I can play with MLX on it. To build with x64, user must pass `MLX_ENABLE_X64_MAC` to cmake: CMAKE_ARGS='-DMLX_ENABLE_X64_MAC=ON' python setup.py	2024-03-27 06:14:29 -07:00
Angelos Katharopoulos	29221fa238	Implement vjps for some primitives in the fast namespace (#883 ) * Implement rope vjp in terms of rope * RMSNormVJP primitive and kernel * Add LayerNormVJP primitive and kernel	2024-03-26 16:35:34 -07:00
Jagrit Digani	925014b661	Fix multiblock sort limits (#906 ) * Fix multiblock sort limits * Fix metal validation error	2024-03-26 14:00:00 -07:00
Abdussamet Türker	5611e1a95e	Fix unsqueeze with None (#899 ) * Fix unsqueeze with None * Clean unnecessary files	2024-03-26 13:59:44 -07:00
Awni Hannun	570f2bf29e	pick up preivously set attributes (#905 )	2024-03-26 11:19:59 -07:00
Luca Arnaboldi	a3ee03da01	Fixing random.normal for half-precision dtype #642 (#904 ) * Fixing random.normal for half-precision dtype #642 * Update python/tests/test_random.py Co-authored-by: Awni Hannun <awni.hannun@gmail.com> --------- Co-authored-by: Awni Hannun <awni.hannun@gmail.com>	2024-03-26 09:58:27 -07:00
Jack Mousseau	8e686764ac	Ensure shape dimensions are within supported integer range (#566 ) (#704 ) * Ensure shape dimensions are within supported integer range (#566) * fix build * fix rebase bug --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-03-25 13:29:45 -07:00
Daniel Strobusch	479051ce1c	add numeric type hierarchy and issubdtype as well as a set_dtype meth… (#427 ) * add numeric type hierarchy and issubdtype as well as a set_dtype method to nn.Module with predicate numeric type hierarchy and issubtype is compatible to the [numpy hierarchy](`220f0ab2c5/numpy/_core/numerictypes.py (L42)`). Closes #285. * nits in docs * unify type category checking * nits in docs * nits in docs * more docs nits * fix callable type --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-03-25 12:32:59 -07:00
Awni Hannun	1e16331d9c	post nanobind docs fixes and some updates (#889 ) * post nanobind docs fixes and some updates * one more doc nit * fix for stubs and latex	2024-03-24 15:03:27 -07:00
Awni Hannun	be98f4ab6b	Reduce a little overhead (#871 ) * some small overhead improvements * use result_type in rms_norm * remove release force * fix + use non-vector version * revert compile change * fix ops * a little more overhead * a little more cleanup and overhead	2024-03-22 17:29:36 -07:00
Jagrit Digani	8e5a5a1ccd	Set item bug fix (#879 ) * set item shaping bug fix * Add extra tests	2024-03-22 12:11:17 -07:00
Angelos Katharopoulos	fcda3a0e66	Increase test tolerance for fast.layer_norm (#880 )	2024-03-22 12:10:27 -07:00
Cheng	9663c22fe9	Do not store iostream in shared_ptr (#872 ) There is no need to store iostream in shared_ptr, doing so adds the cost of a heap allocation.	2024-03-22 06:54:45 -07:00
Awni Hannun	44390bd3d0	Bump (#869 ) * bump * fix none in a few ops	2024-03-21 13:56:56 -07:00
Angelos Katharopoulos	2225374060	Adds mx.fast.layer_norm (#870 )	2024-03-21 13:55:51 -07:00
nicolov	105d236889	Add vmap for SVD and inverse (#849 )	2024-03-21 13:18:27 -07:00
Angelos Katharopoulos	53e6a9367c	Use reshape and transpose for non-overlapping pooling windows (#867 )	2024-03-21 10:21:03 -07:00
Chime Ogbuji	f5a1582fe8	Add minimum for cosine decay function (#859 ) * Add minimum for cosine decay function * Update python/mlx/optimizers/schedulers.py Co-authored-by: Awni Hannun <awni.hannun@gmail.com> --------- Co-authored-by: Awni Hannun <awni.hannun@gmail.com>	2024-03-21 07:33:29 -07:00
Awni Hannun	a54f06b16f	Fast RMS Norm (#862 ) * fast rmsnorm * no rms gpu * kernel * fix shared mem * looped rms and donation in softmax * Make the squaring in float32 to avoid underflow * Fix the default StreamOrDevice for rope and rms_norm in fast * nits --------- Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>	2024-03-21 07:20:54 -07:00
Jagrit Digani	a5681ebc52	Update set item (#861 ) * Update mlx_set_item to handle regular slices without expanding * Refactor ellipsis handling * Route mlx_set_item to slice_update where possible * Update mlx_scatter_args_slice * Don't route to gather if no array indices	2024-03-21 02:48:13 -07:00
Jagrit Digani	b219d12a6b	Check edge case handling in row reduce med kernel (#858 )	2024-03-20 11:37:58 -07:00
Md. Rasel Mandol	db6796ac61	simple typo `fille` (#848 )	2024-03-19 06:15:17 -07:00
Awni Hannun	9a8ee00246	Switch to nanobind (#839 ) * mostly builds * most tests pass * fix circle build * add back buffer protocol * includes * fix for py38 * limit to cpu device * include * fix stubs * move signatures for docs * stubgen + docs fix * doc for compiled function, comments	2024-03-18 20:12:25 -07:00
Awni Hannun	16546c70d8	No reshape rope (#838 ) * no reshape rope * no reshape rope	2024-03-18 17:03:07 -07:00
nicolov	eaba55c9bf	Add matrix inversion primitive (#822 )	2024-03-15 06:34:36 -07:00
Awni Hannun	19ec023256	vmap matmul and admm (#836 )	2024-03-14 14:38:22 -07:00
Angelos Katharopoulos	76c919b4ec	NumberOfElements for shapeless compile and vmap fixes (#802 )	2024-03-13 10:34:14 -07:00
Jagrit Digani	5ad133f8bb	No copy gems (#801 ) * Enable collapsing batch dims in gemm * Update gemm to only make copies when neither of the last 2 axes are contiguous * Update addmm to support gemv shapes * Update addmm to support irregular batch strides * Update tests	2024-03-12 13:13:41 -07:00
nicolov	d0c544a868	Add SVD primitive (#809 ) Add SVD op using Accelerate's LAPACK following https://developer.apple.com/documentation/accelerate/ compressing_an_image_using_linear_algebra Co-authored-by: Nicolo Valigi <nvaligi@apple.com>	2024-03-12 12:30:11 -07:00
Daniel Falbel	ffb19df3c0	Fix docstring for correctly rendering (#820 )	2024-03-12 11:46:44 -07:00
Awni Hannun	366478c560	fix modules with dict (#819 )	2024-03-12 08:54:06 -07:00
Justin Deschenaux	8e5600022a	Implement RNN, GRU, LSTM (#268 ) * RNN base implementation * Address comments+format * nits in docs * add tests for prb * fix test * add a couple tests --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-03-11 21:14:44 -07:00
Awni Hannun	0e95b64942	Fix bug in tape order during simplify (#816 ) * fix bug in tape order during simplify * properly fix compile * last bug	2024-03-11 17:29:05 -07:00
Awni Hannun	7c441600fe	Compile stride bug (#812 ) * fix compile stride bug * revert sdpa fix * fix cpu * fix bug with simplifying outputs	2024-03-11 06:31:31 -07:00
Awni Hannun	28301807c2	Version bump and os error (#807 )	2024-03-07 13:57:58 -08:00
Awni Hannun	b7588fd5d7	fix inplace to not make a shallow copy (#804 )	2024-03-07 09:34:11 -08:00
Luca Arnaboldi	cbefd9129e	Implementation of pickle, copy and deepcopy for Python arrays (#300 & #367 ). (#713 ) * Implemented pickling and copy for Python arrays(#300 & #367) * Fixing typos * Pickle with NumPy arrays * Pickle: workaround for bfloat16 * Revert "Pickle: workaround for bfloat16" This reverts commit `25afe6bc09`. * Added an error when pickling bfloat16 * Update python/tests/test_array.py Co-authored-by: Awni Hannun <awni.hannun@gmail.com> * Update python/tests/test_array.py Co-authored-by: Awni Hannun <awni.hannun@gmail.com> * Update python/src/array.cpp Co-authored-by: Awni Hannun <awni.hannun@gmail.com> * Update python/src/array.cpp Co-authored-by: Awni Hannun <awni.hannun@gmail.com> * clang-format applied --------- Co-authored-by: Awni Hannun <awni.hannun@gmail.com>	2024-03-06 08:02:41 -08:00
Awni Hannun	cbcf44a4ca	Some fixes in cache / thread safety (#777 ) * some fixes in cache / thread safety * speed up no cache case * fix opt test * optimizer docs * otpimizer docs * fix adafactor * fix adafactor	2024-03-05 13:30:50 -08:00
Awni Hannun	859ae15a54	Fix test (#785 )	2024-03-04 23:02:27 -08:00
Brian Keene	0787724c44	Fast Inference SDPA op (#735 ) * Fast Inference SDPA op Implements metal shaders for: o = mx.fast_inference_sdpa(queries, keys, values, scale, mask) Supports fp16, fp32 dtypes; assumes d_k = 128. Generic op support / prompt encoding supported via mlx primitives. Metal implementation is for the inference use case only. Majority of performance benefits appears to results from GQA & reduced bandwidth requirements; there is approximate performance parity for the MHA use case (from some measurements on M3 Max). * Flush shared memory to zero before unprotected reads for (scores @ values) * Move to fast:: namespace, address reviewer comments ... also attempt to revert formatter auto-change for files not relevant to this change * Shared memory flush to top of kernel * Resolve compiler warnings * Update python/src/fast.cpp Co-authored-by: Awni Hannun <awni.hannun@gmail.com> * Update python/src/fast.cpp Co-authored-by: Awni Hannun <awni.hannun@gmail.com> * Update python/src/fast.cpp Co-authored-by: Awni Hannun <awni.hannun@gmail.com> * Update python/src/fast.cpp Co-authored-by: Awni Hannun <awni.hannun@gmail.com> * Update docstring per PR feedback * Softmax in higher precision, ... * route to fallback for more use cases - batch size > 1, head_dim other than 128, etc. * Address linux build failure * Address other reviewer comments * Remove extraneous eval_cpu function per review --------- Co-authored-by: Atila Orhon <64497909+atiorh@users.noreply.github.com> Co-authored-by: Awni Hannun <awni.hannun@gmail.com> Co-authored-by: atila <atiorh@icloud.com>	2024-03-04 21:06:11 -08:00
Awni Hannun	5121f028d9	nice tensordot for mlx c (#782 )	2024-03-04 09:51:02 -08:00
Piotr Rybiec	6a665ea6ed	Dilation for convolutional layers (#766 ) * add dilation parameter to Conv1d layer * space here too * add conv1d dilation test * add dilation parameter for Conv2d layer * conv2d dilation test	2024-03-04 06:43:00 -08:00
Awni Hannun	bc06cb9ff6	Pickle + dtype fix for numpy conversion (#763 ) * pickle + dtype fix for numpy conversion * fix getattribute on Module base * remove unused function * fix tests * add topk to ops * fix doc	2024-03-02 06:09:29 -08:00
Angelos Katharopoulos	8e281c76c3	Fix the top-k op (#768 )	2024-03-01 22:08:43 -08:00
Awni Hannun	d5964a2710	bindings for memory info (#761 ) * bindings for memory info * update api * keep cache low if requested * fix default * nit in ops error	2024-03-01 19:51:58 -08:00
Ikko Eltociear Ashimine	cf3eb87e52	Fix typo in transforms.cpp (#764 ) occuring -> occurring	2024-02-29 22:23:46 -08:00
Awni Hannun	4494970f47	avoid nested closures in module (#759 )	2024-02-29 09:39:52 -08:00
Jagrit Digani	776c3d226d	Convolution update (#651 ) * Init steel conv and update Conv primitive * Update slow CPU implementation to support flipping and input dilation winograd conv routing Co-authored-by: Awni Hannun <awni@apple.com>	2024-02-28 20:11:16 -08:00
Awni Hannun	420ff2f331	Add back compiled function signatures and docstrings (#749 ) * try to add back compiled function signatures and docstrings * add indentation to docstring	2024-02-27 13:18:59 -08:00
Noah Kasmanoff	de3d2467a3	Update: Fast GeLU Approximation (#744 ) * add: fast gelu approx * fix docs * Update gelu_fast_approx function documentation * Update python/mlx/nn/layers/activations.py Co-authored-by: Awni Hannun <awni.hannun@gmail.com> * fix: test gelu --------- Co-authored-by: Awni Hannun <awni.hannun@gmail.com>	2024-02-26 21:08:50 -08:00
Awni Hannun	fe1dabf272	Fix compile with non standard types (#745 ) * refactor tree utils * fix compile + tree code refactor * Add an extra test * add a few missing activations to docs * hash structure * Encode the full argument structure --------- Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>	2024-02-26 19:28:53 -08:00
Hinrik Snær Guðmundsson	08226ab491	added atleast args input support (#710 ) added atleast list(array) input support * function overloading implemented * Refactoring * fixed formatting * removed pos_only	2024-02-26 11:17:59 -08:00
Chime Ogbuji	3b661b7394	Add linear warmup and schedule joining for use with existing schedules (#721 ) * Add linear warmup to schedules for use with existing schedules * Changed parameters for simplicity of most common case (0 initial value) * Added ScheduleJoiner and updated documentation * ScheduleJoiner -> join_schedules (ala optax #) * black compliance * Different evaluation of schedules * nits --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-02-26 07:28:48 -08:00
Awni Hannun	e6418781ab	Fix logsumexp edge case (#740 ) * fix logsumexp * fix inf constant * also fix power grad * fix ternary dispatch	2024-02-25 08:39:55 -08:00
Gabrijel Boduljak	22364c40b7	Upsample2d (#414 ) Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com> Co-authored-by: Awni Hannun <awni.hannun@gmail.com>	2024-02-23 09:55:04 -08:00
Noah Farr	d729a1991b	Fix arange with inf step (#686 ) * Fix case for step=inf in arange and add inf check for start/stop * Add test cases for arange * Update ops.cpp to include climits header * Fix arange * Fix formatting * Refactor * Add missing include	2024-02-23 06:18:15 -08:00
Awni Hannun	5798256fcf	Shapeless compilation for some graphs (#687 ) * shapeless compilation for some graphs * update compile benchmark * default compile a few activations * buffer donation * bugfix * shapeless fix * update tests to work for cpu and gpu fusion * test kwargs * add kwargs to compile * Recompile when python arguments change * no compile for tanh * some constant tests --------- Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>	2024-02-19 21:43:54 -08:00
Awni Hannun	d0fda82595	fix tolist for half types (#702 )	2024-02-19 09:44:27 -08:00
Hinrik Snær Guðmundsson	f883fcede0	Added support for atleast_1d, atleast_2d, atleast_3d (#694 )	2024-02-19 09:40:52 -08:00
Srimukh Sripada	818cda16bc	Support LR schedulers (#334 ) * Add a few LR schedulers * Move parents's constructor call to the top * Fix docstring * refactor optimizers into two files * add docs * nit * Fix Callable type annotation for python 3.8 --------- Co-authored-by: Awni Hannun <awni@apple.com> Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>	2024-02-15 11:26:20 -08:00
toji	85143fecdd	improved error msg for invalid axis(`mx.split`) (#685 ) * improved error msg for invalid axis(`mx.split`) * Apply suggestions from code review Co-authored-by: Awni Hannun <awni.hannun@gmail.com> * fixed formatting issue --------- Co-authored-by: Awni Hannun <awni.hannun@gmail.com>	2024-02-15 07:25:38 -08:00
Diogo	35431a4ac8	Adds device context manager (#679 )	2024-02-14 14:14:58 -08:00
Awni Hannun	ccf1645995	Custom primitive + RoPE fat op (#676 ) * extensions start * rope custom op * fix build * docs + rope benchmark * fix test * Add a Metal kernel for RoPE * Fix position of traditional * transform tests * Move rope computation to float and fix tests * Fix the test and a typo * change to fast * fix no metal build --------- Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>	2024-02-14 14:04:25 -08:00
Noah Farr	0c65517e91	Return empty array when repeats is 0 in mx.repeat (#681 ) * Return empty array when repeats is 0 * Add test case for repeats = 0	2024-02-13 17:49:31 -08:00
Gabrijel Boduljak	e54cbb7ba6	Pooling layers (#357 ) Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com> Co-authored-by: Awni Hannun <awni@apple.com>	2024-02-12 22:08:13 -08:00
Angelos Katharopoulos	40c108766b	Quantized matmul fix (#677 ) * Fix qmv for small or unaligned matrices * Fix qmm	2024-02-12 18:54:21 -08:00
Nripesh Niketan	0dbc4c7547	feat: Update pre-commit-config.yaml (#667 )	2024-02-11 06:08:20 -08:00
Awni Hannun	b96be943dc	bug fix (#658 )	2024-02-09 16:50:45 -08:00
Abdussamet Türker	b670485185	Remainder negative numerator bug fixed (#641 ) Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>	2024-02-09 16:49:14 -08:00
Diogo	b57bd0488d	Metadata support for safetensors (#639 ) * metadata support for safetensors * aliases making it alittle more readable * addressing comments * python binding tests	2024-02-08 19:33:15 -08:00
Awni Hannun	5c03efaf29	Compile docs (#653 ) * compile docs * docs nits + comments	2024-02-08 11:21:50 -08:00
LeonEricsson	7dccd42133	updated calls to use loc &scale (#643 )	2024-02-08 09:01:59 -08:00
Awni Hannun	1b97b2958b	Compile with capture (#629 ) * Simple kernel generation * Remove the generate kernel from graph_utils * fix multi-output with compile * fuse with stopgrad * v1 input, output capture in compile * cleanup tree update with visitor update * nit * remove todo * state for model, optional explicit init and more pure optimizer steps * move learning rate to state * add lr to opt state, some fixes in capture * fix optim * update tuple of containers as well * fix stream for compiled output * rng state for compile * nit * updates and comments --------- Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>	2024-02-07 17:29:22 -08:00
Awni Hannun	e5e816a5ef	fix sequential with empty modules at end (#647 )	2024-02-07 13:22:27 -08:00
Noah Farr	5fd11c347d	Add loc and scale to random.normal (#638 ) * Add loc and scale to random.normal * Add tests for loc and scale for random.normal * Run pre-commit hooks * Fix code review	2024-02-07 11:49:59 -08:00
Aryan Gupta	ef73393a19	Feat: Add weights argument in BCE Loss and tests (#620 )	2024-02-07 09:39:52 -08:00
Angelos Katharopoulos	ea406d5e33	CI change (#645 ) * CI update * Skip large binary test for now * Upgrade pip * Add proper env variable skipping * Update the CI * Fix workflow name * Set the low memory flag for the tests * Change build process * Add pip upgrade * Use a venv * Add a missing env activate * Add setuptools * Add twine upload back * Re-enable automatic release builds	2024-02-07 06:04:34 -08:00
Awni Hannun	d40a04f8dc	minor fixes (#631 ) * minor fixes * var with ddof >= nelements	2024-02-05 13:27:49 -08:00
Awni Hannun	d75ae52ecd	Compile primitive (#571 ) * Compiled primitive with basic binary, unary graph-level fusion	2024-02-05 06:51:22 -08:00
Awni Hannun	5c3ac52dd7	fix test (#627 )	2024-02-04 16:18:03 -08:00
Avikant Srivastava	11a9fd40f0	fix: handle linspace function when num is 1 (#602 ) * fix: handle linspace function when num is 1 * add comment * fix test case * remove breakpoint	2024-02-04 11:03:49 -08:00
Daniel Strobusch	4fd2fb84a6	make python array SupportsAbs conform (like numpy) (#624 )	2024-02-04 09:31:02 -08:00
Daniel Strobusch	9852af1a19	fix "shape" docstring. (#623 )	2024-02-04 09:21:22 -08:00
AtomicVar	83f63f2184	Add Margin Ranking Loss (#536 )	2024-02-02 10:57:31 -08:00
Awni Hannun	cb6156d35d	Fix eval in trace bugs (#612 ) * Fix eval in trace bugs * comment nit	2024-02-02 09:57:12 -08:00
Awni Hannun	e88e474fd1	Reduce vmap + some fixes (#601 )	2024-02-01 11:30:28 -08:00
David Koski	601c6d6aa8	Fix for AdaDelta (#603 ) - state was being read from parameter "s" - but being stored in parameter "u"	2024-02-01 09:56:27 -08:00
Angelos Katharopoulos	ba8d6bf365	Change the transformer to norm_first by default (#599 )	2024-01-31 12:55:30 -08:00
Sugato Ray	4a5f3b21bb	Add `py.typed` to support PEP-561 (type-hinting) for `mlx` (#588 ) * Add `py.typed` to support PEP-561 (type-hinting) This adds support for type-hinting information as laid in [PEP-561](https://peps.python.org/pep-0561/). * add py.typed to MANIFEST.in	2024-01-31 12:05:42 -08:00
Vijay Krish	fcc5ac1c64	Add GPU support for uint64/int64 reductions (#569 )	2024-01-31 11:18:04 -08:00
nathan	bad67fec37	Added TeX line breaks to mlx.optimizers.Lion docstring (#595 ) Fixes the "misplaced &" MathJax error in documentation.	2024-01-30 19:37:34 -08:00
Angelos Katharopoulos	0de5988f92	Custom VJP and checkpointing (#541 ) * Implement custom_vjp and checkpointing * Add a dependency management primitive * Change the eval order to deep branches first * Add graph depth tracking to the array	2024-01-30 16:04:45 -08:00
Jacket	143e2690d5	Fix SGD implementation (#473 )	2024-01-30 15:50:46 -08:00
Awni Hannun	09b9275027	Make shape a tuple (#591 ) * shape tuple * also remove simplify from docs * rebase	2024-01-30 13:11:01 -08:00
Andre Slavescu	d3a9005454	Softshrink mapping + op (#552 ) * Added Softshrink mapping + op * formatting * docs + nits in docstring --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-01-30 12:56:28 -08:00
Jacket	3f7aba8498	Implement diagonal operator (#562 ) * Implement diagonal operator This implements mx.diagonal in operator level, inspired by @ManishAradwad. * added `mx.diag` with tests * corrected few things * nits in bindings * updates to diag --------- Co-authored-by: ManishAradwad <manisharadwad@gmail.com> Co-authored-by: Awni Hannun <awni@apple.com>	2024-01-30 09:45:48 -08:00
Awni Hannun	3c2f192345	Propagate nans in binary ops (#579 ) * propagate nans in binary ops * handle empty matmul * cpu minimum/maximum propagate nan * benchmark maximum * add min as well * throw on negative indices with full * verbose on linux * fix matmul for zero K	2024-01-29 11:19:38 -08:00
Angelos Katharopoulos	37d98ba6ff	No gil eval (#565 )	2024-01-26 22:03:52 -08:00
Awni Hannun	8993382aaa	Buffer Donation (#519 ) * buffer donation * fix to move shared pointer * format * gpu in place for copy and binary * revert ops test * cpu in place * a little cleanup * remove useless bench	2024-01-26 16:30:33 -08:00
Awni Hannun	07f35c9d8a	Fix a few issues: docs for flatten, erf, dequantize validation (#560 ) * doc flatten * erf doc * check values for dequantize * format	2024-01-26 15:16:46 -08:00
Jagrit Digani	bf17ab5002	Add more checks and clearer error messages to conv operations (#563 ) * Add more checks and clearer error messages to conv operations	2024-01-26 15:13:26 -08:00
Awni Hannun	8fa6b322b9	Compile front-end (#476 ) * fix tests for linux * make a move on compile * basic compile scaffold works * compile binding * clean * fix * fix grad, more tests * basic python tests * fix segfault on python exit * compile works with python closures * fix test * fix python globals bug, and erase * simplify * more cpp tests * bug fix with move function and compile at exit * simplify inputs also * enable and disable compiler * remove simplify * simplify tests use compile now * fix multi-output with compile * clear output tree from cache when function goes out of scope * ../python/src/transforms.cpp * remove closure capture * comments	2024-01-26 13:45:30 -08:00
David Koski	874b739f3c	Fix cache key in RoPE (#561 )	2024-01-26 13:10:02 -08:00

... 3 4 5 6 7 ...

612 Commits