zhangyiss/mlx - mlx - Gitea for Geophysics

mirror of https://github.com/ml-explore/mlx.git synced 2025-06-25 01:41:17 +08:00

Author	SHA1	Message	Date
Awni Hannun	e1c9600da3	Add `mx.random.permutation` (#1471 ) * random permutation * comment	2024-10-08 19:42:19 -07:00
Awni Hannun	1fa0d20a30	consistently handle all -inf in softmax (#1470 )	2024-10-08 09:54:02 -07:00
Awni Hannun	3274c6a087	Fix array is_available race cases (#1468 )	2024-10-07 19:13:50 -07:00
Angelos Katharopoulos	9b12093739	Add the roll op (#1455 )	2024-10-07 17:21:42 -07:00
Awni Hannun	f374b6ca4d	Bump nanobind to 2.2 (#1461 ) * bump nanobind * extension version for tests	2024-10-07 16:52:40 -07:00
Awni Hannun	0070e1db40	Fix deep recursion with siblings (#1462 ) * fix recursion with siblings * fix * add test * increase tol	2024-10-07 06:15:33 -07:00
Awni Hannun	e4534dac17	Conv grad with groups + bugfix (#1449 ) * fix bug in flipped conv with groups, start of grad for groups * fix * fix * fix + test	2024-10-06 07:08:53 -07:00
Awni Hannun	1bdc038bf9	fix argpartition + faster {arg} sorts / partitions (#1453 )	2024-10-03 14:21:25 -07:00
Lucas Newman	4a64d4bff1	Add support for grouped 1D convolutions to the nn API (#1444 ) * Fix the weight shape for grouped convolutions from the nn API. * Add tests. * Pre-commit formatting. * Add input validation. * Use integer division instead of casting. * docs * nit --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-09-28 06:41:07 -07:00
Awni Hannun	718aea3f1d	allow take to work with integer index (#1440 )	2024-09-26 15:58:03 -07:00
Awni Hannun	195b429d99	Put along axis + fixe for partition grad (#1430 ) * put along axis, fixes for partition grad * zeros for arg reduce	2024-09-23 10:03:38 -07:00
Nripesh Niketan	6af5ca35b2	feat: add cross_product (#1252 ) * feat: add cross_product * lint * python binding * refactor: Improve error message for cross_product function * refactor: more close to numpy cross product * refactor: improve error message for cross_product function * finish * fix acks * allow old numpy * doc --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-09-17 13:12:43 -07:00
Angelos Katharopoulos	914409fef9	Data parallel helper (#1407 )	2024-09-16 18:17:21 -07:00
Awni Hannun	d6492b0163	fix clip (#1415 )	2024-09-14 16:09:09 -07:00
Awni Hannun	8b30acd7eb	fix module attribute set, reset, set (#1403 )	2024-09-11 16:30:42 -07:00
Awni Hannun	3ae6aabe9f	throw for certain cases of non captured inputs in compile (#1401 )	2024-09-09 14:54:31 -07:00
Max-Heinrich Laves	efeb9c0f02	Transposed Convolution (#1245 ) * initial implementation for conv_transpose ran pre-commit implemented conv_transpose updated conv_general docstring updated conv_general docstring updated code comments removed commented run_conv_checks updated acknowledgments added missing entry to ops.rst added op to nn.layers resolved merge conflicts * removed ConvolutionTranspose primitive as suggested by reviewer removed ConvolutionTranspose primitive as suggested by reviewer * remove transpose flag, add another test --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-09-06 19:52:38 -07:00
Awni Hannun	ba3e913c7a	Simplifications for MLX C (#1396 ) * simplifications for MLX C * use vectors instead of map * update examples	2024-09-06 19:16:50 -07:00
Awni Hannun	7cca1727af	Fix slice data size (#1394 ) * fix slice data size and add tests * fix contiguous flag * simplify stride and perform copy for non-contiguous arrays * fix cpu * comment	2024-09-04 19:10:43 -07:00
Bhargav Yagnik	11371fe251	Test to prevent bugs like #1386 (#1391 ) * updated test_array for missing ops * formatting changes	2024-09-04 17:24:30 -07:00
Angelos Katharopoulos	969337345f	Fix reduce edge case (#1389 )	2024-09-01 21:37:51 -07:00
Awni Hannun	0d302cd25b	Fix compiel with byte sized constants (#1381 )	2024-08-30 17:24:35 -07:00
Aditya Dhulipala	e6b223df5f	Pinv (#875 )	2024-08-27 23:06:12 -07:00
Angelos Katharopoulos	cdb59faea6	Adds send/recv ops in distributed (#1366 )	2024-08-26 23:01:37 -07:00
Alex Barron	1d94ac3f90	Add optional headers to ``mx.fast.metal_kernel`` (#1358 )	2024-08-26 21:45:45 -07:00
Alex Barron	d1183821a7	int() and float() for mx.array (#1360 )	2024-08-25 20:41:44 -07:00
Angelos Katharopoulos	8081df79be	Fix boolean all reduce bug (#1355 )	2024-08-24 10:09:32 -07:00
Angelos Katharopoulos	b57a52813b	Further reduction tuning (#1349 ) * More reduction tuning * Forgotten pdb * Small column long row specialization	2024-08-23 10:35:25 -07:00
Alex Barron	da8deb2b62	fix bug with multiple attributes (#1348 ) Co-authored-by: Alex Barron <abarron22@apple.com>	2024-08-23 10:06:15 -07:00
Awni Hannun	98b6ce3460	Refactor reductions and fix scatter atomics for large sizes (#1300 ) Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>	2024-08-22 16:03:31 -07:00
Alex Barron	0fd2a1f4b0	Custom Metal Kernels from Python (#1325 ) * start * simple kernels working * restructure * inverse example working * docs + fixes * missing file * fix imports * address comments * add docs + fix test * Review comments + refactor to a single function * update docs * remove hashing * fix contig bug in test * back to a class * trailing whitespace * fix tests * match c++ and python apis * add link + make args kw_only	2024-08-22 13:46:29 -07:00
Awni Hannun	d40e76809f	Fix rope (#1340 ) * add test * fix rope * fix test	2024-08-20 17:37:52 -07:00
Awni Hannun	bb1b76d9dc	RoPE with frequencies as optional input (#1337 ) * start rope with freq input * rope with frequencies * nits * fix bug * fix bug + test * cleanup * optional base	2024-08-19 18:30:50 -07:00
Awni Hannun	ae5b5cabfd	Fix optimizer reloading from checkpoint (#1329 ) * fix optimizer reloading from checkpoint * comment	2024-08-15 07:33:23 -07:00
Alex Barron	99bb7d3a58	GPU mx.sign for complex64 (#1326 )	2024-08-14 07:54:53 -07:00
Awni Hannun	eaaea02010	Add `isfinite` (#1318 ) * isfinite * remove reduce test since fix is not complete	2024-08-13 14:49:28 -07:00
Bhargav Yagnik	a098bc92e0	Fix: Preserve input dtype in Dropout layer output (#1323 ) * Fix: Preserve input dtype in Dropout layer output - Modified Dropout implementation to ensure that the output dtype matches the input dtype. - This resolves the issue #1321 * Update test cases in test_nn.py - Revised test cases to align with updated dropout code - Fixed assertion method: replaced self.assertTrue with self.assertEqual for accurate comparisons in test_nn.py -> test_rope, test_alibi and test_dropout, * updated dropout.py	2024-08-13 11:54:21 -07:00
Brian Keene	19fb69e2ed	Add memory_efficient_threshold kwarg to sdpa kernel (#1319 ) Allows opt-in to memory efficient GPU shader at proscribed sequence length. Otherwise, utilizes aggregate MLX primitives for best latency.	2024-08-12 12:57:09 -07:00
Alex Barron	32668a7317	CPU mx.linalg.cholesky_inverse and mx.linalg.tri_inv (#1307 ) * add cholesky inv + tri inv * always run tri_inv on cpu * consistent naming	2024-08-08 15:18:02 -07:00
Angelos Katharopoulos	780c197f95	Fix test tolerance and patch bump (#1315 )	2024-08-08 14:51:09 -07:00
Alex Barron	635ccd9e25	Add "edge" mode to mx.pad (#1309 ) * Add edge padding mode * fix pad in pooling * string arg instead of enum	2024-08-06 11:23:10 -07:00
nicolov	8c9f0278b9	Add vmap to scatter (#1200 ) * Add vmap to scatter * updates * vmap updates + a few more tests * bug fix --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-08-05 20:12:27 -07:00
Awni Hannun	58d0e199e1	add bfloat conv for windograd (#1306 ) * add bfloat conv for windograd * accumulate in fp32 * accumulate in fp32 * accumulate in bf16	2024-08-05 15:51:13 -07:00
Awni Hannun	10b5835501	fix creating array from bf16 tensors in jax / torch (#1305 )	2024-08-01 16:20:51 -07:00
Alex Barron	c52d1600f0	Fused Affine Quantize/Dequantize ops (#1282 ) * Add fast affine dequantize * add full quantize kernel * fused kernel with scale/bias computation * fix docstring * fix no jit error * fix test * test fix * reduce fast api to only affine_quantize	2024-07-29 15:11:38 -07:00
Atakan Tekparmak	6e06e3a904	feat: Added "tanh" option to GELU approximation (#1268 )	2024-07-28 09:07:56 +02:00
Awni Hannun	7b456fd2c0	Array api (#1289 ) * some updates for numpy 2.0 and array api * some updates for numpy 2.0 and array api * fix array api doc	2024-07-26 10:40:49 -07:00
Anton Belov	5029894662	[Issue #1187 ] Add nan_to_num function initial attempt (#1247 ) * initial attempt, working with wrong types * not compiling; mx.float16 and mx.bfloat16 tests added * fix nan to num * nit --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-07-25 09:57:37 -07:00
Awni Hannun	baf9fa5f42	Einsum (#1269 ) * einsum initial * fix comma break * sum axis was wrong * small cleanups * python binding * changed bindings to resemble numpy * remove todo comment * comment changes * add count of operands/inputs * fail fast if operands list is empty * ignore comma if no output * einsum path matching numpy * getting somewhere with path * remove print * it passes the first test * moved einsum tests to seperate file * seperated einsum path * moved einsum naive * remove space from equation * fast fail if no operands passed * update tests and remove printf * small cleanup * some more cleanups * removed python helper file * ack * utilize std for finding min in vector * duplicate def * remove the tuple as it was unreadable * moved einsum_naive back to ops * remaining isn't needed * avoid creating another set * cleanup * greedy path, start of naive einsum * more einsum * fix some bugs * some more fixes, tests pass * benchmark * some simplify * fix einsum and test Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com> * add a bunch more tests and fix a bunch more bugs * some docs nits --------- Co-authored-by: dc-dc-dc <dgcruz983@gmail.com> Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>	2024-07-25 09:36:44 -07:00
Jagrit Digani	7f914365fd	Fix GPU sort for large arrays (#1285 ) * Fix GPU sort for large arrays	2024-07-24 14:37:10 -07:00

331 Commits