zhangyiss/mlx - mlx - Gitea for Geophysics

Mirror of https://github.com/ml-explore/mlx.git, synced 2025-12-16 01:49:05 +08:00

Author	SHA1	Message	Date
Awni Hannun	d6492b0163	fix clip (#1415)	2024-09-14 16:09:09 -07:00
Awni Hannun	b3f52c9fbe	ensure io/comm streams are active before eval (#1412)	2024-09-14 06:17:36 -07:00
Angelos Katharopoulos	881f09b2e2	Allow querying the allocator for the buffer size (#1404)	2024-09-11 21:02:16 -07:00
Awni Hannun	02efb310ca	Xcode 160 (#1384) * xcode 16.0 with debug tests * limit nproc for builds * vmap bug * assert bug * run python tests in debug mode * fix view, bool copies preserve bits * actual view fix	2024-09-10 15:15:17 -07:00
Awni Hannun	e7e59c6f05	Fix copying scalars by adding fill_gpu (#1402) * fix copying scalars by adding fill_gpu * Another copy scalar changed to fill Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>	2024-09-09 15:54:08 -07:00
Awni Hannun	3ae6aabe9f	throw for certain cases of non captured inputs in compile (#1401)	2024-09-09 14:54:31 -07:00
xnorai	dc627dcb5e	Replace the use of `result_of_t` with `invoke_result_t` (#1397) * Fix C++20 incompatibility * Fix C++20 incompatibility	2024-09-06 19:52:57 -07:00
Max-Heinrich Laves	efeb9c0f02	Transposed Convolution (#1245) * initial implementation for conv_transpose; ran pre-commit; implemented conv_transpose; updated conv_general docstring; updated conv_general docstring; updated code comments; removed commented run_conv_checks; updated acknowledgments; added missing entry to ops.rst; added op to nn.layers; resolved merge conflicts * removed ConvolutionTranspose primitive as suggested by reviewer; removed ConvolutionTranspose primitive as suggested by reviewer * remove transpose flag, add another test Co-authored-by: Awni Hannun <awni@apple.com>	2024-09-06 19:52:38 -07:00
Awni Hannun	ba3e913c7a	Simplifications for MLX C (#1396) * simplifications for MLX C * use vectors instead of map * update examples	2024-09-06 19:16:50 -07:00
Awni Hannun	7cca1727af	Fix slice data size (#1394) * fix slice data size and add tests * fix contiguous flag * simplify stride and perform copy for non-contiguous arrays * fix cpu * comment	2024-09-04 19:10:43 -07:00
Awni Hannun	41c603d48a	fix jit reduce (#1395)	2024-09-04 14:03:10 -07:00
Angelos Katharopoulos	969337345f	Fix reduce edge case (#1389)	2024-09-01 21:37:51 -07:00
Angelos Katharopoulos	58dca7d846	Fix copy in the sort primitive (#1383)	2024-08-31 08:32:14 -07:00
Awni Hannun	0d302cd25b	Fix compile with byte-sized constants (#1381)	2024-08-30 17:24:35 -07:00
Alex Barron	da691257ec	Fix overflow in quantize/dequantize (#1379) * add 2d indices to prevent overflow * use nthreads not out size	2024-08-30 13:32:41 -07:00
Awni Hannun	dba2bd1105	Even Even Faster IO (#1374) * even more faster io * make reader pool static * make python reader thread safe * one more optimization	2024-08-29 16:05:40 -07:00
Alex Barron	28be4de7c2	Fix JIT reductions (#1373)	2024-08-28 16:39:11 -07:00
Awni Hannun	a6c3b38fba	Async load (#1372) * async load * async load	2024-08-28 14:21:55 -07:00
Awni Hannun	fcb65a3897	Even Faster I/O (#1369) * try multithreading for faster IO * smaller batch size * Account for pread returning less than size * nit Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>	2024-08-28 11:49:07 -07:00
Jeethu Rao	bd47e1f066	Fix neon_fast_exp and add more softmax tests (#1367)	2024-08-27 23:42:42 -07:00
Aditya Dhulipala	e6b223df5f	Pinv (#875)	2024-08-27 23:06:12 -07:00
Angelos Katharopoulos	e64349bbdd	Make eval just wait if all arrays are scheduled (#1368)	2024-08-27 17:01:22 -07:00
Angelos Katharopoulos	cdb59faea6	Adds send/recv ops in distributed (#1366)	2024-08-26 23:01:37 -07:00
Alex Barron	1d94ac3f90	Add optional headers to ``mx.fast.metal_kernel`` (#1358)	2024-08-26 21:45:45 -07:00
Awni Hannun	5f7d19d1f5	MPI ops in GPU stream for faster comms (#1356)	2024-08-26 15:12:50 -07:00
Awni Hannun	2fdf9eb535	Fix ternary for large arrays (#1359) * fix ternary for large arrays * fix	2024-08-26 11:22:27 -07:00
Awni Hannun	860d3a50d7	fix extension metal library finding (#1361)	2024-08-26 09:18:50 -07:00
Angelos Katharopoulos	8081df79be	Fix boolean all reduce bug (#1355)	2024-08-24 10:09:32 -07:00
Nripesh Niketan	64bec4fad7	Chore: update pre-commit hooks (#1353) * Chore: update pre-commit refs * run pre-commit	2024-08-24 06:46:36 -07:00
Alex Barron	b96e105244	Add `grid_sample` example to `metal_kernel` docs (#1352) * Add `zero_outputs` and `atomic_outputs` options to `metal_kernel` * add grid sample to docs * zero_outputs -> init_value * add missing header for linux	2024-08-23 18:24:16 -07:00
Angelos Katharopoulos	b57a52813b	Further reduction tuning (#1349) * More reduction tuning * Forgotten pdb * Small column long row specialization	2024-08-23 10:35:25 -07:00
Alex Barron	da8deb2b62	fix bug with multiple attributes (#1348) Co-authored-by: Alex Barron <abarron22@apple.com>	2024-08-23 10:06:15 -07:00
Awni Hannun	98b6ce3460	Refactor reductions and fix scatter atomics for large sizes (#1300) Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>	2024-08-22 16:03:31 -07:00
Alex Barron	0fd2a1f4b0	Custom Metal Kernels from Python (#1325) * start * simple kernels working * restructure * inverse example working * docs + fixes * missing file * fix imports * address comments * add docs + fix test * Review comments + refactor to a single function * update docs * remove hashing * fix contig bug in test * back to a class * trailing whitespace * fix tests * match c++ and python apis * add link + make args kw_only	2024-08-22 13:46:29 -07:00
Awni Hannun	df3233454d	2d gather specialization (#1339)	2024-08-22 10:48:24 -07:00
Awni Hannun	8ae751d3da	fix io (#1343) * fix io * fix io * comment	2024-08-21 13:14:46 -07:00
Awni Hannun	d40e76809f	Fix rope (#1340) * add test * fix rope * fix test	2024-08-20 17:37:52 -07:00
Awni Hannun	bb1b76d9dc	RoPE with frequencies as optional input (#1337) * start rope with freq input * rope with frequencies * nits * fix bug * fix bug + test * cleanup * optional base	2024-08-19 18:30:50 -07:00
Angelos Katharopoulos	9d26441224	Fix contiguity check (#1336) Co-authored-by: Alex Barron <abarron22@apple.com>	2024-08-19 16:05:06 -07:00
Awni Hannun	f12f24a77c	fix compiling with space in paths (#1332)	2024-08-15 16:39:24 -07:00
Awni Hannun	d0630ffe8c	Read arrays from files faster (#1330) * read faster * faster write as well * set default permission for linux * comment	2024-08-14 20:09:56 -07:00
Alex Barron	99bb7d3a58	GPU mx.sign for complex64 (#1326)	2024-08-14 07:54:53 -07:00
Awni Hannun	eaaea02010	Add `isfinite` (#1318) * isfinite * remove reduce test since fix is not complete	2024-08-13 14:49:28 -07:00
Brian Keene	19fb69e2ed	Add memory_efficient_threshold kwarg to sdpa kernel (#1319) Allows opt-in to the memory-efficient GPU shader at a prescribed sequence length. Otherwise, uses aggregate MLX primitives for best latency.	2024-08-12 12:57:09 -07:00
Alex Barron	32668a7317	CPU mx.linalg.cholesky_inverse and mx.linalg.tri_inv (#1307) * add cholesky inv + tri inv * always run tri_inv on cpu * consistent naming	2024-08-08 15:18:02 -07:00
Angelos Katharopoulos	eb8819e91e	Revert variance to be numerically stable (#1314)	2024-08-08 13:35:02 -07:00
Awni Hannun	30bbea2f08	Add gemv masked to JIT plus some fixes (#1310) * add gemv masked to JIT plus some fixes * some cleanup * add utils * fix * fix 2 * more cleaning * fix * remove unused mps matmul support * one more nit * revert	2024-08-07 13:38:07 -07:00
Alex Barron	635ccd9e25	Add "edge" mode to mx.pad (#1309) * Add edge padding mode * fix pad in pooling * string arg instead of enum	2024-08-06 11:23:10 -07:00
nicolov	8c9f0278b9	Add vmap to scatter (#1200) * Add vmap to scatter * updates * vmap updates + a few more tests * bug fix Co-authored-by: Awni Hannun <awni@apple.com>	2024-08-05 20:12:27 -07:00
Awni Hannun	58d0e199e1	add bfloat conv for winograd (#1306) * add bfloat conv for winograd * accumulate in fp32 * accumulate in fp32 * accumulate in bf16	2024-08-05 15:51:13 -07:00


636 Commits