zhangyiss/mlx - mlx - Gitea for Geophysics

mirror of https://github.com/ml-explore/mlx.git synced 2025-07-24 02:41:19 +08:00

Author	SHA1	Message	Date
Awni Hannun	a229c8cef0	don't duplicate malloc with custom kernel init (#1830 )	2025-02-04 13:20:57 -08:00
Awni Hannun	1156c84e86	Refactor common into cpu specific and truly common (#1817 ) * refactor * fix extension example * fix no-cpu	2025-02-03 15:58:02 -08:00
Jesper Stemann Andersen	2d8e667400	MinGW support (#1806 ) * Changed /bin/bash to bash for generating compiling preamble * Fix wrt jit_compiler mingw like msvc wrt. WEXITSTATUS * Solved ambiguity wrt. bernoulli test shape * Disabled distributed/ring on Windows * Fixed jit_compiler command wrt. MinGW * Extended jit_compiler patch wrt. WEXITSTATUS to FreeBSD	2025-02-01 12:40:06 -08:00
Awni Hannun	80c863b972	Remove accelerate/ (#1816 ) * remove accelerate * comments * neon reduction	2025-02-01 07:18:26 -08:00
Angelos Katharopoulos	f5cc1eea72	Allow different value dimensions in sdpa_vector (#1811 )	2025-01-31 20:58:59 -08:00
Awni Hannun	b7c9f1d38f	scatter axis + gather axis primitives (#1813 ) * scatter axis + gather axis primitives * add transforms * comment	2025-01-31 20:48:08 -08:00
Awni Hannun	c6fc07f1f4	Unify CPU matmuls, remove unused accelerate conv (#1814 ) * unify matmuls * Update mlx/backend/common/matmul.cpp Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com> --------- Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>	2025-01-31 14:43:37 -08:00
Awni Hannun	4758c8baa1	Start to cleanup/unify accelerate and common back-ends (Part 1/N) (#1777 ) * start to cleanup/unify accelerate and common back-ends * more progress * simplify * add half type and allow infs in simd exp * unify softmax + quantized, more dispatches to simd quantized mm * add sin/cos, use simd in vector-scalar ops * faster CPU vectorize quant * faster erf/erfinv	2025-01-29 14:34:49 -08:00
Awni Hannun	e6a7ab9675	non square qr (#1783 )	2025-01-21 14:07:47 -08:00
Angelos Katharopoulos	1f4c127fb9	Move some kernels to `get_template_definition` (#1782 )	2025-01-21 08:59:44 -08:00
Awni Hannun	a4667da1eb	Faster synchronization `Fence` primitive (#1773 ) * try faster synchronization move event fixes update bench fix fix * non-functioning kernel * try alternative fence * cleanup barrier * get rid of event_fence * update benchmarks * doc string in metal fence	2025-01-17 18:42:19 -08:00
Awni Hannun	f288db8d34	Fix synchronization bug for in stream async works (#1768 )	2025-01-15 06:07:34 -08:00
Awni Hannun	252e423e81	fix and cleanup event signal/wait for metal (#1765 )	2025-01-10 18:37:26 -08:00
Alex Barron	c7b0300af5	Fix batched qmv bug (#1758 )	2025-01-09 11:45:57 -08:00
Awni Hannun	da8c885784	Simplify removes no-ops from the tape (#1759 ) * simplify removes no-ops from the tape * comment	2025-01-09 11:23:19 -08:00
Awni Hannun	1ccaf80575	Dynamic broadcasting for shapeless compile/export (#1722 ) * working towards dynamic broadcast * shapeless broadcast * fix build + nits * use broadcast arrays in quantize matmul * some cleanup / consistency * mend * some comments * add vjp, jvp for broadcast axes	2025-01-09 11:04:24 -08:00
Cheng	ec36bfa317	Include command stdout in error message (#1756 ) * Include command stdout in error message * On Windows pclose returns the exit code	2025-01-08 07:17:03 -08:00
Cheng	b8f76f717a	Print exceptions in eval_cpu/eval_gpu and abort (#1754 )	2025-01-08 06:31:09 -08:00
Awni Hannun	d1766f2c70	Add boolean mask support in vector SDPA (#1757 )	2025-01-07 20:24:53 -08:00
Awni Hannun	516ded618b	Dynamic slicing (#1741 ) * dynamic slice and slice update * python bindings + tests + fix set item * fix compile issue * comment * fix jit	2025-01-07 14:02:16 -08:00
Awni Hannun	d5ec172c95	Allow boolean mask in sdpa (#1753 ) * allow boolean mask in sdpa * more permissive donation in ternary	2025-01-06 16:57:07 -08:00
Awni Hannun	058d6ce683	mpi send use input as output (#1750 ) * mpi send use input as output * move earlier	2025-01-06 06:08:43 -08:00
Awni Hannun	259025100e	Fix nd ternary on GPU (#1746 )	2025-01-03 11:52:17 -08:00
Awni Hannun	6fa0501387	Fix concatenate/slice_update vjp + reduce binary size (#1735 ) * fix concatenate vjp + reduce binary size * also cast in slice update	2025-01-02 16:36:33 -08:00
Cheng	935c8c4bb1	Make mx.compile work on Windows (#1697 ) * Invoke MSVC on Windows in mx.compile * Export kernel symbol on MSVC * Remove unused template * Parse env pairs in a robust way * No need of cassert * Remove unnecessary helpers * Fix right trim * Move command building to a separate file * Missing header * Do not pollute cwd with cl.exe * Simplify str concat * Pass output dir * Fix styling	2024-12-24 07:02:33 -08:00
Valentin Roussellet	88f993da38	Explicit parentheses around some logical operators (#1732 ) * fix some warnings * format	2024-12-24 07:02:20 -08:00
Awni Hannun	ebfe64b92d	shapeless slice update and broadcast when possible (#1727 )	2024-12-23 11:25:15 -08:00
Awni Hannun	0308e9af71	Allow offset to be an mx.array for `mx.fast.rope` (#1724 ) * allow offset for rope * comment	2024-12-19 15:51:44 -08:00
Awni Hannun	e03f0372b1	More shape type (#1705 ) * more shape type * fix	2024-12-19 08:08:20 -08:00
Awni Hannun	7480059306	track resource limit and throw if exceeded (#1718 )	2024-12-18 18:45:58 -08:00
Cheng	070bd433ab	Shorter kernel name for Windows (#1701 ) * Shorter kernel name for Windows * Only hash the clipped part	2024-12-17 18:51:38 -08:00
Awni Hannun	9111999af3	Fix small sort with metal validation (#1695 )	2024-12-12 09:21:45 -08:00
Awni Hannun	6bd28d246e	Allow no copy negative strides in as_strided and slice (#1688 ) * allow no copy negative strides in as_strided and slice * fix jit * fix jit	2024-12-12 08:59:45 -08:00
Cheng	4d595a2a39	Make compiled preamble work in MSVC (#1675 ) * Make compiled preamble work in MSVC * Remove logging * Only use powershell for MSVC	2024-12-12 08:55:49 -08:00
Awni Hannun	4e1e9520e1	Flatten and unflatten (#1692 ) * flatten and unflatten * fix grad * fix shape infer * use squeeze + unsqueeze in get_item	2024-12-11 21:51:37 -08:00
Awni Hannun	f3dfa36a3a	Fix x86 tests (#1691 ) * fix x86 tests * comment	2024-12-11 07:47:18 -08:00
Awni Hannun	f76a49e555	`ExpandDims` primitive (#1687 ) * add squeeze primitive * simplify squeeze, use in gather * fix * fix * fix * fix * fix no cpu * use squeeze in matmul and friends * expand dims primitive * comment	2024-12-10 16:39:07 -08:00
Awni Hannun	40c62c1321	Use int64 stride everywhere (#1671 ) * use int64 stride everywhere * fix ext * fix ext * more shape + cleanup * one more * few more	2024-12-09 11:09:02 -08:00
Cheng	6ae5423b4a	Do not pass integers to isnan (#1664 )	2024-12-07 18:26:23 -08:00
Cheng	3ceb341a75	Use correct complex type for MSVC (#1660 )	2024-12-07 18:25:22 -08:00
Alex Barron	95c4a2e3af	add back conditionaltype (#1655 )	2024-12-06 11:12:01 -08:00
Jagrit Digani	9d40e521d7	Stop matrix copies with new attention kernel (#1639 )	2024-12-02 14:12:38 -08:00
Jesper Stemann Andersen	e4eeb4e910	Added missing unordered_map includes (#1635 ) * Added missing includes in mlx/io.h and mlx/backend/metal/metal.h * Added additional missing unordered_map includes that fixes build on FreeBSD	2024-12-02 07:03:03 -08:00
Ikko Eltociear Ashimine	9bc2183a31	docs: update device.cpp (#1632 ) unecessary -> unnecessary	2024-11-27 20:58:26 -08:00
Awni Hannun	d4b222b6d3	Fix some leaks and races (#1629 ) * fix leak and fix potential race * more leak fixes * fix one more	2024-11-27 20:01:20 -08:00
Jesper Stemann Andersen	af2af818a6	Enables build for -linux-musl (#1627 ) Also contributes to being able to build for -w64-mingw32. Cf. https://github.com/JuliaPackaging/Yggdrasil/pull/9761	2024-11-27 13:14:24 -08:00
Awni Hannun	211411faf2	fix large ops (#1620 )	2024-11-24 09:17:10 -08:00
Alex Barron	6f7986d592	Cleaner `qmv`/`qvm` (#1616 )	2024-11-22 11:14:08 -08:00
Jagrit Digani	02bec0bb6d	Matrix Attention kernel (#1610 ) * Rough INIT * [WIP]: Loading and Matmuls added * [WIP]: Reductions and min working aligned kernel at headdim = 64 * [WIP] Added headdim 80 for testing * [WIP] Update dispatch params for testing * [WIP] Add support for unaligned seq lengths - still looks messy * Update sdpa_benchmarks * Update sdpa_benchmarks * Update sdpa_benchmarks * Enable gqa support * Update benchmark and switch off 128 headdim * Update headdim 128 tuning * Remove older fast attention code. Write out O strided * Disable hd=128 until further optimizations * Enable bf16 * Fix data size bug * Enable attn build outside of jit	2024-11-22 10:34:05 -08:00
Alex Barron	c79f6a4a8c	3 and 6 bit quantization (#1613 ) * Support 3 and 6 bit quantization	2024-11-22 10:22:13 -08:00
Awni Hannun	0c5eea226b	Reduce specializations (#1607 ) * start of reduce specializations * fix all reduce * fix many dims * fix * non-jit tests clear * cleanup instantiations * cpu merges * change dim specializations * optimize * fix jit * fix jit * use higher precision for integer sum+prod * fixes	2024-11-21 19:53:00 -08:00
Awni Hannun	dcca0d7477	contiguous op / prim (#1612 )	2024-11-21 19:51:49 -08:00
Awni Hannun	61d787726a	Fix view scalar bug segfault (#1603 ) * fix view scalar bug * fix view scalar bug * one more fix	2024-11-19 10:54:05 -08:00
Awni Hannun	2419edd5b2	Faster indexing math in a few kernels (#1589 ) * wip: faster compiled kernels * faster general unary with uint specialization * index type in compiled, unary, binary, ternary, copy * fix jit * jit fix * specialize gather + scatter * nit in docs	2024-11-18 19:52:00 -08:00
Awni Hannun	9d7fa6b8e6	Use osx deployment target to pick Metal version (#1595 ) * choose metal based on deployment target rather than system version * nit * unused compile def	2024-11-18 19:16:49 -08:00
Angelos Katharopoulos	073076ac7d	2-Pass Sdpa Inference Kernel (#1597 )	2024-11-18 17:31:53 -08:00
Awni Hannun	9bd03dd9b4	More buffer donation with no-ops (#1591 ) * more donation * fix test * fix build	2024-11-18 08:35:41 -08:00
Awni Hannun	6931f84412	fix dispatch threads for a few kernels (#1594 )	2024-11-18 08:35:25 -08:00
Awni Hannun	610af352d4	Dispatch bf16 at run time when using the JIT (#1584 ) * Dispatch bf16 at run time when using the JIT * fix extension * fix extension build * fix extension build * Update utils.h	2024-11-15 16:54:36 -08:00
Awni Hannun	b35f1e3c9c	fix donation in sdpa (#1587 )	2024-11-13 17:21:13 -08:00
Awni Hannun	dfa0b9aab4	Cpu fast quantize (#1578 ) * cpu quantize * fix	2024-11-08 20:10:39 -08:00
Alex Barron	a4c47b0276	OOB QMV fix (#1579 ) * fix oob access in qmv * skip more * fix small case	2024-11-08 17:59:45 -08:00
Alex Barron	111fefd5e9	Fix OOB access in qmv (#1577 ) * fix oob access in qmv * skip more	2024-11-08 15:41:30 -08:00
Awni Hannun	c1fe1ef081	Bfs width limit (#1568 ) * width limit * fix * large limit * put env vars in env namespace	2024-11-08 15:00:46 -08:00
Awni Hannun	9f0d5c12fc	Fully wrap the command encoder (#1572 ) * fully wrap the command encoder * use consistent style + fix extensions	2024-11-08 11:50:21 -08:00
Awni Hannun	9a3842a2d9	fix (#1566 )	2024-11-06 17:10:33 -08:00
Alex Barron	26be608470	Add split_k `qvm` for long context (#1564 ) * Add splitk qvm * configurable splitk * tuning * remove extra instantiation * remove refactor * separate test * cpu tolerance	2024-11-05 11:25:19 -08:00
Angelos Katharopoulos	248431eb3c	Reductions update (#1351 )	2024-11-04 22:25:16 -08:00
Awni Hannun	f1951d6cce	Use fewer barriers (#1561 ) * use fewer barriers * comment	2024-11-04 10:26:49 -08:00
Angelos Katharopoulos	62f297b51d	Sdpa fix (#1558 )	2024-11-02 21:25:46 -07:00
Awni Hannun	4f72c66911	improvements to scatter / gather (#1541 )	2024-10-30 19:30:54 -07:00
Jagrit Digani	960e3f0f05	Gemm update (#1518 )	2024-10-30 19:30:28 -07:00
Awni Hannun	884af42da2	Fix thread group for large arrays (#1543 ) * fix thread group for large arrays * comment * one more	2024-10-30 16:25:12 -07:00
Carlo Cabrera	1a992e31e8	Skip using Residency sets in VMs (#1537 ) * Skip using Residency sets in VMs Attempting to use residency sets in a VM throws[^1] libc++abi: terminating due to uncaught exception of type std::runtime_error: [metal::Device] Unable to construct residency set. Not quite sure if this is the best fix, but it does make the error go away. Note that it was previously possible to run simple programs that used mlx in a VM prior to `0eb56d5be0`. See related discussion at Homebrew/homebrew-core#195627. [^1]: https://github.com/Homebrew/homebrew-core/actions/runs/11525831492/job/32105148462#step:3:56 Co-authored-by: Awni Hannun <awni.hannun@gmail.com> * change residency check --------- Co-authored-by: Awni Hannun <awni.hannun@gmail.com> Co-authored-by: Awni Hannun <awni@apple.com>	2024-10-29 19:37:23 -07:00
Awni Hannun	015c247393	change wino dispatch condition (#1534 )	2024-10-28 11:13:44 -07:00
Awni Hannun	d3cd26820e	Faster bits and bernoulli (#1535 ) * faster bits and bernoulli * fix bernoulli	2024-10-28 11:11:00 -07:00
Awni Hannun	0eb56d5be0	Wired (#1510 ) * expose residency sets as wire/unwire * returns wired size * fix * runtime support check * fix os check * fix test * fix no metal build * docs * nit * nits in docs * nits	2024-10-25 09:35:33 -07:00
Awni Hannun	dad1b00b13	fix (#1523 )	2024-10-24 19:17:46 -07:00
Angelos Katharopoulos	c9b41d460f	Working 64-bit scans (#1506 )	2024-10-24 11:05:46 -07:00
xnorai	32972a5924	C++20 compatibility for fmt (#1519 ) * C++20 compatibility for fmt * Address review feedback * Remove stray string * Add newlines back	2024-10-24 08:54:51 -07:00
Dhruv Govil	f6afb9c09b	Remove use of vector<const T> (#1514 )	2024-10-22 16:31:52 -07:00
Kashif Rasul	3ddc07e936	Eigenvalues and eigenvectors (#1334 ) * initial eigvalsh * add compute_vectors * add compute_vectors_ * return a pair * add eigh to return only eigenvectors * fixed typo * merge merge Eighvalsh and Eigh into a single primitive * use the same primate with the flag * fix primatives * use MULTI * fix eval_gpu * fix decleration * rename EighPrimitive to Eigh * tests * tests * fix rebase and format * cleanup lapack * format * add cblas.h --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-10-22 12:18:48 -07:00
Awni Hannun	c26208f67d	Remove Hazard tracking with Fences (#1509 ) * remove hazard tracking * with fence map * no hazard tracking with fences * nits * fix fence retain * cleanup * fix quantized rebase	2024-10-21 19:33:32 -07:00
Alex Barron	d15fa13daf	Batched Quantized Matmul + Fast Small QMV (#1503 ) * add fast qmv for small dims * fix test * batched cpu * add batched template param * refactor metal quantized.cpp	2024-10-21 16:23:17 -07:00
Awni Hannun	92d7cb71f8	Fix compile (#1501 ) * fix compile * fix space	2024-10-18 11:06:40 -07:00
Angelos Katharopoulos	50d8bed468	Fused attention for single query (#1497 )	2024-10-18 00:58:52 -07:00
Awni Hannun	3f86399922	Real and Imag (#1490 ) * real and imag * fix * fix	2024-10-15 16:23:15 -07:00
Awni Hannun	0ab8e099e8	Fix cpu segfault (#1488 ) * fix cpu segfault * nit in tests	2024-10-14 16:17:03 -07:00
Awni Hannun	020f048cd0	A few updates for CPU (#1482 ) * some updates * format * fix * nit	2024-10-14 12:45:49 -07:00
Awni Hannun	881615b072	Faster metal compiled kernels + some fixes (#1486 ) * bump mac tests to use py39 * work per thread for compiled kernels * fixe for large arrays * fix	2024-10-14 12:45:38 -07:00
Awni Hannun	bf6ec92216	Make the GPU device more thread safe (#1478 ) * gpu stream safety * comment * fix	2024-10-12 17:49:15 -07:00
Awni Hannun	1fa0d20a30	consistently handle all -inf in softmax (#1470 )	2024-10-08 09:54:02 -07:00
Awni Hannun	3274c6a087	Fix array is_available race cases (#1468 )	2024-10-07 19:13:50 -07:00
Awni Hannun	95d04805b3	Fix complex power on Metal (#1460 )	2024-10-06 19:58:30 -07:00
Awni Hannun	e4534dac17	Conv grad with groups + bugfix (#1449 ) * fix bug in flipped conv with groups, start of grad for groups * fix * fix * fix + test	2024-10-06 07:08:53 -07:00
Awni Hannun	1bdc038bf9	fix argpartition + faster {arg} sorts / partitions (#1453 )	2024-10-03 14:21:25 -07:00
Awni Hannun	5523d9c426	faster cpu indexing (#1450 )	2024-10-03 13:53:47 -07:00
Angelos Katharopoulos	d878015228	Fix normalization check_input (#1452 )	2024-10-03 13:26:56 -07:00
Angelos Katharopoulos	bacced53d3	Fix row reduce with very few rows (#1447 )	2024-09-29 20:00:35 -07:00
Awni Hannun	11354d5bff	Avoid io timeout for large arrays (#1442 )	2024-09-27 13:32:14 -07:00
Awni Hannun	5b6f38df2b	Faster cpu ops (#1434 ) * faster binary and cleaner copy * use recursive template for other ops * more cleanup * fix from cleanup * more clean * fix binary * use contiguous iterator * add 3d * nits * fix * fix? * fix * fix rebase	2024-09-26 09:19:13 -07:00
Awni Hannun	0b4a58699e	Some overhead reductions in mx.fast.metal_kernel (#1437 ) * some overhead reductions * fix * use += * use more +=	2024-09-25 17:25:21 -07:00
Awni Hannun	4f9f9ebb6f	Faster Metal unary and binary for general case (#1431 ) * faster unary and binary for general case * update ternary + jit fix * fix jit * unary work per thread	2024-09-25 12:07:43 -07:00
Awni Hannun	67b6bf530d	Optimization for general ND copies (#1421 )	2024-09-17 17:59:51 -07:00
Awni Hannun	4f46e9c997	More fixes for arrays with large sizes (#1405 ) * compile works for big arrays when contiguous * style * nits in docs * a bunch more stuff * update jit * update jit * use constant for shapes and strides and remove elem_to_loc overload * use kernel instantiation * docs nits * update binary and ternary * comments	2024-09-17 12:46:31 -07:00
Nripesh Niketan	669c27140d	Chore: add pre-commit hook for cmake (#1362 ) * reset and lint * format --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-09-16 12:53:01 -07:00
Max-Heinrich Laves	adcc88e208	Conv cpu improvements (#1410 )	2024-09-15 18:45:10 -07:00
Awni Hannun	b3f52c9fbe	ensure io/comm streams are active before eval (#1412 )	2024-09-14 06:17:36 -07:00
Angelos Katharopoulos	881f09b2e2	Allow querying the allocator for the buffer size (#1404 )	2024-09-11 21:02:16 -07:00
Awni Hannun	02efb310ca	Xcode 160 (#1384 ) * xcode 16.0 with debug tests * limit nproc for builds * vmap bug * assert bug * run python tests in debug mode * fix view, bool copies preserve bits' * actual view fix	2024-09-10 15:15:17 -07:00
Awni Hannun	e7e59c6f05	Fix copying scalars by adding fill_gpu (#1402 ) * fix copying scalars by adding fill_gpu * Another copy scalar changed to fill --------- Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>	2024-09-09 15:54:08 -07:00
Max-Heinrich Laves	efeb9c0f02	Transposed Convolution (#1245 ) * initial implementation for conv_transpose ran pre-commit implemented conv_transpose updated conv_general docstring updated conv_general docstring updated code comments removed commented run_conv_checks updated acknowledgments added missing entry to ops.rst added op to nn.layers resolved merge conflicts * removed ConvolutionTranspose primitive as suggested by reviewer removed ConvolutionTranspose primitive as suggested by reviewer * remove transpose flag, add another test --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-09-06 19:52:38 -07:00
Awni Hannun	7cca1727af	Fix slice data size (#1394 ) * fix slice data size and add tests * fix contiguous flag * simplify stride and perform copy for non-contiguous arrays * fix cpu * comment	2024-09-04 19:10:43 -07:00
Awni Hannun	41c603d48a	fix jit reduce (#1395 )	2024-09-04 14:03:10 -07:00
Angelos Katharopoulos	969337345f	Fix reduce edge case (#1389 )	2024-09-01 21:37:51 -07:00
Angelos Katharopoulos	58dca7d846	Fix copy in the sort primitive (#1383 )	2024-08-31 08:32:14 -07:00
Awni Hannun	0d302cd25b	Fix compile with byte sized constants (#1381 )	2024-08-30 17:24:35 -07:00
Alex Barron	da691257ec	Fix overflow in quantize/dequantize (#1379 ) * add 2d indices to prevent overflow * use nthreads not out size	2024-08-30 13:32:41 -07:00
Awni Hannun	dba2bd1105	Even Even Faster IO (#1374 ) * even more faster io * make reader pool static * make python reader thread safe * one more optimization	2024-08-29 16:05:40 -07:00
Alex Barron	28be4de7c2	Fix JIT reductions (#1373 )	2024-08-28 16:39:11 -07:00
Awni Hannun	a6c3b38fba	Async load (#1372 ) * async load * async load	2024-08-28 14:21:55 -07:00
Awni Hannun	fcb65a3897	Even Faster I/O (#1369 ) * try multithreading for faster IO * smaller batch size * Account for pread returning less than size * nit --------- Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>	2024-08-28 11:49:07 -07:00
Jeethu Rao	bd47e1f066	Fix neon_fast_exp and add more softmax tests (#1367 )	2024-08-27 23:42:42 -07:00
Angelos Katharopoulos	cdb59faea6	Adds send/recv ops in distributed (#1366 )	2024-08-26 23:01:37 -07:00
Awni Hannun	5f7d19d1f5	MPI ops in GPU stream for faster comms (#1356 )	2024-08-26 15:12:50 -07:00
Awni Hannun	2fdf9eb535	Fix ternary for large arrays (#1359 ) * fix ternary for large arrays * fix	2024-08-26 11:22:27 -07:00
Awni Hannun	860d3a50d7	fix extension metal library finding (#1361 )	2024-08-26 09:18:50 -07:00
Angelos Katharopoulos	8081df79be	Fix boolean all reduce bug (#1355 )	2024-08-24 10:09:32 -07:00
Nripesh Niketan	64bec4fad7	Chore: update pre-commit hooks (#1353 ) * Chore: update pre-commit refs * run pre-commit	2024-08-24 06:46:36 -07:00
Alex Barron	b96e105244	Add `grid_sample` example to `metal_kernel` docs (#1352 ) * Add `zero_outputs` and `atomic_outputs` options to `metal_kernel` * add grid sample to docs * zero_outputs -> init_value * add missing header for linux	2024-08-23 18:24:16 -07:00
Angelos Katharopoulos	b57a52813b	Further reduction tuning (#1349 ) * More reduction tuning * Forgotten pdb * Small column long row specialization	2024-08-23 10:35:25 -07:00
Awni Hannun	98b6ce3460	Refactor reductions and fix scatter atomics for large sizes (#1300 ) Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>	2024-08-22 16:03:31 -07:00
Alex Barron	0fd2a1f4b0	Custom Metal Kernels from Python (#1325 ) * start * simple kernels working * restructure * inverse example working * docs + fixes * missing file * fix imports * address comments * add docs + fix test * Review comments + refactor to a single function * update docs * remove hashing * fix contig bug in test * back to a class * trailing whitespace * fix tests * match c++ and python apis * add link + make args kw_only	2024-08-22 13:46:29 -07:00
Awni Hannun	df3233454d	2d gather specialization (#1339 )	2024-08-22 10:48:24 -07:00
Awni Hannun	d40e76809f	Fix rope (#1340 ) * add test * fix rope * fix test	2024-08-20 17:37:52 -07:00
Awni Hannun	bb1b76d9dc	RoPE with frequencies as optional input (#1337 ) * start rope with freq input * rope with frequencies * nits * fix bug * fix bug + test * cleanup * optional base	2024-08-19 18:30:50 -07:00
Angelos Katharopoulos	9d26441224	Fix contiguity check (#1336 ) Co-authored-by: Alex Barron <abarron22@apple.com>	2024-08-19 16:05:06 -07:00
Awni Hannun	f12f24a77c	fix compiling with space in paths (#1332 )	2024-08-15 16:39:24 -07:00
Awni Hannun	d0630ffe8c	Read arrays from files faster (#1330 ) * read faster * faster write as well * set default permission for linux * comment	2024-08-14 20:09:56 -07:00
Alex Barron	99bb7d3a58	GPU mx.sign for complex64 (#1326 )	2024-08-14 07:54:53 -07:00
Awni Hannun	eaaea02010	Add `isfinite` (#1318 ) * isfinite * remove reduce test since fix is not complete	2024-08-13 14:49:28 -07:00
Alex Barron	32668a7317	CPU mx.linalg.cholesky_inverse and mx.linalg.tri_inv (#1307 ) * add cholesky inv + tri inv * always run tri_inv on cpu * consistent naming	2024-08-08 15:18:02 -07:00
Awni Hannun	30bbea2f08	Add gemv masked to JIT plus some fixes (#1310 ) * add gemv masked to JIT plus some fixes * some cleanup * add utils * fix * fix 2 * more cleaning * fix * remove unused mps matmul support * one more nit * revert	2024-08-07 13:38:07 -07:00
Awni Hannun	58d0e199e1	add bfloat conv for winograd (#1306 ) * add bfloat conv for winograd * accumulate in fp32 * accumulate in fp32 * accumulate in bf16	2024-08-05 15:51:13 -07:00
Awni Hannun	43ffdab172	fix rope and random (#1301 ) * fix rope and random * comment	2024-07-31 16:18:25 -07:00
Awni Hannun	40b6d67333	Fixes for large arrays with a few ops (#1299 ) * fixes for large arrays with a few ops * fix bug * fix all of copy	2024-07-30 17:18:39 -07:00
Alex Barron	c52d1600f0	Fused Affine Quantize/Dequantize ops (#1282 ) * Add fast affine dequantize * add full quantize kernel * fused kernel with scale/bias computation * fix docstring * fix no jit error * fix test * test fix * reduce fast api to only affine_quantize	2024-07-29 15:11:38 -07:00
Jagrit Digani	7f914365fd	Fix GPU sort for large arrays (#1285 ) * Fix GPU sort for large arrays	2024-07-24 14:37:10 -07:00
Alex Barron	c34a5ae7f7	Fix bfloat16 Hadamard (#1283 ) * fix bfloat16 hadamard * add scale * review comments --------- Co-authored-by: Alex Barron <abarron22@apple.com>	2024-07-23 14:54:43 -07:00
Awni Hannun	e2aa6ec8ae	some fixes (#1281 )	2024-07-23 11:49:05 -07:00


452 Commits