zhangyiss/mlx - mlx - Gitea for Geophysics

mirror of https://github.com/ml-explore/mlx.git synced 2025-08-17 15:46:43 +08:00

Author	SHA1	Message	Date
Cheng	24f89173d1	CUDA backend: matmul (#2241 )	2025-06-06 12:24:04 -07:00
Awni Hannun	c6a20b427a	Improve metal elementwise kernels (#2247 ) * improve metal elementwise kernels * compile and copy * fix jit	2025-06-06 11:37:40 -07:00
Cheng	52dc8c8cd5	Add profiler annotations in common primitives for CUDA backend (#2244 )	2025-06-04 19:55:12 -07:00
Cheng	85a8beb5e4	Avoid atomic updates across CPU/GPU in CUDA event (#2231 )	2025-06-03 16:49:06 -07:00
Cheng	0bb89e9e5f	Share more common code in Compiled (#2240 ) * Share more common code in Compiled * Remove build_lib_name	2025-06-03 16:48:50 -07:00
Cheng	5685ceb3c7	Avoid invoking allocator::malloc when creating CUDA event (#2232 )	2025-06-03 16:48:40 -07:00
Cheng	1b021f6984	Fast primitives decide when to use the fallback (#2216 )	2025-06-02 13:26:37 -07:00
Cheng	db5a7c6192	Add memory cache to CUDA backend (#2221 ) * Move BufferCache out of allocator * Add memory cache to cuda backend allocator * Simplify BufferCache assuming buf can not be null	2025-05-30 12:12:54 -07:00
Awni Hannun	6ef2f67e7f	5bit quants (#2226 ) * 5bit quants * 5bit quants	2025-05-30 12:12:10 -07:00
Cheng	f76ee1ffd2	Move some dims utils to common (#2223 )	2025-05-29 06:48:30 -07:00
Cheng	54a71f270a	Remove unused defines (#2217 )	2025-05-23 06:14:58 -07:00
Cheng	79071bfba4	Fix out-of-bounds default value in logsumexp/softmax (#2213 )	2025-05-21 07:25:16 -07:00
Cheng	7774b87cbd	Remove redundant simd_sum in logsumexp (#2210 )	2025-05-21 07:25:03 -07:00
Cheng	35c87741cf	Build for compute capability 70 instead of 75 (#2209 )	2025-05-20 19:42:48 -07:00
Awni Hannun	eebe73001a	fix large arg reduce (#2206 )	2025-05-19 13:10:44 -07:00
Cheng	237f9e58a8	Fix BEFORE keyword in target_include_directories (#2204 )	2025-05-19 06:10:44 -07:00
Awni Hannun	8576e6fe36	fix conv2d bug + faster conv 1d (#2195 ) * fix conv2d bug + faster conv 1d * revert sort + flaky test	2025-05-18 06:05:11 -07:00
Angelos Katharopoulos	0654543dcc	Add complex eigh (#2191 )	2025-05-18 00:18:43 -07:00
Cheng	7d4b378952	Include cuda_bf16.h for bfloat16 overloads (#2192 ) * Include cuda_bf16.h for bfloat16 overloads * Add NO_GPU_MULTI(Eig) in cuda backend	2025-05-16 06:44:42 -07:00
Jack Wind	7ff5c41e06	Add set_threadgroup_memory_length to CommandEncoder (#2183 )	2025-05-16 00:28:03 -07:00
Awni Hannun	c1eb9d05d9	non-symmetric eig and eigh (#2188 )	2025-05-15 13:01:44 -07:00
Cheng	0751263dec	Fix typo in row_reduce_small (#2179 )	2025-05-13 20:19:54 -07:00
Cheng	eca2f3eb97	Add remove_index utility (#2173 )	2025-05-13 17:09:56 -07:00
Awni Hannun	8f3d208dce	Close a couple edge case bugs: hadamard and addmm on empty inputs (#2177 ) * handle hadamard and addmm on empty inputs * fix	2025-05-12 10:48:57 -07:00
Awni Hannun	6661387066	Fix fft for integer overflow (#2161 )	2025-05-09 14:25:12 -07:00
ATurker	a7fae8a176	fix: conv_general differences between gpu, cpu (#2070 ) * fix general_conv padding * fix bugs * add test --------- Co-authored-by: Awni Hannun <awni@apple.com>	2025-05-09 10:26:52 -07:00
Cheng	0cae0bdac8	CUDA backend: backbone (#2075 )	2025-05-06 21:26:46 -07:00
Awni Hannun	5a1a5d5ed1	fix input coherent kernel launch (#2153 )	2025-05-05 17:30:50 -07:00
Cheng	1683975acf	Move common gpu primitives to backend/gpu (#2145 )	2025-05-05 13:45:29 -07:00
Awni Hannun	af705590ac	fix batched vector sdpa (#2152 )	2025-05-05 13:13:03 -07:00
Awni Hannun	825124af8f	fix bw for elementwise ops (#2151 ) * fix bw for elementwise ops * add compile * fix * fix * fix * fix	2025-05-05 06:15:04 -07:00
Angelos Katharopoulos	481349495b	GPU Hadamard for large N (#1879 )	2025-05-01 17:19:17 -07:00
Awni Hannun	e496c5a4b4	fix integer overflow in qmm (#2143 )	2025-04-30 09:28:56 -07:00
Awni Hannun	f1606486d2	Generalize gpu backend (#2138 ) * generalize gpu backend * fix no_gpu build * fix no_gpu build * generalize gpu backend	2025-04-30 09:08:17 -07:00
Alex Chi Z.	b36dd472bb	return library if it is successfully loaded (#2131 )	2025-04-29 07:30:36 -07:00
hdeng-apple	167b759a38	Fix typos (#2136 )	2025-04-29 07:26:05 -07:00
Angelos Katharopoulos	f0e70afff0	Fix swift pm load (#2117 )	2025-04-24 10:58:29 -07:00
hdeng-apple	86984cad68	Remove static initializers (#2059 ) * Remove static initializers in device.cpp, load.cpp, pocketfft.h * Remove static initializer InTracing::trace_stack * Remove static initializer of CompilerCache cache * Revert changes in pocketfft.h * Remove duplicate private section of thread_pool()	2025-04-24 06:14:49 -07:00
hdeng-apple	38c1e720c2	Search mlx.metallib in macOS framework "Resources" dir (#2061 ) --------- Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>	2025-04-23 09:53:13 -07:00
Yury Popov	1d2c9d6a07	Complex scan (#2094 )	2025-04-22 18:56:28 -07:00
Awni Hannun	fdadc4f22c	Add more complex unary ops (#2101 )	2025-04-21 13:04:54 -07:00
Angelos Katharopoulos	3cde719eb7	Route to gather qmm only for many tokens per expert (#2082 )	2025-04-17 14:53:08 -07:00
Angelos Katharopoulos	5de6d94a90	Gather qmm batched kernel and refactoring of quantized (#2078 )	2025-04-17 13:53:11 -07:00
Angelos Katharopoulos	99eefd2ec0	Gather mm new kernel and small refactoring (#2040 )	2025-04-14 16:37:36 -07:00
Yury Popov	e9e268336b	LogCumSumExp (#2069 )	2025-04-13 01:27:29 -07:00
Angelos Katharopoulos	c4189a38e4	Add float mask to sdpa vector (#2068 )	2025-04-11 17:29:40 -07:00
Awni Hannun	ef7ece9851	fix fft bug (#2062 )	2025-04-10 19:41:27 -07:00
Angelos Katharopoulos	ddaa4b7dcb	Fix the test and add custom min/max reductions for uncommon MPI types (#2060 )	2025-04-10 17:01:17 -07:00
Cheng	dfae2c6989	Fix MSVC build due to use of M_LN2 (#2058 )	2025-04-10 07:41:41 -07:00
Anastasiia Filippova	515f104926	Min / max reductions (#2041 )	2025-04-09 23:22:20 -07:00
Angelos Katharopoulos	9ecefd56db	Do not load the default lib if another is requested (#2055 )	2025-04-09 13:31:38 -07:00
Awni Hannun	00794c42bc	Fix causal mask sdpa vec (#2053 ) * fix sdpa vector causal mask * test	2025-04-08 09:11:23 -07:00
Cheng	08a1bf3f10	Remove Event::Signal() (#2052 )	2025-04-08 06:20:27 -07:00
Awni Hannun	60c4154346	Only request residency once (#2051 )	2025-04-07 10:47:51 -07:00
Awni Hannun	f2c85308c1	add a half simd gemm fallback (#2046 ) * add a half simd gemm fallback * nit	2025-04-07 09:31:29 -07:00
Awni Hannun	1a28b69ee2	only add to residency set once (#2049 )	2025-04-06 17:38:25 -07:00
Jagrit Digani	8777fd104f	Depthwise Conv2D optimization (#2036 ) - Add new specialized kernel for small kernel (kernels size <= 7), small strides (strides <= 2) depthwise 2d convolutions - Add related tests	2025-04-03 09:42:04 -07:00
Awni Hannun	c41f7565ed	fix softmax / logsumexp (#2042 )	2025-04-03 08:32:59 -07:00
Awni Hannun	9ba81e3da4	tune quant dispatch (#2031 )	2025-04-02 20:05:54 -07:00
Awni Hannun	c23888acd7	Fix build warning (#2033 )	2025-04-01 14:42:27 -07:00
Awni Hannun	f98ce25ab9	fix residency set for real (#2032 )	2025-04-01 12:59:48 -07:00
Awni Hannun	de5f38fd48	Custom logsumexp (#2028 ) * initial custom logsumexp * more tests * comments + fix	2025-03-31 07:36:55 -07:00
Angelos Katharopoulos	ec2854b13a	Swap -inf for finite_minimum value (#2029 )	2025-03-30 21:55:04 -07:00
Jesper Stemann Andersen	5f5770e3a2	Fix CPU sign for unsigned ints (#2024 ) Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>	2025-03-30 17:56:59 -07:00
Awni Hannun	28f39e9038	Log for complex numbers in Metal (#2025 ) * Log for complex numbers in Metal * fix log2	2025-03-30 17:04:38 -07:00
Awni Hannun	b2d2b37888	fix residency set clearing (#2027 )	2025-03-30 16:27:26 -07:00
Awni Hannun	13b26775f1	use minimum deployment target (#2016 )	2025-03-28 14:31:53 -07:00
Awni Hannun	05d7118561	causal vector sdpa (#2018 ) * causal vector sdpa * get rid of memory threshold	2025-03-28 12:36:13 -07:00
Awni Hannun	98b901ad66	enable complex gemm (#2017 )	2025-03-28 10:45:13 -07:00
Awni Hannun	bc62932984	sdpa specialization for head dim 256 (#2007 )	2025-03-27 19:31:25 -07:00
Awni Hannun	916fd273ea	wire cache (#2006 )	2025-03-25 18:54:01 -07:00
Jagrit Digani	6a40e1c176	Fix looping limit in causal attention (#1999 )	2025-03-24 12:28:00 -07:00
Andrey Velichkevich	f018e248cd	fix(backend): Include algorithm library in Allocator (#1992 ) Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>	2025-03-22 21:27:51 -07:00
Angelos Katharopoulos	4eef8102c9	Distributed layers (#1270 )	2025-03-21 13:52:17 -07:00
Awni Hannun	2a980a76ce	Add stats and limit to common allocator and enable tests (#1988 ) * add stats to common allocator and enable tests * linux memory and default * fix	2025-03-21 12:28:36 -07:00
Awni Hannun	4e1994e9d7	move memory APIs into top level mlx.core (#1982 )	2025-03-21 07:25:12 -07:00
Awni Hannun	7b7e2352cd	fix malloc or wait deadlock (#1976 )	2025-03-20 16:48:43 -07:00
Awni Hannun	005e7efa64	fix mask in sdpa (#1980 ) * fix mask in sdpa * fix attention mask * Re-enable routing for array mask --------- Co-authored-by: Jagrit Digani <digani@apple.com>	2025-03-20 14:53:12 -07:00
Jagrit Digani	9adcd1a650	Support fused masking in Attention (#1924 ) * Update API to allow mask='causal' in fast::sdpa * Add fallback * Update steel::AttnParams * Fix typo * WIP, basic causal * Update tests * Update benchmarking * Update masking loop limits * Add bool masking and update tests * Update additive mask * Update benchmarks * Update benchmarks * Update tests * Update for bfloat error * Update early exit * Add random seed to tests	2025-03-20 11:01:32 -07:00
Awni Hannun	3c164fca8c	Fix multistream GPU deadlock (#1969 ) * fix multistream GPU deadlock * comments	2025-03-20 07:19:47 -07:00
Awni Hannun	f90206ad74	Guard nullptr dereference (#1972 ) * guard nullptr dereference * comment	2025-03-19 16:24:10 -07:00
Awni Hannun	c6ea2ba329	Use same accumulation precision in gemv as gemm (#1962 ) * use same accumulation precision in gemv as gemm * faster * fix compile	2025-03-16 07:13:24 -07:00
Awni Hannun	736a340478	reduce binary size (#1952 )	2025-03-11 06:30:44 -07:00
Awni Hannun	117e1355a2	fix copy for large arrays (#1953 )	2025-03-10 15:04:25 -07:00
Awni Hannun	3c3e558c60	Support transposed head/seq for kv (#1950 ) * support transposed head/seq for kv * fix flaky test * nit	2025-03-10 10:53:45 -07:00
Awni Hannun	c4230747a1	redesign for faster cpu/gpu synch (#1869 ) * redesign for faster cpu/gpu synch * load + more async CPU * use command encoder API and move more ops to use it * make fence back-end generic + CPU only fence * faster build * fix async eval * fixes + handle temporaries * fix / improve cpu conv * remove unused status, fix siblings * fix extensions * fix * fix no cpu build * format * comments * fix perf regression, remove unecessary abort * fix events, task limit cpu * fix waiting * fix donation / temporaries in normalization	2025-03-06 19:23:38 -08:00
Alex Barron	fd0d63ba5b	Affine quant always in fp32 (#1925 ) * do affine quant in fp32 * static cast	2025-03-04 17:50:19 -08:00
Abe Leininger	3835a428c5	Adds nuclear norm support (#1894 ) * adjust norm unit test tolerance	2025-03-04 13:26:02 -08:00
Awni Hannun	e613d0eaf0	SDPA support for small batch (over sequence) queries (#1922 ) * batch query sdpa * batch sdpa for query	2025-03-04 10:59:04 -08:00
Awni Hannun	6bcd6bcf70	fix donation in scan (#1917 )	2025-03-03 11:30:59 -08:00
Awni Hannun	ba12e4999a	Use a heap for small sizes (#1911 ) * use a heap for small sizes * check if VM	2025-03-03 06:50:57 -08:00
Awni Hannun	4e7cd31d12	Fix slice data size (#1913 ) * fix slice data size * add test	2025-03-02 21:50:42 -08:00
Angelos Katharopoulos	5e6c130d93	RMS norm without scaling (#1915 )	2025-02-28 20:26:57 -08:00
Jagrit Digani	89d327075f	Enabling fused attention for head dim 128 (#1899 ) * Share KV smem * Fix bfloat error * Unroll O = S @ V loop * Perf upgrade * Remove commented out function * Add -Wno-c++17-extensions flag to metal flags * Add -Wno-c++17-extensions flag to metal extension flags	2025-02-26 10:02:06 -08:00
Awni Hannun	7d042f17fe	Double for lapack (#1904 ) * double for lapack ops * add double support for lapack ops	2025-02-25 11:39:36 -08:00
Awni Hannun	7face5d9fd	fix cpu compile (#1897 )	2025-02-24 14:10:30 -08:00
Awni Hannun	a44dc4bdb0	fix leaking objc (#1898 )	2025-02-24 13:57:59 -08:00
Awni Hannun	2d0f384b6f	fix simd erf_inv (#1896 )	2025-02-24 13:57:47 -08:00
Awni Hannun	8ff84b5c43	fix version and expose command queue getter (#1892 )	2025-02-20 15:25:15 -08:00
Jesper Stemann Andersen	0ebc8a3d25	Fixed issue where Clang on FreeBSD failed to compile mlx/backend/cpu/quantized.cpp (#1890 )	2025-02-20 12:02:12 -08:00
Awni Hannun	bbda0fdbdb	Allow non-square lu (#1889 )	2025-02-20 08:13:23 -08:00
Abe Leininger	344a29506e	Enforce triangular matrix form in `tri_inv` (#1876 ) * fix tri_inv bug * Revert "fix tri_inv bug" This reverts commit `b74b290201`. * Make sure that tri_inv returns a triangular matrix --------- Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>	2025-02-19 12:42:33 -08:00
Angelos Katharopoulos	71de73a668	Fix convs by reverting #1803 (#1882 )	2025-02-18 14:36:34 -08:00
Awni Hannun	5274c3c43f	compiler warnings are errors (#1870 )	2025-02-17 00:07:49 -08:00
Angelos Katharopoulos	1762793989	Remove unused uniform (#1867 )	2025-02-14 15:51:41 -08:00
Jagrit Digani	2dc307f2e6	Winograd Update for Small batches (#1803 ) * Build in padding to Winograd kernels * Add new fused Winograd kernel * Enable weight flipping in Winograd kernels	2025-02-14 13:08:13 -08:00
Awni Hannun	7aea5b1895	Allow dynamic ops per buffer based on dispatches and memory (#1864 ) * Allow dynamic ops per buffer based on dispatches and memory * add initial arch values	2025-02-13 19:18:22 -08:00
Awni Hannun	428f589364	Revert "More buffer donation in some cases (#1858 )" (#1863 ) This reverts commit `d274ae77f2`.	2025-02-13 14:21:44 -08:00
Alex Barron	5cd97f7ffe	Bitwise Inverse (#1862 ) * add bitwise inverse * add vmap + fix nojit * inverse -> invert * add to compile + remove unused	2025-02-13 08:44:14 -08:00
Awni Hannun	e425dc00c0	Faster small batch qmv (#1861 ) * faster small batch qmv * swap batch and block dims for qvm and qmv regular	2025-02-12 22:02:36 -08:00
Awni Hannun	d274ae77f2	More buffer donation in some cases (#1858 ) * more donation * fix * add test	2025-02-12 19:41:37 -08:00
Angelos Katharopoulos	0145911bea	Fixes output donation for IO ops on the GPU (#1857 )	2025-02-12 10:52:30 -08:00
Cheng	142b77751d	Fix compilation error on Windows (#1844 )	2025-02-10 19:53:05 -08:00
Abe Leininger	a5ededf1c3	CPU LU factorization and linear solvers (#1451 ) * linalg solve backend * nits * more nits + fix * luf primitive and lu, solve, and solve_triangular backends * changes / nits --------- Co-authored-by: Awni Hannun <awni@apple.com>	2025-02-10 12:32:24 -08:00
Awni Hannun	1c0c118f7c	Fp64 on the CPU (#1843 ) * add fp64 data type * clean build * update docs * fix bug	2025-02-07 15:52:22 -08:00
Jagrit Digani	b6c6552d20	Add missing #pragma once (#1838 )	2025-02-06 11:11:22 -08:00
Awni Hannun	af1b725fda	Fix a couple of slicing bugs (#1827 ) * fix a few bugs * fix conv grad * speedup test * comment	2025-02-05 19:50:08 -08:00
Awni Hannun	9174606d4c	fix sort (#1835 )	2025-02-05 17:16:27 -08:00
Awni Hannun	fe5987b81d	faster sort (#1831 )	2025-02-05 06:10:22 -08:00
Awni Hannun	a229c8cef0	don't duplicate malloc with custom kernel init (#1830 )	2025-02-04 13:20:57 -08:00
Awni Hannun	1156c84e86	Refactor common into cpu specific and truly common (#1817 ) * refactor * fix extension example * fix no-cpu	2025-02-03 15:58:02 -08:00
Jesper Stemann Andersen	2d8e667400	MinGW support (#1806 ) * Changed /bin/bash to bash for generating compiling preamble * Fix wrt jit_compiler mingw like msvc wrt. WEXITSTATUS * Solved ambiguity wrt. bernoulli test shape * Disabled distributed/ring on Windows * Fixed jit_compiler command wrt. MinGW * Extended jit_compiler patch wrt. WEXITSTATUS to FreeBSD	2025-02-01 12:40:06 -08:00
Awni Hannun	80c863b972	Remove accelerate/ (#1816 ) * remove accelerate * comments * neon reduction	2025-02-01 07:18:26 -08:00
Angelos Katharopoulos	f5cc1eea72	Allow different value dimensions in sdpa_vector (#1811 )	2025-01-31 20:58:59 -08:00
Awni Hannun	b7c9f1d38f	scatter axis + gather axis primitives (#1813 ) * scatter axis + gather axis primitives * add transforms * comment	2025-01-31 20:48:08 -08:00
Awni Hannun	c6fc07f1f4	Unify CPU matmuls, remove unused accelerate conv (#1814 ) * unify matmuls * Update mlx/backend/common/matmul.cpp Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com> --------- Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>	2025-01-31 14:43:37 -08:00
Awni Hannun	4758c8baa1	Start to cleanup/unify accelerate and common back-ends (Part 1/N) (#1777 ) * start to cleanup/unify accelerate and common back-ends * more progress * simplify * add half type and allow infs in simd exp * unify softmax + quantized, more dispatches to simd quantized mm * add sin/cos, use simd in vector-scalar ops * faster CPU vectorize quant * faster erf/erfinv	2025-01-29 14:34:49 -08:00
Awni Hannun	e6a7ab9675	non square qr (#1783 )	2025-01-21 14:07:47 -08:00
Angelos Katharopoulos	1f4c127fb9	Move some kernels to `get_template_definition` (#1782 )	2025-01-21 08:59:44 -08:00
Awni Hannun	a4667da1eb	Faster synchronization `Fence` primitive (#1773 ) * try faster synchronization move event fixes update bench fix fix * non-functioning kernel * try alternative fence * cleanup barrier * get rid of event_fence * update benchmarks * doc string in metal fence	2025-01-17 18:42:19 -08:00
Awni Hannun	f288db8d34	Fix synchronization bug for in stream async works (#1768 )	2025-01-15 06:07:34 -08:00
Awni Hannun	252e423e81	fix and cleanup event signal/wait for metal (#1765 )	2025-01-10 18:37:26 -08:00
Alex Barron	c7b0300af5	Fix batched qmv bug (#1758 )	2025-01-09 11:45:57 -08:00
Awni Hannun	da8c885784	Simplify removes no-ops from the tape (#1759 ) * simplify removes no-ops from the tape * comment	2025-01-09 11:23:19 -08:00
Awni Hannun	1ccaf80575	Dynamic broadcasting for shapeless compile/export (#1722 ) * working towards dynamic broadcast * shapeless broadcast * fix build + nits * use broadcast arrays in quantize matmul * some cleanup / consistency * mend * some comments * add vjp, jvp for broadcast axes	2025-01-09 11:04:24 -08:00
Cheng	ec36bfa317	Include command stdout in error message (#1756 ) * Include command stdout in error message * On Windows pclose returns the exit code	2025-01-08 07:17:03 -08:00
Cheng	b8f76f717a	Print exceptions in eval_cpu/eval_gpu and abort (#1754 )	2025-01-08 06:31:09 -08:00
Awni Hannun	d1766f2c70	Add boolean mask support in vector SDPA (#1757 )	2025-01-07 20:24:53 -08:00
Awni Hannun	516ded618b	Dynamic slicing (#1741 ) * dynamic slice and slice update * python bindings + tests + fix set item * fix compile issue * comment * fix jit	2025-01-07 14:02:16 -08:00
Awni Hannun	d5ec172c95	Allow boolean mask in sdpa (#1753 ) * allow boolean mask in sdpa * more permissive donation in ternary	2025-01-06 16:57:07 -08:00
Awni Hannun	058d6ce683	mpi send use input as output (#1750 ) * mpi send use input as output * move earlier	2025-01-06 06:08:43 -08:00
Awni Hannun	259025100e	Fix nd ternary on GPU (#1746 )	2025-01-03 11:52:17 -08:00
Awni Hannun	6fa0501387	Fix concatenate/slice_update vjp + reduce binary size (#1735 ) * fix concatenate vjp + reduce binary size * also cast in slice update	2025-01-02 16:36:33 -08:00
Cheng	935c8c4bb1	Make mx.compile work on Windows (#1697 ) * Invoke MSVC on Windows in mx.compile * Export kernel symbol on MSVC * Remove unused template * Parse env pairs in a robust way * No need of cassert * Remove unnecessary helpers * Fix right trim * Move command building to a separate file * Missing header * Do not pollute cwd with cl.exe * Simplify str concat * Pass output dir * Fix styling	2024-12-24 07:02:33 -08:00
Valentin Roussellet	88f993da38	Explicit parentheses around some logical operators (#1732 ) * fix some warnings * format	2024-12-24 07:02:20 -08:00
Awni Hannun	ebfe64b92d	shapeless slice update and broadcast when possible (#1727 )	2024-12-23 11:25:15 -08:00
Awni Hannun	0308e9af71	Allow offset to be an mx.array for `mx.fast.rope` (#1724 ) * allow offset for rope * comment	2024-12-19 15:51:44 -08:00
Awni Hannun	e03f0372b1	More shape type (#1705 ) * more shape type * fix	2024-12-19 08:08:20 -08:00
Awni Hannun	7480059306	track resource limit and throw if exceeded (#1718 )	2024-12-18 18:45:58 -08:00
Cheng	070bd433ab	Shorter kernel name for Windows (#1701 ) * Shorter kernel name for Windows * Only hash the clipped part	2024-12-17 18:51:38 -08:00

571 Commits