zhangyiss/mlx - mlx - Gitea for Geophysics

mirror of https://github.com/ml-explore/mlx.git synced 2025-07-23 18:11:17 +08:00

Author	SHA1	Message	Date
Awni Hannun	bc62932984	sdpa specialization for head dim 256 (#2007 )	2025-03-27 19:31:25 -07:00
Awni Hannun	916fd273ea	wire cache (#2006 )	2025-03-25 18:54:01 -07:00
Jagrit Digani	6a40e1c176	Fix looping limit in causal attention (#1999 )	2025-03-24 12:28:00 -07:00
Andrey Velichkevich	f018e248cd	fix(backend): Include algorithm library in Allocator (#1992 ) Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>	2025-03-22 21:27:51 -07:00
Angelos Katharopoulos	4eef8102c9	Distributed layers (#1270 )	2025-03-21 13:52:17 -07:00
Awni Hannun	2a980a76ce	Add stats and limit to common allocator and enable tests (#1988 ) * add stats to common allocator and enable tests * linux memory and default * fix	2025-03-21 12:28:36 -07:00
Awni Hannun	4e1994e9d7	move memory APIs into top level mlx.core (#1982 )	2025-03-21 07:25:12 -07:00
Awni Hannun	7b7e2352cd	fix malloc or wait deadlock (#1976 )	2025-03-20 16:48:43 -07:00
Awni Hannun	005e7efa64	fix mask in sdpa (#1980 ) * fix mask in sdpa * fix attention mask * Re-enable routing for array mask --------- Co-authored-by: Jagrit Digani <digani@apple.com>	2025-03-20 14:53:12 -07:00
Jagrit Digani	9adcd1a650	Support fused masking in Attention (#1924 ) * Update API to allow mask='causal' in fast::sdpa * Add fallback * Update steel::AttnParams * Fix typo * WIP, basic causal * Update tests * Update benchmarking * Update masking loop limits * Add bool masking and update tests * Update additive mask * Update benchmarks * Update benchmarks * Update tests * Update for bfloat error * Update early exit * Add random seed to tests	2025-03-20 11:01:32 -07:00
Awni Hannun	3c164fca8c	Fix multistream GPU deadlock (#1969 ) * fix multistream GPU deadlock * comments	2025-03-20 07:19:47 -07:00
Awni Hannun	f90206ad74	Guard nullptr dereference (#1972 ) * guard nullptr dereference * comment	2025-03-19 16:24:10 -07:00
Awni Hannun	c6ea2ba329	Use same accumulation precision in gemv as gemm (#1962 ) * use same accumulation precision in gemv as gemm * faster * fix compile	2025-03-16 07:13:24 -07:00
Awni Hannun	736a340478	reduce binary size (#1952 )	2025-03-11 06:30:44 -07:00
Awni Hannun	117e1355a2	fix copy for large arrays (#1953 )	2025-03-10 15:04:25 -07:00
Awni Hannun	3c3e558c60	Support transposed head/seq for kv (#1950 ) * support transposed head/seq for kv * fix flaky test * nit	2025-03-10 10:53:45 -07:00
Awni Hannun	c4230747a1	redesign for faster cpu/gpu synch (#1869 ) * redesign for faster cpu/gpu synch * load + more async CPU * use command encoder API and move more ops to use it * make fence back-end generic + CPU only fence * faster build * fix async eval * fixes + handle temporaries * fix / improve cpu conv * remove unused status, fix siblings * fix extensions * fix * fix no cpu build * format * comments * fix perf regression, remove unecessary abort * fix events, task limit cpu * fix waiting * fix donation / temporaries in normalization	2025-03-06 19:23:38 -08:00
Alex Barron	fd0d63ba5b	Affine quant always in fp32 (#1925 ) * do affine quant in fp32 * static cast	2025-03-04 17:50:19 -08:00
Abe Leininger	3835a428c5	Adds nuclear norm support (#1894 ) * adjust norm unit test tolerance	2025-03-04 13:26:02 -08:00
Awni Hannun	e613d0eaf0	SDPA support for small batch (over sequence) queries (#1922 ) * batch query sdpa * batch sdpa for query	2025-03-04 10:59:04 -08:00
Awni Hannun	6bcd6bcf70	fix donation in scan (#1917 )	2025-03-03 11:30:59 -08:00
Awni Hannun	ba12e4999a	Use a heap for small sizes (#1911 ) * use a heap for small sizes * check if VM	2025-03-03 06:50:57 -08:00
Awni Hannun	4e7cd31d12	Fix slice data size (#1913 ) * fix slice data size * add test	2025-03-02 21:50:42 -08:00
Angelos Katharopoulos	5e6c130d93	RMS norm without scaling (#1915 )	2025-02-28 20:26:57 -08:00
Jagrit Digani	89d327075f	Enabling fused attention for head dim 128 (#1899 ) * Share KV smem * Fix bfloat error * Unroll O = S @ V loop * Perf upgrade * Remove commented out function * Add -Wno-c++17-extensions flag to metal flags * Add -Wno-c++17-extensions flag to metal extension flags	2025-02-26 10:02:06 -08:00
Awni Hannun	7d042f17fe	Double for lapack (#1904 ) * double for lapack ops * add double support for lapack ops	2025-02-25 11:39:36 -08:00
Awni Hannun	7face5d9fd	fix cpu compile (#1897 )	2025-02-24 14:10:30 -08:00
Awni Hannun	a44dc4bdb0	fix leaking objc (#1898 )	2025-02-24 13:57:59 -08:00
Awni Hannun	2d0f384b6f	fix simd erf_inv (#1896 )	2025-02-24 13:57:47 -08:00
Awni Hannun	8ff84b5c43	fix version and expose command queue getter (#1892 )	2025-02-20 15:25:15 -08:00
Jesper Stemann Andersen	0ebc8a3d25	Fixed issue where Clang on FreeBSD failed to compile mlx/backend/cpu/quantized.cpp (#1890 )	2025-02-20 12:02:12 -08:00
Awni Hannun	bbda0fdbdb	Allow non-square lu (#1889 )	2025-02-20 08:13:23 -08:00
Abe Leininger	344a29506e	Enforce triangular matrix form in `tri_inv` (#1876 ) * fix tri_inv bug * Revert "fix tri_inv bug" This reverts commit `b74b290201`. * Make sure that tri_inv returns a triangular matrix --------- Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>	2025-02-19 12:42:33 -08:00
Angelos Katharopoulos	71de73a668	Fix convs by reverting #1803 (#1882 )	2025-02-18 14:36:34 -08:00
Awni Hannun	5274c3c43f	compiler warnings are errors (#1870 )	2025-02-17 00:07:49 -08:00
Angelos Katharopoulos	1762793989	Remove unused uniform (#1867 )	2025-02-14 15:51:41 -08:00
Jagrit Digani	2dc307f2e6	Winograd Update for Small batches (#1803 ) * Build in padding to Winograd kernels * Add new fused Winograd kernel * Enable weight flipping in Winograd kernels	2025-02-14 13:08:13 -08:00
Awni Hannun	7aea5b1895	Allow dynamic ops per buffer based on dispatches and memory (#1864 ) * Allow dynamic ops per buffer based on dispatches and memory * add initial arch values	2025-02-13 19:18:22 -08:00
Awni Hannun	428f589364	Revert "More buffer donation in some cases (#1858 )" (#1863 ) This reverts commit `d274ae77f2`.	2025-02-13 14:21:44 -08:00
Alex Barron	5cd97f7ffe	Bitwise Inverse (#1862 ) * add bitwise inverse * add vmap + fix nojit * inverse -> invert * add to compile + remove unused	2025-02-13 08:44:14 -08:00
Awni Hannun	e425dc00c0	Faster small batch qmv (#1861 ) * faster small batch qmv * swap batch and block dims for qvm and qmv regular	2025-02-12 22:02:36 -08:00
Awni Hannun	d274ae77f2	More buffer donation in some cases (#1858 ) * more donation * fix * add test	2025-02-12 19:41:37 -08:00
Angelos Katharopoulos	0145911bea	Fixes output donation for IO ops on the GPU (#1857 )	2025-02-12 10:52:30 -08:00
Cheng	142b77751d	Fix compilation error on Windows (#1844 )	2025-02-10 19:53:05 -08:00
Abe Leininger	a5ededf1c3	CPU LU factorization and linear solvers (#1451 ) * linalg solve backend * nits * more nits + fix * luf primitive and lu, solve, and solve_triangular backends * changes / nits --------- Co-authored-by: Awni Hannun <awni@apple.com>	2025-02-10 12:32:24 -08:00
Awni Hannun	1c0c118f7c	Fp64 on the CPU (#1843 ) * add fp64 data type * clean build * update docs * fix bug	2025-02-07 15:52:22 -08:00
Jagrit Digani	b6c6552d20	Add missing #pragma once (#1838 )	2025-02-06 11:11:22 -08:00
Awni Hannun	af1b725fda	Fix a couple of slicing bugs (#1827 ) * fix a few bugs * fix conv grad * speedup test * comment	2025-02-05 19:50:08 -08:00
Awni Hannun	9174606d4c	fix sort (#1835 )	2025-02-05 17:16:27 -08:00
Awni Hannun	fe5987b81d	faster sort (#1831 )	2025-02-05 06:10:22 -08:00
Awni Hannun	a229c8cef0	don't duplicate malloc with custom kernel init (#1830 )	2025-02-04 13:20:57 -08:00
Awni Hannun	1156c84e86	Refactor common into cpu specific and truly common (#1817 ) * refactor * fix extension example * fix no-cpu	2025-02-03 15:58:02 -08:00
Jesper Stemann Andersen	2d8e667400	MinGW support (#1806 ) * Changed /bin/bash to bash for generating compiling preamble * Fix wrt jit_compiler mingw like msvc wrt. WEXITSTATUS * Solved ambiguity wrt. bernoulli test shape * Disabled distributed/ring on Windows * Fixed jit_compiler command wrt. MinGW * Extended jit_compiler patch wrt. WEXITSTATUS to FreeBSD	2025-02-01 12:40:06 -08:00
Awni Hannun	80c863b972	Remove accelerate/ (#1816 ) * remove accelerate * comments * neon reduction	2025-02-01 07:18:26 -08:00
Angelos Katharopoulos	f5cc1eea72	Allow different value dimensions in sdpa_vector (#1811 )	2025-01-31 20:58:59 -08:00
Awni Hannun	b7c9f1d38f	scatter axis + gather axis primitives (#1813 ) * scatter axis + gather axis primitives * add transforms * comment	2025-01-31 20:48:08 -08:00
Awni Hannun	c6fc07f1f4	Unify CPU matmuls, remove unused accelerate conv (#1814 ) * unify matmuls * Update mlx/backend/common/matmul.cpp Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com> --------- Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>	2025-01-31 14:43:37 -08:00
Awni Hannun	4758c8baa1	Start to cleanup/unify accelerate and common back-ends (Part 1/N) (#1777 ) * start to cleanup/unify accelerate and common back-ends * more progress * simplify * add half type and allow infs in simd exp * unify softmax + quantized, more dispatches to simd quantized mm * add sin/cos, use simd in vector-scalar ops * faster CPU vectorize quant * faster erf/erfinv	2025-01-29 14:34:49 -08:00
Awni Hannun	e6a7ab9675	non square qr (#1783 )	2025-01-21 14:07:47 -08:00
Angelos Katharopoulos	1f4c127fb9	Move some kernels to `get_template_definition` (#1782 )	2025-01-21 08:59:44 -08:00
Awni Hannun	a4667da1eb	Faster synchronization `Fence` primitive (#1773 ) * try faster synchronization move event fixes update bench fix fix * non-functioning kernel * try alternative fence * cleanup barrier * get rid of event_fence * update benchmarks * doc string in metal fence	2025-01-17 18:42:19 -08:00
Awni Hannun	f288db8d34	Fix synchronization bug for in stream async works (#1768 )	2025-01-15 06:07:34 -08:00
Awni Hannun	252e423e81	fix and cleanup event signal/wait for metal (#1765 )	2025-01-10 18:37:26 -08:00
Alex Barron	c7b0300af5	Fix batched qmv bug (#1758 )	2025-01-09 11:45:57 -08:00
Awni Hannun	da8c885784	Simplify removes no-ops from the tape (#1759 ) * simplify removes no-ops from the tape * comment	2025-01-09 11:23:19 -08:00
Awni Hannun	1ccaf80575	Dynamic broadcasting for shapeless compile/export (#1722 ) * working towards dynamic broadcast * shapeless broadcast * fix build + nits * use broadcast arrays in quantize matmul * some cleanup / consistency * mend * some comments * add vjp, jvp for broadcast axes	2025-01-09 11:04:24 -08:00
Cheng	ec36bfa317	Include command stdout in error message (#1756 ) * Include command stdout in error message * On Windows pclose returns the exit code	2025-01-08 07:17:03 -08:00
Cheng	b8f76f717a	Print exceptions in eval_cpu/eval_gpu and abort (#1754 )	2025-01-08 06:31:09 -08:00
Awni Hannun	d1766f2c70	Add boolean mask support in vector SDPA (#1757 )	2025-01-07 20:24:53 -08:00
Awni Hannun	516ded618b	Dynamic slicing (#1741 ) * dynamic slice and slice update * python bindings + tests + fix set item * fix compile issue * comment * fix jit	2025-01-07 14:02:16 -08:00
Awni Hannun	d5ec172c95	Allow boolean mask in sdpa (#1753 ) * allow boolean mask in sdpa * more permissive donation in ternary	2025-01-06 16:57:07 -08:00
Awni Hannun	058d6ce683	mpi send use input as output (#1750 ) * mpi send use input as output * move earlier	2025-01-06 06:08:43 -08:00
Awni Hannun	259025100e	Fix nd ternary on GPU (#1746 )	2025-01-03 11:52:17 -08:00
Awni Hannun	6fa0501387	Fix concatenate/slice_update vjp + reduce binary size (#1735 ) * fix concatenate vjp + reduce binary size * also cast in slice update	2025-01-02 16:36:33 -08:00
Cheng	935c8c4bb1	Make mx.compile work on Windows (#1697 ) * Invoke MSVC on Windows in mx.compile * Export kernel symbol on MSVC * Remove unused template * Parse env pairs in a robust way * No need of cassert * Remove unnecessary helpers * Fix right trim * Move command building to a separate file * Missing header * Do not pollute cwd with cl.exe * Simplify str concat * Pass output dir * Fix styling	2024-12-24 07:02:33 -08:00
Valentin Roussellet	88f993da38	Explicit parentheses around some logical operators (#1732 ) * fix some warnings * format	2024-12-24 07:02:20 -08:00
Awni Hannun	ebfe64b92d	shapeless slice update and broadcast when possible (#1727 )	2024-12-23 11:25:15 -08:00
Awni Hannun	0308e9af71	Allow offset to be an mx.array for `mx.fast.rope` (#1724 ) * allow offset for rope * comment	2024-12-19 15:51:44 -08:00
Awni Hannun	e03f0372b1	More shape type (#1705 ) * more shape type * fix	2024-12-19 08:08:20 -08:00
Awni Hannun	7480059306	track resource limit and throw if exceeded (#1718 )	2024-12-18 18:45:58 -08:00
Cheng	070bd433ab	Shorter kernel name for Windows (#1701 ) * Shorter kernel name for Windows * Only hash the clipped part	2024-12-17 18:51:38 -08:00
Awni Hannun	9111999af3	Fix small sort with metal validation (#1695 )	2024-12-12 09:21:45 -08:00
Awni Hannun	6bd28d246e	Allow no copy negative strides in as_strided and slice (#1688 ) * allow no copy negative strides in as_strided and slice * fix jit * fix jit	2024-12-12 08:59:45 -08:00
Cheng	4d595a2a39	Make compiled preamble work in MSVC (#1675 ) * Make compiled preamble work in MSVC * Remove logging * Only use powershell for MSVC	2024-12-12 08:55:49 -08:00
Awni Hannun	4e1e9520e1	Flatten and unflatten (#1692 ) * flatten and unflatten * fix grad * fix shape infer * use squeeze + unsqueeze in get_item	2024-12-11 21:51:37 -08:00
Awni Hannun	f3dfa36a3a	Fix x86 tests (#1691 ) * fix x86 tests * comment	2024-12-11 07:47:18 -08:00
Awni Hannun	f76a49e555	`ExpandDims` primitive (#1687 ) * add squeeze primitive * simplify squeeze, use in gather * fix * fix * fix * fix * fix no cpu * use squeeze in matmul and friends * expand dims primitive * comment	2024-12-10 16:39:07 -08:00
Awni Hannun	40c62c1321	Use int64 stride everywhere (#1671 ) * use int64 stride everywhere * fix ext * fix ext * more shape + cleanup * one more * few more	2024-12-09 11:09:02 -08:00
Cheng	6ae5423b4a	Do not pass integers to isnan (#1664 )	2024-12-07 18:26:23 -08:00
Cheng	3ceb341a75	Use correct complex type for MSVC (#1660 )	2024-12-07 18:25:22 -08:00
Alex Barron	95c4a2e3af	add back conditionaltype (#1655 )	2024-12-06 11:12:01 -08:00
Jagrit Digani	9d40e521d7	Stop matrix copies with new attention kernel (#1639 )	2024-12-02 14:12:38 -08:00
Jesper Stemann Andersen	e4eeb4e910	Added missing unordered_map includes (#1635 ) * Added missing includes in mlx/io.h and mlx/backend/metal/metal.h * Added additional missing unordered_map includes that fixes build on FreeBSD	2024-12-02 07:03:03 -08:00
Ikko Eltociear Ashimine	9bc2183a31	docs: update device.cpp (#1632 ) unecessary -> unnecessary	2024-11-27 20:58:26 -08:00
Awni Hannun	d4b222b6d3	Fix some leaks and races (#1629 ) * fix leak and fix potential race * more leak fixes * fix one more	2024-11-27 20:01:20 -08:00
Jesper Stemann Andersen	af2af818a6	Enables build for -linux-musl (#1627 ) Also contributes to being able to build for -w64-mingw32. Cf. https://github.com/JuliaPackaging/Yggdrasil/pull/9761	2024-11-27 13:14:24 -08:00
Awni Hannun	211411faf2	fix large ops (#1620 )	2024-11-24 09:17:10 -08:00
Alex Barron	6f7986d592	Cleaner `qmv`/`qvm` (#1616 )	2024-11-22 11:14:08 -08:00
Jagrit Digani	02bec0bb6d	Matrix Attention kernel (#1610 ) * Rough INIT * [WIP]: Loading and Matmuls added * [WIP]: Reductions and min working aligned kernel at headdim = 64 * [WIP] Added headdim 80 for testing * [WIP] Update dispatch params for testing * [WIP] Add support for unaligned seq lengths - still looks messy * Update sdpa_benchmarks * Update sdpa_benchmarks * Update sdpa_benchmarks * Enable gqa support * Update benchmark and switch off 128 headdim * Update headdim 128 tuning * Remove older fast attention code. Write out O strided * Disable hd=128 until further optimizations * Enable bf16 * Fix data size bug * Enable attn build outside of jit	2024-11-22 10:34:05 -08:00
Alex Barron	c79f6a4a8c	3 and 6 bit quantization (#1613 ) * Support 3 and 6 bit quantization	2024-11-22 10:22:13 -08:00

452 Commits