zhangyiss/mlx - mlx - Gitea for Geophysics

mirror of https://github.com/ml-explore/mlx.git synced 2025-10-22 19:28:14 +08:00

Author	SHA1	Message	Date
Awni Hannun	ea9090bbc4	Add view op (#1179 ) * add view primitive * nit * fix view	2024-06-04 08:05:27 -07:00
Alex Barron	4d485fca24	Add defines include (#1176 ) Co-authored-by: Alex Barron <abarron22@apple.com>	2024-06-03 09:50:10 -07:00
Brian Keene	1865299a30	Metal shaders for memory efficient self attention on large sequences (#964 ) * Metal shaders for efficient self attention on large sequences Updated fast attention: GEMM-ified with Steel primitives Uses flash attention 1 for scale correction * more compiler silencing * Address rebase issues * Templatize kernel instantiation, revise cpu bindings * Safer writes to output * Permit batch size > 1 * Numerical fixes for sdpa self attention * Re-enable test, remove unused variable * add benchmarking script * Disable sdpa prior to perf tuning, and simplify tests for per-patch CI	2024-06-03 09:16:19 -07:00
Jagrit Digani	76b6cece46	Fix multi-block sort stride management (#1169 ) * Fix multi-block sort stride management * Add seed to tests	2024-05-31 11:10:54 -07:00
Jagrit Digani	9f0df51f8d	Fix matvec vector stride bug (#1168 )	2024-05-29 12:18:28 -07:00
Awni Hannun	e7a2a3dcd1	Fix a couple bugs (#1161 ) * fix jit reduce for RMS norm * make strides a single buffer * better eval error message * fix compiling with inf and bf16 * fix cpu compile with bf16	2024-05-28 15:18:18 -07:00
Awni Hannun	7e26fd8032	Option to JIT steel gemm / conv (#1139 )	2024-05-23 18:07:34 -07:00
Jagrit Digani	eab2685c67	Float mask update (#1152 ) * Float mask update * Update CPU impl	2024-05-23 17:20:44 -07:00
Awni Hannun	0189ab6ab6	More jitting (#1132 ) * docs + circle min size build * jit scan, arange, softmax * add sort * jit reductions * remove print * fix deps * clean includes / nits	2024-05-23 16:23:44 -07:00
Rifur13	9401507336	Add groups to 2-D convolutions (#1129 ) * Added groups to 2-D convolutions. Only implemented for some specializations. Also fixed 1D grouped convs with different kernel strides and added more tests. * fix channels condition	2024-05-22 20:01:44 -07:00
Awni Hannun	e110ca11e2	Fix offset bug for device buffers (#1151 ) * fix bug with large offsets for buffers * add a test * remove test as its too big for small machine	2024-05-22 15:50:05 -07:00
Awni Hannun	226748b3e7	JIT compile option for binary minimization (#1091 ) * try cpp 20 for compile * unary, binary, ternary in jit * nits * fix gather/scatter * fix rebase * reorg compile * add ternary to compile * jit copy * jit compile flag * fix build * use linked function for ternary * some nits * docs + circle min size build * docs + circle min size build * fix extension * fix no cpu build * improve includes	2024-05-22 12:57:13 -07:00
Awni Hannun	d568c7ee36	Rename block sparse (#1149 ) * block_sparse_mm to gather_mm * rename * nit * nit	2024-05-22 07:48:34 -07:00
Angelos Katharopoulos	da83f899bb	Improve qvm speed (#1140 )	2024-05-20 09:20:44 -07:00
Awni Hannun	23406c9e9e	Choose the right MLX bf16 for extensions (#1135 ) * default to custom bf * choose right bf * fix extensions * fix circle conf	2024-05-17 15:09:28 -07:00
Luca Arnaboldi	b3ec792380	Implemented Cholesky on CPU (#1119 )	2024-05-17 12:31:59 -07:00
Angelos Katharopoulos	e78a6518fa	Block sparse qmm (#1124 )	2024-05-16 15:24:14 -07:00
Awni Hannun	1873ffda01	Detect metal version and propagate correctly for JIT (#1109 ) * detect metal version and propagate correctly for JIT * remove softmax * fix versions	2024-05-15 17:42:09 -07:00
Jagrit Digani	358e1fd6ab	Fused GEMM (#1123 ) * Basic gemm working * Update addmm * Clear out steel_gemm and steel_addmm kernels * Fuse and clear out gather gemm * Update objc releases	2024-05-15 10:30:41 -07:00
Awni Hannun	863039da4c	Allow scatter type exception to be caught by checking in op (#1077 ) * allow exception to be caught in main thread * only for gpu * more detailed scatter error	2024-05-13 17:43:53 -07:00
Max-Heinrich Laves	ff4223904d	Conv3d (#993 ) * added conv3d added conv3d implemented explicit_gemm_conv_ND_cpu and bounds checks for slow_conv_3D * incorporated reviewer comments * fixed test * reduced tensor shapes in test for conv3d * Reviewer suggestion Co-authored-by: Awni Hannun <awni.hannun@gmail.com> Reviewer suggestion Co-authored-by: Awni Hannun <awni.hannun@gmail.com> Reviewer suggestion Co-authored-by: Awni Hannun <awni.hannun@gmail.com> Reviewer suggestion	2024-05-11 06:15:02 -07:00
Alex Barron	2e158cf6d0	Add conjugate operator (#1100 ) * cpu and gpu impl * add mx.conj and array.conj() --------- Co-authored-by: Alex Barron <abarron22@apple.com>	2024-05-10 07:22:20 -07:00
Awni Hannun	06375e6605	Split encoders in non-concurrent context with a max ops per encoder (#1085 ) * split encoders * fix race	2024-05-09 16:21:02 -07:00
Rahul Yedida	cc05a281c4	Added ArcTan2 operation (#1079 ) * Added ArcTan2 operation * Cleanup, bug fixes from code review * Minor cleanup, fixed Linux tests	2024-05-08 08:35:15 -07:00
Jagrit Digani	fe96ceee66	Update block offset adjustment to be in size_t (#1087 )	2024-05-08 08:10:23 -07:00
Awni Hannun	21623156a3	Reset peak memory (#1074 ) * reset peak memory * fix linux * nits in docs	2024-05-03 17:12:51 -07:00
Awni Hannun	b00ac960b4	change initial memory limits and add memory size to device info (#1064 )	2024-05-03 06:50:15 -07:00
Jagrit Digani	f390957685	Block sparse mm (#1058 )	2024-05-02 14:03:58 -07:00
Angelos Katharopoulos	17f57df797	Improvements in the quantizer and dequantization kernel (#1061 )	2024-05-01 18:19:11 -07:00
Awni Hannun	19bef39f5c	Add a `mx.metal.device_info` (#1060 ) * device inof * add variant * fix linux * fix doc	2024-04-30 15:47:27 -07:00
Nripesh Niketan	a30e7ed2da	feat: metal formatting and pre-commit bump (#1038 ) * feat: metal formatting and pre-commit bump * add guards * update * more guards * more guards * smakk fix * Refactor instantiation of ternary types in ternary.metal * fix scan.metal	2024-04-30 07:18:09 -07:00
Awni Hannun	09f1777896	fix slice update indexing (#1053 )	2024-04-29 12:17:40 -07:00
Rifur13	c4a471c99d	Add groups to Conv1d (#948 ) * Add conv1d grouped convs on CPU * Add GPU support * Parallelize inside metal kernel * clenaup * Update mlx/ops.cpp Co-authored-by: Awni Hannun <awni.hannun@gmail.com> * New unfold kernel + remove unused code * Remove copy and refactor * Update vjp and reuse steel gemm * Fixed groups on cpu * Fix metal validation --------- Co-authored-by: Awni Hannun <awni.hannun@gmail.com>	2024-04-27 06:24:57 -07:00
Awni Hannun	86f495985b	Add bitwise ops (#1037 ) * bitwise ops * fix tests	2024-04-26 22:03:42 -07:00
Awni Hannun	5bfe89bdb1	Cpp docs (#1036 ) * start of C++ docs * fix stream doc * only include ops for now	2024-04-26 12:56:05 -07:00
Awni Hannun	771575d27b	Expose function to clear memory cache (#1032 ) * expose function to clear memory cache * fix linux build * fix metal tests	2024-04-24 16:48:51 -07:00
Angelos Katharopoulos	20a01bbd9f	Simplifying and improving qmm (#1030 )	2024-04-24 13:07:45 -07:00
Awni Hannun	3d405fb3b1	Add synchronize function (#1006 ) * add synchronize function * fix linux * fix linux * fix and fix docs * fix test * try synchronize in stream destroy * synchronize works for both cpu and gpu	2024-04-22 08:25:46 -07:00
Jagrit Digani	85c8a91a27	Fix mask broadcasting bug and add relevant test (#1003 )	2024-04-17 17:33:48 -07:00
Awni Hannun	8a0677d56d	Shared events for synchronization + async eval (#998 ) * more async eval * fix rebase * try correct async eval * fix async * more tests for async eval * use shared events for synchronization * comment + cleanup * with autorelease pool * fix no metal build * fix compile * fix patch * don't eval if asyn evale'd * don't use is_evaled * comments * more multi stream tests * try and cleanup use of is_evaled * use a status flag	2024-04-17 06:16:02 -07:00
Jagrit Digani	b18468bf81	Masked mm (#978 ) * Add block masked matmul op and primitive	2024-04-16 14:45:39 -07:00
Alex Barron	2e7c02d5cd	Metal FFT for powers of 2 up to 2048 (#915 ) * add Metal FFT for powers of 2 * skip GPU test on linux * fix contiguity bug * address comments * Update mlx/backend/metal/fft.cpp * Update mlx/backend/metal/fft.cpp * fix bug in synch --------- Co-authored-by: Alex Barron <abarron22@apple.com> Co-authored-by: Awni Hannun <awni.hannun@gmail.com> Co-authored-by: Awni Hannun <awni@apple.com>	2024-04-11 21:40:06 -07:00
Awni Hannun	ae18326533	No copy command encoder (#986 ) * no copy command encoder * up layer norm test tolerances	2024-04-11 21:15:36 -07:00
Nripesh Niketan	ffff671273	Update pre-commit hooks (#984 )	2024-04-11 07:27:53 -07:00
Awni Hannun	12d4507ee3	Explicit barriers with concurrent dispatch (#977 )	2024-04-10 21:45:31 -07:00
Awni Hannun	42afe27e12	std and expm1 (#973 ) * std and expm1 * actually add expm1 * fix linux * fix vjp * relax tol for linux test * Add it to the compilable primitives --------- Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>	2024-04-08 14:26:01 -07:00
Awni Hannun	76e63212ff	Enable bfloat scan (#974 ) * enable bfloat scan * fix tests	2024-04-08 12:29:19 -07:00
Awni Hannun	aac2f9fb61	Improve profiling with gpu tracing (#969 ) * improve profiling with gpu tracing * fix for linux * nit * doc fix * fix example	2024-04-07 21:47:43 -07:00
Awni Hannun	d88d2124b5	segfaut layer norm grad (#955 )	2024-04-04 10:59:15 -07:00
Awni Hannun	e142aaf8a1	Option for precise softmax (#953 ) * precise softmax * Add an equivalency check * Make the threadgroup memory definition fixed * precise cpu softmax * precise option on cpu * remove print --------- Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>	2024-04-04 08:32:35 -07:00

1 2 3 4 5 ...

258 Commits