zhangyiss/mlx - mlx - Gitea for Geophysics

mirror of https://github.com/ml-explore/mlx.git synced 2025-12-16 01:49:05 +08:00

Author	SHA1	Message	Date
Awni Hannun	4bce5f9b2d	suppress gcc 10.1 warnings (#2679) * suppress gcc 10.1 warnings * suppress gcc 10.1 warnings	2025-10-17 12:09:21 -07:00
Anastasiia Filippova	e9eab527eb	Nccl timeout (#2673) * print the error & delete nccl group * timeout for nccl binding * typo * revert error * fixed a typo	2025-10-14 12:29:54 -07:00
Awni Hannun	36ca62dba8	remove unused unary file (#2672)	2025-10-13 19:36:26 -07:00
Manuel Villanueva	9cbb1b0148	Modified sort behavior when running CPU or Metal to match NumPy/JAX (#2667) * Modified sort behavior when running CPU or Metal to match NumPy/JAX sorting behavior. * Modified sort behavior when running CPU or Metal to match NumPy/JAX * nits --------- Co-authored-by: Awni Hannun <awni@apple.com>	2025-10-13 14:36:45 -07:00
Awni Hannun	25e2356316	speed up scalars (#2669)	2025-10-13 12:10:15 -07:00
Awni Hannun	226a1d24e0	Debug cuda conv (#2662) * use t4 * use t4	2025-10-10 16:12:47 -07:00
Awni Hannun	630350ad3e	Precise sigmoid (#2659) * bump patch * Sigmoid matches PyTorch and is more precise on tails	2025-10-10 10:05:23 -07:00
Awni Hannun	380aeb58ae	enable admm low-precision cpu (#2661)	2025-10-10 09:50:54 -07:00
Awni Hannun	f37389d100	bump patch (#2658)	2025-10-10 08:36:41 -07:00
Awni Hannun	e89e8b4272	Export with callback (#2612) * export with callback * export with callback * Add types, fix kwarg ordering bug + test * cleanup, test, fix * typos	2025-10-08 19:24:33 -07:00
AN Long	85a8824a8c	Fix cumulative operations when axis=None (#2653)	2025-10-08 15:25:38 -07:00
Awni Hannun	f5d4397e5c	Fix fast synch when fence is waited before a command buffer is created (#2657)	2025-10-08 11:23:46 -07:00
Awni Hannun	343e33b6d5	fix all_gather vjp (#2654)	2025-10-07 06:05:23 -07:00
Angelos Katharopoulos	0073096dd1	Split name into directories for cuda jit (#2656)	2025-10-07 01:52:58 -07:00
Angelos Katharopoulos	e3d004fed9	Fix and refactor row-reduce (#2650)	2025-10-07 01:51:08 -07:00
Awni Hannun	a393435d28	Speed up compile for node with many parents (#2649)	2025-10-03 19:30:36 -07:00
Awni Hannun	a7a94b29d7	Fix compile when outputs change (#2648)	2025-10-03 08:40:57 -07:00
Daniel Yeh	22a5da76c8	Faster complex matmul (#2571)	2025-10-02 23:33:15 -07:00
Angelos Katharopoulos	c2c3e0b0a2	[CUDA] Add a small column specialization to reduce (#2642)	2025-10-02 14:41:05 -07:00
Awni Hannun	b0cc71ae71	Faster triu, tril, where with scalar (#2644)	2025-10-02 12:21:27 -07:00
Awni Hannun	bbf1423953	wait for tasks in cuda (#2636)	2025-09-30 16:08:46 -07:00
Angelos Katharopoulos	eb24267b56	Compile now can attach arbitrary data to an entry (#2634)	2025-09-30 13:33:27 -07:00
Awni Hannun	dc371ae7a5	fix for max block dim (#2631)	2025-09-29 08:59:25 -07:00
AN Long	e76a8dd5c5	Fix incorrect path and typos (#2630)	2025-09-28 06:03:04 -07:00
Cheng	b466dea982	[CUDA] Make CudaEvent work with multi-device (#2614) * Set current device when creating cuda event * Separate cuda events by device * Avoid race condition in pool	2025-09-27 11:27:17 +09:00
Angelos Katharopoulos	7a6adda1e6	Bump the version (#2627)	2025-09-26 15:15:28 -07:00
Angelos Katharopoulos	1a9f820af6	Compiled should not end in broadcast (#2622)	2025-09-26 13:36:09 -07:00
Jagrit Digani	7c7e48dbd1	New tuning for small K gemv (#2620) * New tuning for small K gemv	2025-09-23 12:28:35 -07:00
Daniel Yeh	bf01ad9367	fix (#2613) Co-authored-by: Chen-Chen Yeh <ge96noj@mytum.de>	2025-09-22 20:12:04 -07:00
Cheng	ae438d05fa	[CUDA] Recycle CUDA events (#2604) * Make CudaEvent a CudaHandle * Add caching for CudaEvent * Make sure cuda events are destroyed at last * Fix headers * SharedEvent => AtomicEvent * RawCudaEvent => CudaEventHandle, CudaEventWrapper => CopyableCudaEvent * Remove unneeded asserts	2025-09-23 10:42:03 +09:00
Awni Hannun	711a645807	avoid producing NaN in attention (#2608)	2025-09-22 13:10:43 -07:00
Josh Bleecher Snyder	aa9d44b3d4	implement Convolution::output_shape (#2601) - pull conv_out_shape out for re-use - add Conv::output_shape - add e2e python tests confirming shapeless=True support and correctness Updates #2599	2025-09-22 10:09:45 -07:00
Awni Hannun	ec2ab42888	Lower sorted QMM gather threshold (#2609)	2025-09-19 18:22:55 -07:00
Cheng	787c0d90cd	Detect cache thrashing in LRUCache (#2600) * Detect cache thrashing in LRUCache * Do not check cache thrashing in tests	2025-09-19 09:12:14 +09:00
Oleksandr Bilous	e8b604a6a3	fix: library loading for swift dynamic frameworks (#2568)	2025-09-18 13:54:59 -07:00
Awni Hannun	caecbe876a	no copy batch rope (#2595)	2025-09-15 14:23:48 -07:00
Awni Hannun	6ccfa603cd	fix metal scan (#2591)	2025-09-15 11:01:57 -07:00
Awni Hannun	ee18e1cbf0	patch bump (#2588)	2025-09-11 17:10:09 -07:00
Awni Hannun	af120c2bc0	set nccl ABI version (#2587)	2025-09-11 16:55:53 -07:00
Cheng	6a3acf2301	[CUDA] Set bias as input when using bias epilogue (#2584)	2025-09-11 15:31:09 +09:00
Awni Hannun	d6977f2a57	Add sdpa with sinks (#2558) * add sdpa with sinks * fix 2 pass * fix matrix sdpa * fix perf regression * add to cuda (#2580)	2025-09-10 14:53:00 -07:00
Cheng	44cc5da4bc	[CUDA] Fix alpha not respected when using bias epilogue (#2578)	2025-09-10 09:08:01 +09:00
Cheng	dde3682b69	[CUDA] Use GEMM with epilogue instead of AddMM (#2569)	2025-09-09 13:18:49 +09:00
Awni Hannun	17310d91a6	Add batch offsets for mx.fast.rope (#2564) * implement batch rope for Metal * cuda rope (#2576)	2025-09-08 17:35:07 -07:00
Cheng	a44b27f5f8	Fix a few ccache cache miss (#2573) * Fix ccache cache miss * Do not define _VERSION_ in python bindings	2025-09-09 07:41:05 +09:00
Awni Hannun	e5a33f2223	faster depthwise 1D conv (#2567)	2025-09-08 11:37:23 -07:00
Awni Hannun	b61a65e313	fix copies in sdpa (#2563)	2025-09-02 11:00:36 -07:00
Awni Hannun	8ce49cd39e	fix quantized vjp for mxfp4 (#2555)	2025-08-29 10:06:15 -07:00
Awni Hannun	9c68b50853	version bump (#2554)	2025-08-29 06:54:17 -07:00
Awni Hannun	111f1e71af	Faster contiguous gather for indices in the first axis (#2552) * faster contiguous gather for indices in the first axis * work per thread > 1 * angelos suggestion for scales / biases	2025-08-28 21:26:30 -07:00


933 Commits