zhangyiss/mlx - mlx - Gitea for Geophysics

mirror of https://github.com/ml-explore/mlx.git synced 2025-11-01 00:28:11 +08:00

Author	SHA1	Message	Date
Awni Hannun	3adba92ebe	Cuda faster softmax (#2435 ) * faster softmax and logsumexp * faster softmax and logsumexp * format	2025-07-29 17:18:12 -07:00
Awni Hannun	ef631d63af	faster rms norm (#2433 )	2025-07-29 13:12:00 -07:00
Cheng	970dbe8e25	Use ccache in CI (#2414 ) * Detect ccache * Use ccache in CI * Separate cache for different images * Test both 12.2 and 12.9 for PRs	2025-07-29 08:43:22 +09:00
Awni Hannun	641be9463b	Add more CUDA architectures for PyPi package (#2427 ) * add cuda sm 90 * add more archs	2025-07-28 12:35:15 -07:00
Awni Hannun	ab0e608862	[CUDA] More sizes for gemv (#2429 ) * route more to gemv * route more sizes to custom gemv	2025-07-28 12:35:01 -07:00
Awni Hannun	1588659062	no occupancy query for launch params (#2426 )	2025-07-28 09:09:41 -07:00
Awni Hannun	b9e88fb976	[CUDA] Fix segfault on exit (#2424 ) * fix cuda segfault on exit * comment	2025-07-27 08:08:13 -07:00
Awni Hannun	4ad53414dd	fix cuda pypi package (#2423 ) * fix cuda pypi package * patch bump v0.27.1	2025-07-25 15:20:29 -07:00
Awni Hannun	d1165b215e	version (#2420 )	2025-07-25 13:29:28 -07:00
Awni Hannun	dcb8319f3d	update install docs and requirements (#2419 )	2025-07-25 12:13:19 -07:00
Awni Hannun	5597fa089c	Fix qvm splitk (#2415 )	2025-07-25 11:50:24 -07:00
Awni Hannun	9acec364c2	[CUDA] Always use batched matmul (#2404 ) * cuda batched mm * addmm as well * comment	2025-07-24 20:46:02 -07:00
Skonor	7d9d6ef456	docs: fix adam and adamw eps placement (#2416 ) Co-authored-by: Mikhail Gorbunov <m_gorbunov@apple.com>	2025-07-24 16:40:45 -07:00
Cheng	6f5874a2f2	[CUDA] Initial implementation of Convolution with cuDNN (#2385 ) * Link with cuDNN * Initial implementation * Remove backend apis * Fix recording cudnn conv * More unused backend apis * Fix C++ conv tests * include cudnn as python dep * Install libcudnn9-dev-cuda-12 in CI * cudnn only accepts contiguous inputs * Switch to backend apis * Plan needs to be kept alive * Turn off tf32 * Add cache * Test the native cuda graph api * Set cudnn stream before execution * Make LRUCache more like a normal container * Do error check for cublas handle * Zero-initilizing array * Use tf32 for conv * Skip TestConv.test_torch_conv_2D test --------- Co-authored-by: Awni Hannun <awni@apple.com>	2025-07-25 08:12:10 +09:00
Awni Hannun	70dc336785	Test on cuda 12.2 and 12.9 (#2413 )	2025-07-24 06:06:15 -07:00
Awni Hannun	4e504039f5	[Metal] Release metal events (#2412 ) * release metal events * fix * fix	2025-07-23 19:53:42 -07:00
Awni Hannun	d1f4d291e8	Fix uv install and add dev release (#2411 ) * fix uv install and add dev release * fix docstring * pin cuda deps * cuda release on cpu-only machine	2025-07-23 16:54:19 -07:00
Awni Hannun	e1840853ce	full row mask in sdpa consistently gives nan (#2406 )	2025-07-23 16:37:03 -07:00
Cheng	0f5ce173da	[CUDA] --compress-mode requires CUDA 12.8 (#2407 )	2025-07-23 06:11:11 -07:00
Cheng	588854195f	Remove unused code in Convolution::vjp (#2408 )	2025-07-23 06:11:00 -07:00
Fangjun Kuang	28d068bce6	Fix an error in the comment for mx.dequantize (#2409 )	2025-07-23 06:10:50 -07:00
Awni Hannun	d107d8d495	add cuda gemv (#2400 )	2025-07-22 08:24:13 -07:00
Awni Hannun	1e496ddb82	[CUDA] Simplify allocator (#2392 ) * simplify allocator and fixe race with small pool * Don't use shared event in worker * use cuda buffer in small pool * comment * comment	2025-07-22 08:24:01 -07:00
Awni Hannun	74eccbf3fa	use size option in binary (#2399 )	2025-07-22 07:00:53 -07:00
Awni Hannun	08638223ca	Fix including stubs in wheel (#2398 ) * fix including stubs in wheel * fix bool_	2025-07-22 06:30:17 -07:00
Cheng	56cc858af9	Add contiguous_copy_cpu util for copying array (#2397 )	2025-07-21 07:30:35 -07:00
Cheng	f55c4ed1d6	Remove thrust iterators (#2396 )	2025-07-21 07:30:27 -07:00
Awni Hannun	93d70419e7	[CUDA] speedup handling scalars (#2389 ) * speedup scalars in cuda * comment	2025-07-18 21:47:31 -07:00
Awni Hannun	63f663d9c6	fix cuda manylinux version to match others (#2388 )	2025-07-18 21:02:16 -07:00
Awni Hannun	84b4d96efa	fix release build + patch bump (#2387 ) v0.26.5	2025-07-18 14:47:37 -07:00
Awni Hannun	aec67f2fa6	patch bump (#2386 )	2025-07-18 12:25:48 -07:00
Gökdeniz Gülmez	deee214a95	Adding support for the Muon Optimizer (#1914 ) * initial commit with workong optmimizer * update ACKNOWLEDGMENTS.md * nits and adding it to test * nits * G.astype(mx.bfloat16) to G.astype(G.dtype) * G.ndim >= 2 to assert G.ndim == 2 * remove coments * replace with mx.addmm * remove comments * format * nits * match muon * fix addmm --------- Co-authored-by: Awni Hannun <awni@apple.com>	2025-07-18 12:25:28 -07:00
Cheng	45adec102c	Add contiguous_copy_gpu util for copying array (#2379 )	2025-07-18 06:44:25 -07:00
Cheng	31fc530c76	[CUDA] Add more ways finding CCCL headers in JIT (#2382 )	2025-07-17 15:25:34 -07:00
Awni Hannun	fbb3f65a1a	fix resource leaks in matmul and graph (#2383 )	2025-07-17 06:50:15 -07:00
Angelos Katharopoulos	6b1b8ea91b	[CUDA] Add work per thread to compile (#2368 )	2025-07-17 06:47:52 -07:00
Awni Hannun	b2273733ea	Test with CUDA 12.2 (#2375 ) * Test with CUDA 12.0 * try older image * fix cpu sort	2025-07-16 13:00:37 -07:00
Awni Hannun	f409b229a4	fix ring distributed test (#2380 )	2025-07-16 11:25:24 -07:00
Cheng	30571e2326	Rename the copy util in cpu/copy.h to copy_cpu (#2378 )	2025-07-16 07:34:24 -07:00
Awni Hannun	d7734edd9f	fix complex reduce + nan propagation in min and max (#2377 )	2025-07-15 18:19:47 -07:00
Awni Hannun	2ba69bc8fa	lower memory uniform sampling (#2361 ) * lower memory uniform * use fp32 * fix	2025-07-15 14:22:07 -07:00
Cheng	cb349a291c	[CUDA] Use cuda::std::complex in place of cuComplex (#2372 )	2025-07-15 00:36:13 -07:00
Awni Hannun	f0a0b077a0	Install linux with mlx[cuda] and mlx[cpu] (#2356 ) * install linux with mlx[cuda] and mlx[cpu] * temp for testing * cleanup circle, fix cuda repair * update circle * update circle * decouple python bindings from core libraries	2025-07-14 17:17:33 -07:00
Awni Hannun	49114f28ab	fix flaky test (#2371 )	2025-07-14 17:16:18 -07:00
Awni Hannun	e7d2ebadd2	[CUDA] Affine quantize (#2354 ) * affine quantize and dequantize kernels * format * fix * format	2025-07-14 15:45:44 -07:00
Awni Hannun	e569803d7c	update linux build (#2370 )	2025-07-14 15:13:56 -07:00
Cheng	d34f887abc	Add Primitive::name and remove Primitive::print (#2365 )	2025-07-14 14:06:35 -07:00
Angelos Katharopoulos	5201df5030	Fix imag() vjp (#2367 )	2025-07-14 13:11:16 -07:00
Cheng	2d3c26c565	[CUDA] Do not put kernels in annoymous namespace (#2362 )	2025-07-12 14:24:45 -07:00
Cheng	6325f60d52	[CUDA] Bundle CCCL for JIT compilation (#2357 ) * Ship CCCL for JIT compilation * Remove cexpf	2025-07-11 18:45:37 -07:00

1446 Commits