zhangyiss/mlx - mlx - Gitea for Geophysics

mirror of https://github.com/ml-explore/mlx.git synced 2025-12-16 01:49:05 +08:00

Author	SHA1	Message	Date
Cheng	48e796bb91	Do error check for cublas handle	2025-07-24 00:32:25 +00:00
Cheng	4c0dc7745f	Make LRUCache more like a normal container	2025-07-24 00:32:25 +00:00
Cheng	3d16cb5071	Set cudnn stream before execution	2025-07-24 00:32:25 +00:00
Cheng	67a5f7b2a8	Test the native cuda graph api	2025-07-24 00:32:25 +00:00
Cheng	85510dae78	Add cache	2025-07-24 00:32:25 +00:00
Cheng	0430a6a74a	Turn off tf32	2025-07-24 00:32:25 +00:00
Cheng	6444b29651	Plan needs to be kept alive	2025-07-24 00:32:25 +00:00
Cheng	c6076fc77b	Switch to backend apis	2025-07-24 00:32:25 +00:00
Cheng	bb6a75bc4a	cudnn only accepts contiguous inputs	2025-07-24 00:32:25 +00:00
Cheng	fecc67509d	Install libcudnn9-dev-cuda-12 in CI	2025-07-24 00:32:24 +00:00
Awni Hannun	75bcb46069	include cudnn as python dep	2025-07-24 00:31:23 +00:00
Cheng	180ec0d3a5	Fix C++ conv tests	2025-07-24 00:30:38 +00:00
Cheng	cea3af6622	More unused backend apis	2025-07-24 00:30:38 +00:00
Cheng	ae9dbb1a9b	Fix recording cudnn conv	2025-07-24 00:30:38 +00:00
Cheng	6571df6ad7	Remove backend apis	2025-07-24 00:30:38 +00:00
Cheng	ad44c4bcd9	Initial implementation	2025-07-24 00:30:38 +00:00
Cheng	04bd515370	Link with cuDNN	2025-07-24 00:30:38 +00:00
Awni Hannun	d1f4d291e8	Fix uv install and add dev release (#2411) * fix uv install and add dev release * fix docstring * pin cuda deps * cuda release on cpu-only machine	2025-07-23 16:54:19 -07:00
Awni Hannun	e1840853ce	full row mask in sdpa consistently gives nan (#2406)	2025-07-23 16:37:03 -07:00
Cheng	0f5ce173da	[CUDA] --compress-mode requires CUDA 12.8 (#2407)	2025-07-23 06:11:11 -07:00
Cheng	588854195f	Remove unused code in Convolution::vjp (#2408)	2025-07-23 06:11:00 -07:00
Fangjun Kuang	28d068bce6	Fix an error in the comment for mx.dequantize (#2409)	2025-07-23 06:10:50 -07:00
Awni Hannun	d107d8d495	add cuda gemv (#2400)	2025-07-22 08:24:13 -07:00
Awni Hannun	1e496ddb82	[CUDA] Simplify allocator (#2392) * simplify allocator and fixe race with small pool * Don't use shared event in worker * use cuda buffer in small pool * comment * comment	2025-07-22 08:24:01 -07:00
Awni Hannun	74eccbf3fa	use size option in binary (#2399)	2025-07-22 07:00:53 -07:00
Awni Hannun	08638223ca	Fix including stubs in wheel (#2398) * fix including stubs in wheel * fix bool_	2025-07-22 06:30:17 -07:00
Cheng	56cc858af9	Add contiguous_copy_cpu util for copying array (#2397)	2025-07-21 07:30:35 -07:00
Cheng	f55c4ed1d6	Remove thrust iterators (#2396)	2025-07-21 07:30:27 -07:00
Awni Hannun	93d70419e7	[CUDA] speedup handling scalars (#2389) * speedup scalars in cuda * comment	2025-07-18 21:47:31 -07:00
Awni Hannun	63f663d9c6	fix cuda manylinux version to match others (#2388)	2025-07-18 21:02:16 -07:00
Awni Hannun	84b4d96efa	fix release build + patch bump (#2387) v0.26.5	2025-07-18 14:47:37 -07:00
Awni Hannun	aec67f2fa6	patch bump (#2386)	2025-07-18 12:25:48 -07:00
Gökdeniz Gülmez	deee214a95	Adding support for the Muon Optimizer (#1914) * initial commit with workong optmimizer * update ACKNOWLEDGMENTS.md * nits and adding it to test * nits * G.astype(mx.bfloat16) to G.astype(G.dtype) * G.ndim >= 2 to assert G.ndim == 2 * remove coments * replace with mx.addmm * remove comments * format * nits * match muon * fix addmm --------- Co-authored-by: Awni Hannun <awni@apple.com>	2025-07-18 12:25:28 -07:00
Cheng	45adec102c	Add contiguous_copy_gpu util for copying array (#2379)	2025-07-18 06:44:25 -07:00
Cheng	31fc530c76	[CUDA] Add more ways finding CCCL headers in JIT (#2382)	2025-07-17 15:25:34 -07:00
Awni Hannun	fbb3f65a1a	fix resource leaks in matmul and graph (#2383)	2025-07-17 06:50:15 -07:00
Angelos Katharopoulos	6b1b8ea91b	[CUDA] Add work per thread to compile (#2368)	2025-07-17 06:47:52 -07:00
Awni Hannun	b2273733ea	Test with CUDA 12.2 (#2375) * Test with CUDA 12.0 * try older image * fix cpu sort	2025-07-16 13:00:37 -07:00
Awni Hannun	f409b229a4	fix ring distributed test (#2380)	2025-07-16 11:25:24 -07:00
Cheng	30571e2326	Rename the copy util in cpu/copy.h to copy_cpu (#2378)	2025-07-16 07:34:24 -07:00
Awni Hannun	d7734edd9f	fix complex reduce + nan propagation in min and max (#2377)	2025-07-15 18:19:47 -07:00
Awni Hannun	2ba69bc8fa	lower memory uniform sampling (#2361) * lower memory uniform * use fp32 * fix	2025-07-15 14:22:07 -07:00
Cheng	cb349a291c	[CUDA] Use cuda::std::complex in place of cuComplex (#2372)	2025-07-15 00:36:13 -07:00
Awni Hannun	f0a0b077a0	Install linux with mlx[cuda] and mlx[cpu] (#2356) * install linux with mlx[cuda] and mlx[cpu] * temp for testing * cleanup circle, fix cuda repair * update circle * update circle * decouple python bindings from core libraries	2025-07-14 17:17:33 -07:00
Awni Hannun	49114f28ab	fix flaky test (#2371)	2025-07-14 17:16:18 -07:00
Awni Hannun	e7d2ebadd2	[CUDA] Affine quantize (#2354) * affine quantize and dequantize kernels * format * fix * format	2025-07-14 15:45:44 -07:00
Awni Hannun	e569803d7c	update linux build (#2370)	2025-07-14 15:13:56 -07:00
Cheng	d34f887abc	Add Primitive::name and remove Primitive::print (#2365)	2025-07-14 14:06:35 -07:00
Angelos Katharopoulos	5201df5030	Fix imag() vjp (#2367)	2025-07-14 13:11:16 -07:00
Cheng	2d3c26c565	[CUDA] Do not put kernels in annoymous namespace (#2362)	2025-07-12 14:24:45 -07:00

1297 Commits