zhangyiss/mlx - mlx - Gitea for Geophysics

mirror of https://github.com/ml-explore/mlx.git synced 2025-12-16 01:49:05 +08:00

Author	SHA1	Message	Date
Cheng	c9f4dc851f	Merge build-cuda and build-linux actions (#2783 )	2025-11-25 20:06:42 +09:00
Cheng	6f35017d1b	[CUDA] cuDNN backward attention (#2762 )	2025-11-19 08:13:50 +09:00
Cheng	940f4c7818	Fix building with CUDA < 12.8 (#2782 )	2025-11-18 12:55:19 +09:00
Awni Hannun	1bf605d56d	use arch specific targets when possible (#2771 )	2025-11-14 20:04:18 -08:00
Cheng	3b2ffcefc3	[CUDA] cuDNN forward attention (#2743 ) * Separate sdpa kernels in another file * Initial support for cuDNN SDPA * Diable a few corner cases * Remove scaled_dot_product_attention.h * Use cuDNN attention for prefilling * cuDNN SDPA requires Ampere and later * Address reviews * Do contiguous copy of inputs	2025-11-14 09:23:56 +09:00
Awni Hannun	df58b4133a	[CUDA] Reduce use of managed memory (#2725 ) * Use async cuda malloc managed with cuda 13 * add pool threshold * refactor for regular cuda malloc * load eval gpu for cuda * remove use of cuda pool, use cuda free async * fix * fix * fix * fix * fix + comment	2025-11-05 16:05:23 -08:00
Awni Hannun	ec72b44417	Add quantize/dequantize for mxfp8 and nvfp4 (#2688 ) * Add quantize/dequantize slow path for mxfp8 and nvfp4 * fast cuda kernel for mx/nv quantization * fallback for cuda < 12.8 (#2697) * format (#2700) * fix (#2701) * metal kernels * docs * fix jit * add default bits and group sizes * improve quant docs * fix output type of mxfp4 matmuls	2025-10-28 16:23:12 -07:00
Awni Hannun	969924cc69	Fp8 conversion (#2686 ) * add fp8 e4m3 converters * add cuda * default saturate to min/max * fix for older OS * fix no gpu/cpu * fix saturate * fix compile	2025-10-27 16:35:50 -07:00
Awni Hannun	4bce5f9b2d	suppress gcc 10.1 warnings (#2679 ) * suppress gcc 10.1 warnings * suppress gcc 10.1 warnings	2025-10-17 12:09:21 -07:00
Anastasiia Filippova	9392fc3f88	NCCL backend (#2476 )	2025-08-21 11:56:15 -07:00
Angelos Katharopoulos	e397177f6e	Custom cuda kernel (#2517 )	2025-08-20 17:20:22 -07:00
Cheng	ac85ddfdb7	[CUDA] Add GEMM-based fallback convolution kernels (#2511 ) * Add gemm_conv * Add gemm_grouped_conv	2025-08-20 10:06:22 +09:00
Cheng	65d0d40232	Split cuDNN helpers into a separate header (#2491 ) * Add RAII managed CudaGraph class * Implement forward rms_norm with cuDNN * Revert back to old rms norm kernel	2025-08-20 09:29:28 +09:00
Cheng	c422050ca7	Update cuDNN Frontend to v1.14 (#2505 )	2025-08-17 19:13:01 +09:00
Awni Hannun	6441c21a94	Faster general unary op (#2472 ) * faster general unary op * faster general ops + reorg * fix + comment * binary two * copy general	2025-08-15 15:04:12 -07:00
Cheng	dfb5022eab	Rename cu::Matmul to CublasGemm (#2488 )	2025-08-13 09:37:40 +09:00
Jagrit Digani	a9bdd67baa	Add CUDA sdpa vector (#2468 )	2025-08-06 21:40:26 -07:00
Angelos Katharopoulos	3bf81ed1bd	[CUDA] Quantized refactoring (#2442 )	2025-07-30 08:27:20 -07:00
Cheng	a0ae49d397	Move arange to its own file (#2438 )	2025-07-30 13:05:51 +09:00
Awni Hannun	641be9463b	Add more CUDA architectures for PyPi package (#2427 ) * add cuda sm 90 * add more archs	2025-07-28 12:35:15 -07:00
Awni Hannun	9acec364c2	[CUDA] Always use batched matmul (#2404 ) * cuda batched mm * addmm as well * comment	2025-07-24 20:46:02 -07:00
Cheng	6f5874a2f2	[CUDA] Initial implementation of Convolution with cuDNN (#2385 ) * Link with cuDNN * Initial implementation * Remove backend apis * Fix recording cudnn conv * More unused backend apis * Fix C++ conv tests * include cudnn as python dep * Install libcudnn9-dev-cuda-12 in CI * cudnn only accepts contiguous inputs * Switch to backend apis * Plan needs to be kept alive * Turn off tf32 * Add cache * Test the native cuda graph api * Set cudnn stream before execution * Make LRUCache more like a normal container * Do error check for cublas handle * Zero-initilizing array * Use tf32 for conv * Skip TestConv.test_torch_conv_2D test --------- Co-authored-by: Awni Hannun <awni@apple.com>	2025-07-25 08:12:10 +09:00
Cheng	0f5ce173da	[CUDA] --compress-mode requires CUDA 12.8 (#2407 )	2025-07-23 06:11:11 -07:00
Awni Hannun	d107d8d495	add cuda gemv (#2400 )	2025-07-22 08:24:13 -07:00
Awni Hannun	74eccbf3fa	use size option in binary (#2399 )	2025-07-22 07:00:53 -07:00
Awni Hannun	e7d2ebadd2	[CUDA] Affine quantize (#2354 ) * affine quantize and dequantize kernels * format * fix * format	2025-07-14 15:45:44 -07:00
Cheng	6325f60d52	[CUDA] Bundle CCCL for JIT compilation (#2357 ) * Ship CCCL for JIT compilation * Remove cexpf	2025-07-11 18:45:37 -07:00
Cheng	8347575ba1	[CUDA] Implement Scan kernel (#2347 ) * Contiguous scan * Strided scan * Enable tests * Fix failing logaddexp test * Use cexpf in Metal	2025-07-10 16:54:12 -07:00
Angelos Katharopoulos	772f471ff2	[CUDA] Fix reductions (#2314 )	2025-06-27 12:59:20 -07:00
Awni Hannun	b8022c578a	divmod, partition, sort fixes (#2302 )	2025-06-16 18:49:32 -07:00
Angelos Katharopoulos	580776559b	RoPE for CUDA (#2293 ) * First working CUDA rope * Fix random	2025-06-15 06:08:07 -07:00
Cheng	c8b4787e4e	CUDA backend: indexing ops (#2277 )	2025-06-12 21:44:19 -07:00
Awni Hannun	2188199ff8	[CUDA] ternary with select op (#2283 ) * cuda ternary with select op * comment + fix * fix	2025-06-12 20:24:43 -07:00
Awni Hannun	aa07429bad	Fix cuda build (#2284 )	2025-06-12 17:48:05 -07:00
Awni Hannun	918761a25a	[CUDA] RMSNorm and VJP (#2280 ) * rms norm start * nit	2025-06-12 17:09:49 -07:00
Cheng	a4fc671d3e	CUDA backend: compile (#2276 ) * CUDA backend: compile * Rename kernels/ to device/	2025-06-12 17:08:39 -07:00
Cheng	c2dd81a8aa	Fix warnings from latest CUDA toolkit (#2275 )	2025-06-12 06:03:01 -07:00
Cheng	d7e680ffe4	CUDA backend: layernorm (#2271 )	2025-06-11 15:48:32 -07:00
Cheng	c371baf53a	CUDA backend: softmax (#2272 )	2025-06-11 13:55:22 -07:00
Cheng	ccf78f566c	CUDA backend: argreduce (#2270 )	2025-06-11 13:26:17 -07:00
Cheng	c9fa68664a	CUDA backend: reduce (#2269 )	2025-06-11 11:22:25 -07:00
Awni Hannun	c35f4d089a	start cuda circle config (#2256 ) * rebase * fix metal kernel linking issue on cuda * start cuda circle config	2025-06-10 21:19:47 -07:00
Cheng	99c33d011d	rebase + nit (#2260 ) Co-authored-by: Awni Hannun <awni@apple.com>	2025-06-10 10:51:51 -07:00
Cheng	7c4eb5d03e	CUDA backend: random (#2261 )	2025-06-10 08:59:56 -07:00
Cheng	bae9a6b404	CUDA backend: sort (#2262 ) Co-authored-by: Awni Hannun <awni@apple.com>	2025-06-10 08:59:47 -07:00
Cheng	7ebb2e0193	CUDA backend: binary ops (#2259 )	2025-06-10 06:37:40 -07:00
Cheng	f8bad60609	CUDA backend: unary ops (#2158 )	2025-06-09 06:45:08 -07:00
Cheng	24f89173d1	CUDA backend: matmul (#2241 )	2025-06-06 12:24:04 -07:00
Cheng	52dc8c8cd5	Add profiler annotations in common primitives for CUDA backend (#2244 )	2025-06-04 19:55:12 -07:00
Cheng	85a8beb5e4	Avoid atomic updates across CPU/GPU in CUDA event (#2231 )	2025-06-03 16:49:06 -07:00

55 Commits