zhangyiss/mlx - mlx - Gitea for Geophysics

mirror of https://github.com/ml-explore/mlx.git synced 2025-12-16 01:49:05 +08:00

Author	SHA1	Message	Date
russellizadi	512281781c	Remove state return from function example in compile documentation (#2518)	2025-08-20 00:45:05 -07:00
Cheng	ac85ddfdb7	[CUDA] Add GEMM-based fallback convolution kernels (#2511) * Add gemm_conv * Add gemm_grouped_conv	2025-08-20 10:06:22 +09:00
Cheng	65d0d40232	Split cuDNN helpers into a separate header (#2491) * Add RAII managed CudaGraph class * Implement forward rms_norm with cuDNN * Revert back to old rms norm kernel	2025-08-20 09:29:28 +09:00
Awni Hannun	cea9369610	fix lapack svd (#2515)	2025-08-18 15:07:59 -07:00
Awni Hannun	e7c6e1db82	no segfault with uninitialized array.at (#2514)	2025-08-18 08:33:38 -07:00
Awni Hannun	c5fcd5b61b	fix custom kernel test (#2510)	2025-08-18 06:45:59 -07:00
Angelos Katharopoulos	1df9887998	Ensure no oob read in gemv_masked (#2508)	2025-08-17 08:42:33 -07:00
Angelos Katharopoulos	73f22d6226	Ensure small sort doesn't use indices if not argsort (#2506)	2025-08-17 08:42:20 -07:00
Cheng	c422050ca7	Update cuDNN Frontend to v1.14 (#2505)	2025-08-17 19:13:01 +09:00
Cheng	1ba18ff7d9	[CUDA] Fix conv grads with groups (#2495) * Put reshape utils in one file * [CUDA] Fix conv grads with groups * Put the reshape utils in gpu/copy.h	2025-08-16 10:09:18 +09:00
Cheng	37b440faa8	Clean up code handling both std::vector and SmallVector (#2493)	2025-08-16 09:01:10 +09:00
Cheng	888b13ed63	Remove the hack around SmallVector in cpu compile (#2494)	2025-08-16 08:17:24 +09:00
Cheng	4abb218d21	The naive_conv_2d is no longer used (#2496)	2025-08-16 07:57:30 +09:00
Awni Hannun	6441c21a94	Faster general unary op (#2472) * faster general unary op * faster general ops + reorg * fix + comment * binary two * copy general	2025-08-15 15:04:12 -07:00
Cheng	dfb5022eab	Rename cu::Matmul to CublasGemm (#2488)	2025-08-13 09:37:40 +09:00
Daniel Yeh	ac207ce7aa	make code blocks copyable (#2480) Co-authored-by: Chen-Chen Yeh <ge96noj@mytum.de>	2025-08-12 12:29:02 -07:00
Abe Leininger	fce53b61d6	Fix reduce sum/prod overflow (#2477)	2025-08-12 00:05:33 -07:00
Angelos Katharopoulos	8ae4a76308	Use CMake <4.1 to avoid the nvpl error (#2489)	2025-08-12 00:03:42 -07:00
Cheng	7fde1b6a1e	Fix logsumexp/softmax not fused for some cases (#2474)	2025-08-08 14:07:17 -07:00
Cheng	aa7b47481a	[CUDA] Optimize set_mm_device_pointers for small ndim (#2473)	2025-08-08 15:23:30 +09:00
Awni Hannun	56be773610	version (#2470) v0.28.0	2025-08-07 00:36:04 -07:00
Jagrit Digani	a9bdd67baa	Add CUDA sdpa vector (#2468)	2025-08-06 21:40:26 -07:00
Angelos Katharopoulos	f2adb5638d	Fix typo in metal command encoder (#2471)	2025-08-06 16:58:23 -07:00
Luca Vivona	728d4db582	Support destination arg in tree flatten/unflatten (#2450)	2025-08-06 15:34:59 -07:00
Awni Hannun	db5c7efcf6	revert default cuda install (#2465) * revert default cuda install * revert default cuda install	2025-08-06 06:19:12 -07:00
Awni Hannun	7bb96e4249	fix cublas on h100 (#2466)	2025-08-06 06:18:58 -07:00
Awni Hannun	fa89f0b150	faster gather qmm sorted test (#2463)	2025-08-05 06:27:40 -07:00
Awni Hannun	ca973d1e83	fix install tags (#2464)	2025-08-04 20:01:23 -07:00
Cheng	828c5f1137	Use SmallVector for shapes and strides (#2454) * Use SmallVector for shapes and strides * Convert SmallVector to tuple	2025-08-05 09:41:03 +09:00
Gaétan Lepage	7d86a5c108	Feat: add USE_SYSTEM_FMT CMake option (#2219)	2025-08-04 16:36:11 -07:00
Awni Hannun	0b807893a7	fix wraps compile (#2461)	2025-08-04 16:14:18 -07:00
Awni Hannun	6ad0889c8a	default install cuda on linux (#2462)	2025-08-04 15:33:05 -07:00
Zamderax	737dd6d1ac	Add missing <algorithm> header to jit_compiler.cpp (#2460) Fixes compilation error on Linux where std::find_if is used on line 121 but the <algorithm> header was not included. While this might work on some platforms due to transitive includes, it's not guaranteed by the C++ standard. Resolves issue #2459	2025-08-04 14:00:46 -07:00
Cheng	aaf78f4c6b	Use LRU cache for cuda graph (#2448) * Use LRU cache for cuda graph * Remove unused destructor	2025-08-02 21:28:57 +09:00
Angelos Katharopoulos	8831064493	Fix arctan2 grads (#2453)	2025-08-01 21:06:04 -07:00
Angelos Katharopoulos	be9bc96da4	[CUDA] Matmul utils initial commit (#2441)	2025-08-01 14:22:25 -07:00
Angelos Katharopoulos	86258f292f	[CUDA] Vectorize generated kernels (#2444)	2025-07-31 18:18:57 -07:00
Cheng	b26d88591c	[CUDA] Save primitive inputs faster (#2449) * Add more nvtx loggings * [CUDA] Saving primitive inputs faster * Remove unneeded check	2025-08-01 10:16:06 +09:00
Cheng	86c6a15571	[CUDA] Backward convolution (#2431)	2025-08-01 09:54:05 +09:00
junpeiz	8b25ce62d5	Add tests for export including control flow models and quantized models (#2430) * Add tests for export, including control flow export and quantized model export. * Skip quantization related test for CUDA backend.	2025-07-31 11:06:26 -07:00
Awni Hannun	da5912e4f2	fix custom metal extension (#2446)	2025-07-31 06:25:36 -07:00
Cheng	daafee676f	Fix wrong graph key when using concurrent context (#2447)	2025-07-31 06:01:05 -07:00
Awni Hannun	d32519c8ee	fix gemv regression (#2445)	2025-07-30 14:23:01 -07:00
Awni Hannun	b405591249	fix circular reference (#2443)	2025-07-30 09:37:44 -07:00
Angelos Katharopoulos	3bf81ed1bd	[CUDA] Quantized refactoring (#2442)	2025-07-30 08:27:20 -07:00
Cheng	2204182bba	Make CI faster (#2440)	2025-07-30 02:26:36 -07:00
Cheng	3628e5d497	Use load_vector in arg_reduce (#2439)	2025-07-30 17:40:26 +09:00
Cheng	a0ae49d397	Move arange to its own file (#2438)	2025-07-30 13:05:51 +09:00
Cheng	254476718b	Remove the kernel arg from get_launch_args (#2437)	2025-07-30 11:43:02 +09:00
Awni Hannun	3adba92ebe	Cuda faster softmax (#2435) * faster softmax and logsumexp * faster softmax and logsumexp * format	2025-07-29 17:18:12 -07:00

1395 Commits