zhangyiss/mlx - mlx - Gitea for Geophysics

mirror of https://github.com/ml-explore/mlx.git synced 2025-12-16 01:49:05 +08:00

Author	SHA1	Message	Date
Awni Hannun	aad49f932f	[CUDA] Tune ops per buffer based on device (#2761) * tune ops per buffer based on device * tune memory limit as well * add tuning for spark	2025-11-16 06:29:49 -08:00
Awni Hannun	bbf1423953	wait for tasks in cuda (#2636)	2025-09-30 16:08:46 -07:00
Cheng	b466dea982	[CUDA] Make CudaEvent work with multi-device (#2614) * Set current device when creating cuda event * Separate cuda events by device * Avoid race condition in pool	2025-09-27 11:27:17 +09:00
Cheng	ae438d05fa	[CUDA] Recycle CUDA events (#2604) * Make CudaEvent a CudaHandle * Add caching for CudaEvent * Make sure cuda events are destroyed at last * Fix headers * SharedEvent => AtomicEvent * RawCudaEvent => CudaEventHandle, CudaEventWrapper => CopyableCudaEvent * Remove unneeded asserts	2025-09-23 10:42:03 +09:00
Cheng	b26d88591c	[CUDA] Save primitive inputs faster (#2449) * Add more nvtx loggings * [CUDA] Saving primitive inputs faster * Remove unneeded check	2025-08-01 10:16:06 +09:00
Awni Hannun	1e496ddb82	[CUDA] Simplify allocator (#2392) * simplify allocator and fix race with small pool * Don't use shared event in worker * use cuda buffer in small pool * comment * comment	2025-07-22 08:24:01 -07:00
Awni Hannun	ec0d5db67b	[CUDA] Switch to CUDA graphs (#2317) * cuda graph prototype fix signal bug + start to add dependencies capture more capture more ops remaining ops fix reduce and rope deps add concurrent context try update, but not working consistent topology order use node api use node api directly to reduce overhead fix bug use kernels in unary cache graph format fix synchronization format * comment	2025-07-02 15:59:13 -07:00
Awni Hannun	cad5c0241c	[CUDA] synch properly waits for all tasks to finish and clear (#2303) * cuda synch properly waits for all tasks to finish and clear * fix copy	2025-06-17 12:03:25 -07:00
Cheng	0cae0bdac8	CUDA backend: backbone (#2075)	2025-05-06 21:26:46 -07:00

9 Commits