mlx/mlx
Awni Hannun ec0d5db67b
[CUDA] Switch to CUDA graphs (#2317)
* cuda graph prototype

fix signal bug + start to add dependencies

capture more

capture more ops

remaining ops

fix reduce and rope deps

add concurrent context

try update, but not working

consistent topology order

use node api

use node api directly to reduce overhead

fix bug

use kernels in unary

cache graph

format

fix synchronization

format

* comment
2025-07-02 15:59:13 -07:00
3rdparty jagrit's commit files 2023-11-29 10:52:08 -08:00
backend [CUDA] Switch to CUDA graphs (#2317) 2025-07-02 15:59:13 -07:00
distributed Make sliceUpdate general (#2282) 2025-06-12 16:48:54 -07:00
io Remove static initializers (#2059) 2025-04-24 06:14:49 -07:00
types fix pinv (#2110) 2025-04-23 13:08:28 -07:00
allocator.cpp Add stats and limit to common allocator and enable tests (#1988) 2025-03-21 12:28:36 -07:00
allocator.h Add stats and limit to common allocator and enable tests (#1988) 2025-03-21 12:28:36 -07:00
array.cpp reduce binary size (#1952) 2025-03-11 06:30:44 -07:00
array.h Add complex eigh (#2191) 2025-05-18 00:18:43 -07:00
CMakeLists.txt start cuda circle config (#2256) 2025-06-10 21:19:47 -07:00
compile_impl.h Simplify removes no-ops from the tape (#1759) 2025-01-09 11:23:19 -08:00
compile.cpp Split broadcast so it is always fused in compile (#2318) 2025-06-26 22:08:18 -07:00
compile.h fix function pointer (#1865) 2025-02-13 18:46:11 -08:00
device.cpp Generalize gpu backend (#2138) 2025-04-30 09:08:17 -07:00
device.h Generalize gpu backend (#2138) 2025-04-30 09:08:17 -07:00
dtype_utils.cpp MLX_SWITCH macros to templates (#2320) 2025-07-01 01:33:44 -07:00
dtype_utils.h MLX_SWITCH macros to templates (#2320) 2025-07-01 01:33:44 -07:00
dtype.cpp fix double type promotion (#1901) 2025-02-25 06:00:53 -08:00
dtype.h Fp64 on the CPU (#1843) 2025-02-07 15:52:22 -08:00
einsum.cpp Einsum ellipsis (#1788) 2025-01-25 01:28:03 -08:00
einsum.h Einsum (#1269) 2024-07-25 09:36:44 -07:00
event.h Remove Event::Signal() (#2052) 2025-04-08 06:20:27 -07:00
export_impl.h Export / import functions to / from a file (#1642) 2024-12-24 11:19:13 -08:00
export.cpp fix export to work with gather/scatter axis (#2263) 2025-06-09 20:37:27 -07:00
export.h Use unordered map for kwargs in export/import (#2087) 2025-04-21 07:17:22 -07:00
fast_primitives.h Fast primitives decide when to use the fallback (#2216) 2025-06-02 13:26:37 -07:00
fast.cpp Fix unintuitive metal kernel caching (#2242) 2025-06-06 20:08:15 -07:00
fast.h Add new sdpa function overload (#2035) 2025-04-03 11:58:28 -07:00
fence.h redesign for faster cpu/gpu synch (#1869) 2025-03-06 19:23:38 -08:00
fft.cpp add fftshift and ifftshift fft helpers (#2135) 2025-04-29 22:13:45 -07:00
fft.h add fftshift and ifftshift fft helpers (#2135) 2025-04-29 22:13:45 -07:00
graph_utils.cpp Optionally specify names for arrays when exporting (#1749) 2025-01-06 13:07:46 -08:00
graph_utils.h Optionally specify names for arrays when exporting (#1749) 2025-01-06 13:07:46 -08:00
io.h Added missing unordered_map includes (#1635) 2024-12-02 07:03:03 -08:00
linalg.cpp [CUDA] Switch to CUDA graphs (#2317) 2025-07-02 15:59:13 -07:00
linalg.h non-symmetric eig and eigh (#2188) 2025-05-15 13:01:44 -07:00
memory.h move memory APIs into top level mlx.core (#1982) 2025-03-21 07:25:12 -07:00
mlx.h start cuda circle config (#2256) 2025-06-10 21:19:47 -07:00
ops.cpp Fix complex power and print (#2286) 2025-06-13 11:13:00 -07:00
ops.h Fix typos (#2136) 2025-04-29 07:26:05 -07:00
primitives.cpp reduce vjp for all and any (#2193) 2025-05-16 08:38:49 -07:00
primitives.h fix conv export (#2265) 2025-06-10 09:34:01 -07:00
random.cpp Add random normal distribution for complex numbers (#2182) 2025-05-13 22:43:45 -07:00
random.h Add random normal distribution for complex numbers (#2182) 2025-05-13 22:43:45 -07:00
scheduler.cpp Generalize gpu backend (#2138) 2025-04-30 09:08:17 -07:00
scheduler.h Generalize gpu backend (#2138) 2025-04-30 09:08:17 -07:00
stream.h Export / import functions to / from a file (#1642) 2024-12-24 11:19:13 -08:00
threadpool.h Ring distributed backend (#1784) 2025-01-27 22:15:01 -08:00
transforms_impl.h Remove static initializers (#2059) 2025-04-24 06:14:49 -07:00
transforms.cpp Perf regression fix (#2243) 2025-06-03 17:55:12 -07:00
transforms.h Export / import functions to / from a file (#1642) 2024-12-24 11:19:13 -08:00
utils.cpp MLX_SWITCH macros to templates (#2320) 2025-07-01 01:33:44 -07:00
utils.h Fix cuda arg reduce (#2291) 2025-06-14 17:54:00 -07:00
version.cpp Do not define MLX_VERSION globally (#1966) 2025-03-18 07:12:40 -07:00
version.h patch bump (#2324) 2025-07-01 12:12:16 -07:00