mlx/mlx/backend/metal
2025-04-17 14:53:08 -07:00
jit Gather mm new kernel and small refactoring (#2040) 2025-04-14 16:37:36 -07:00
kernels Gather qmm batched kernel and refactoring of quantized (#2078) 2025-04-17 13:53:11 -07:00
allocator.cpp only add to residency set once (#2049) 2025-04-06 17:38:25 -07:00
allocator.h wire cache (#2006) 2025-03-25 18:54:01 -07:00
binary.cpp redesign for faster cpu/gpu synch (#1869) 2025-03-06 19:23:38 -08:00
binary.h Fixes for large arrays with a few ops (#1299) 2024-07-30 17:18:39 -07:00
CMakeLists.txt Gather mm new kernel and small refactoring (#2040) 2025-04-14 16:37:36 -07:00
compiled.cpp redesign for faster cpu/gpu synch (#1869) 2025-03-06 19:23:38 -08:00
conv.cpp Depthwise Conv2D optimization (#2036) 2025-04-03 09:42:04 -07:00
copy.cpp fix malloc or wait deadlock (#1976) 2025-03-20 16:48:43 -07:00
copy.h Dynamic slicing (#1741) 2025-01-07 14:02:16 -08:00
custom_kernel.cpp fix malloc or wait deadlock (#1976) 2025-03-20 16:48:43 -07:00
device.cpp Do not load the default lib if another is requested (#2055) 2025-04-09 13:31:38 -07:00
device.h Do not load the default lib if another is requested (#2055) 2025-04-09 13:31:38 -07:00
distributed.cpp redesign for faster cpu/gpu synch (#1869) 2025-03-06 19:23:38 -08:00
event.cpp Remove Event::Signal() (#2052) 2025-04-08 06:20:27 -07:00
fence.cpp fix malloc or wait deadlock (#1976) 2025-03-20 16:48:43 -07:00
fft.cpp fix fft bug (#2062) 2025-04-10 19:41:27 -07:00
hadamard.cpp fix malloc or wait deadlock (#1976) 2025-03-20 16:48:43 -07:00
indexing.cpp fix malloc or wait deadlock (#1976) 2025-03-20 16:48:43 -07:00
jit_kernels.cpp Gather qmm batched kernel and refactoring of quantized (#2078) 2025-04-17 13:53:11 -07:00
kernels.h Gather qmm batched kernel and refactoring of quantized (#2078) 2025-04-17 13:53:11 -07:00
logsumexp.cpp Custom logsumexp (#2028) 2025-03-31 07:36:55 -07:00
make_compiled_preamble.sh Dispatch bf16 at run time when using the JIT (#1584) 2024-11-15 16:54:36 -08:00
matmul.cpp Gather qmm batched kernel and refactoring of quantized (#2078) 2025-04-17 13:53:11 -07:00
matmul.h Use int64 stride everywhere (#1671) 2024-12-09 11:09:02 -08:00
metal_impl.h redesign for faster cpu/gpu synch (#1869) 2025-03-06 19:23:38 -08:00
metal.cpp Fix multistream GPU deadlock (#1969) 2025-03-20 07:19:47 -07:00
metal.h move memory APIs into top level mlx.core (#1982) 2025-03-21 07:25:12 -07:00
nojit_kernels.cpp Gather qmm batched kernel and refactoring of quantized (#2078) 2025-04-17 13:53:11 -07:00
normalization.cpp fix malloc or wait deadlock (#1976) 2025-03-20 16:48:43 -07:00
primitives.cpp Distributed layers (#1270) 2025-03-21 13:52:17 -07:00
quantized.cpp Route to gather qmm only for many tokens per expert (#2082) 2025-04-17 14:53:08 -07:00
reduce.cpp fix malloc or wait deadlock (#1976) 2025-03-20 16:48:43 -07:00
reduce.h Reductions update (#1351) 2024-11-04 22:25:16 -08:00
resident.cpp Only request residency once (#2051) 2025-04-07 10:47:51 -07:00
resident.h Wired (#1510) 2024-10-25 09:35:33 -07:00
rope.cpp fix malloc or wait deadlock (#1976) 2025-03-20 16:48:43 -07:00
scaled_dot_product_attention.cpp Add float mask to sdpa vector (#2068) 2025-04-11 17:29:40 -07:00
scan.cpp LogCumSumExp (#2069) 2025-04-13 01:27:29 -07:00
slicing.cpp fix malloc or wait deadlock (#1976) 2025-03-20 16:48:43 -07:00
slicing.h More shape type (#1705) 2024-12-19 08:08:20 -08:00
softmax.cpp Custom logsumexp (#2028) 2025-03-31 07:36:55 -07:00
sort.cpp fix malloc or wait deadlock (#1976) 2025-03-20 16:48:43 -07:00
ternary.cpp redesign for faster cpu/gpu synch (#1869) 2025-03-06 19:23:38 -08:00
ternary.h Add some internal GPU apis (#1177) 2024-06-04 09:24:26 -07:00
unary.cpp fix malloc or wait deadlock (#1976) 2025-03-20 16:48:43 -07:00
unary.h Add some internal GPU apis (#1177) 2024-06-04 09:24:26 -07:00
utils.cpp Fp64 on the CPU (#1843) 2025-02-07 15:52:22 -08:00
utils.h Gather mm new kernel and small refactoring (#2040) 2025-04-14 16:37:36 -07:00