mlx/mlx/backend/metal
Jack d197c18528 Add set_threadgroup_memory_length to CommandEncoder
This method exposes the Metal API's setThreadgroupMemoryLength call, which
custom kernels need in order to configure the size of their threadgroup
memory. It enables performance tuning of specialized Metal compute
operations that rely on shared threadgroup memory.
2025-05-13 21:45:30 -04:00
jit Gather mm new kernel and small refactoring (#2040) 2025-04-14 16:37:36 -07:00
kernels Fix fft for integer overflow (#2161) 2025-05-09 14:25:12 -07:00
allocator.cpp Generalize gpu backend (#2138) 2025-04-30 09:08:17 -07:00
allocator.h wire cache (#2006) 2025-03-25 18:54:01 -07:00
binary.cpp fix bw for elementwise ops (#2151) 2025-05-05 06:15:04 -07:00
binary.h Fixes for large arrays with a few ops (#1299) 2024-07-30 17:18:39 -07:00
CMakeLists.txt Generalize gpu backend (#2138) 2025-04-30 09:08:17 -07:00
compiled.cpp fix bw for elementwise ops (#2151) 2025-05-05 06:15:04 -07:00
conv.cpp fix: conv_general differences between gpu, cpu (#2070) 2025-05-09 10:26:52 -07:00
copy.cpp Move common gpu primitives to backend/gpu (#2145) 2025-05-05 13:45:29 -07:00
custom_kernel.cpp Move common gpu primitives to backend/gpu (#2145) 2025-05-05 13:45:29 -07:00
device.cpp Generalize gpu backend (#2138) 2025-04-30 09:08:17 -07:00
device.h Add set_threadgroup_memory_length to CommandEncoder 2025-05-13 21:45:30 -04:00
distributed.cpp Move common gpu primitives to backend/gpu (#2145) 2025-05-05 13:45:29 -07:00
eval.cpp Generalize gpu backend (#2138) 2025-04-30 09:08:17 -07:00
event.cpp Generalize gpu backend (#2138) 2025-04-30 09:08:17 -07:00
fence.cpp fix input coherent kernel launch (#2153) 2025-05-05 17:30:50 -07:00
fft.cpp Fix fft for integer overflow (#2161) 2025-05-09 14:25:12 -07:00
hadamard.cpp Move common gpu primitives to backend/gpu (#2145) 2025-05-05 13:45:29 -07:00
indexing.cpp Add remove_index utility (#2173) 2025-05-13 17:09:56 -07:00
jit_kernels.cpp Gather qmm batched kernel and refactoring of quantized (#2078) 2025-04-17 13:53:11 -07:00
kernels.h Gather qmm batched kernel and refactoring of quantized (#2078) 2025-04-17 13:53:11 -07:00
logsumexp.cpp Move common gpu primitives to backend/gpu (#2145) 2025-05-05 13:45:29 -07:00
make_compiled_preamble.sh Dispatch bf16 at run time when using the JIT (#1584) 2024-11-15 16:54:36 -08:00
matmul.cpp Close a couple edge case bugs: hadamard and addmm on empty inputs (#2177) 2025-05-12 10:48:57 -07:00
matmul.h Use int64 stride everywhere (#1671) 2024-12-09 11:09:02 -08:00
metal.cpp Generalize gpu backend (#2138) 2025-04-30 09:08:17 -07:00
metal.h Generalize gpu backend (#2138) 2025-04-30 09:08:17 -07:00
no_metal.cpp Generalize gpu backend (#2138) 2025-04-30 09:08:17 -07:00
nojit_kernels.cpp Gather qmm batched kernel and refactoring of quantized (#2078) 2025-04-17 13:53:11 -07:00
normalization.cpp Move common gpu primitives to backend/gpu (#2145) 2025-05-05 13:45:29 -07:00
primitives.cpp Move common gpu primitives to backend/gpu (#2145) 2025-05-05 13:45:29 -07:00
quantized.cpp Move common gpu primitives to backend/gpu (#2145) 2025-05-05 13:45:29 -07:00
reduce.cpp Move common gpu primitives to backend/gpu (#2145) 2025-05-05 13:45:29 -07:00
reduce.h Reductions update (#1351) 2024-11-04 22:25:16 -08:00
resident.cpp Generalize gpu backend (#2138) 2025-04-30 09:08:17 -07:00
resident.h Wired (#1510) 2024-10-25 09:35:33 -07:00
rope.cpp Move common gpu primitives to backend/gpu (#2145) 2025-05-05 13:45:29 -07:00
scaled_dot_product_attention.cpp Move common gpu primitives to backend/gpu (#2145) 2025-05-05 13:45:29 -07:00
scan.cpp Move common gpu primitives to backend/gpu (#2145) 2025-05-05 13:45:29 -07:00
slicing.cpp Move common gpu primitives to backend/gpu (#2145) 2025-05-05 13:45:29 -07:00
softmax.cpp Move common gpu primitives to backend/gpu (#2145) 2025-05-05 13:45:29 -07:00
sort.cpp Move common gpu primitives to backend/gpu (#2145) 2025-05-05 13:45:29 -07:00
ternary.cpp fix bw for elementwise ops (#2151) 2025-05-05 06:15:04 -07:00
ternary.h Add some internal GPU apis (#1177) 2024-06-04 09:24:26 -07:00
unary.cpp fix bw for elementwise ops (#2151) 2025-05-05 06:15:04 -07:00
unary.h Add some internal GPU apis (#1177) 2024-06-04 09:24:26 -07:00
utils.cpp Fp64 on the CPU (#1843) 2025-02-07 15:52:22 -08:00
utils.h fix bw for elementwise ops (#2151) 2025-05-05 06:15:04 -07:00