mlx/metal at d197c185280b5839bb13379cd45e664f3f8263cf - mlx

mirror of https://github.com/ml-explore/mlx.git synced 2025-06-26 18:51:14 +08:00


Jack d197c18528 Add set_threadgroup_memory_length to CommandEncoder	2025-05-13 21:45:30 -04:00

This method exposes the Metal API's setThreadgroupMemoryLength functionality, which is needed when implementing custom kernels that require configuring threadgroup memory size. This allows for better performance tuning in specialized Metal compute operations that rely on shared threadgroup memory.
jit	Gather mm new kernel and small refactoring (#2040)	2025-04-14 16:37:36 -07:00
kernels	Fix fft for integer overflow (#2161)	2025-05-09 14:25:12 -07:00
allocator.cpp	Generalize gpu backend (#2138)	2025-04-30 09:08:17 -07:00
allocator.h	wire cache (#2006)	2025-03-25 18:54:01 -07:00
binary.cpp	fix bw for elementwise ops (#2151)	2025-05-05 06:15:04 -07:00
binary.h	Fixes for large arrays with a few ops (#1299)	2024-07-30 17:18:39 -07:00
CMakeLists.txt	Generalize gpu backend (#2138)	2025-04-30 09:08:17 -07:00
compiled.cpp	fix bw for elementwise ops (#2151)	2025-05-05 06:15:04 -07:00
conv.cpp	fix: conv_general differences between gpu, cpu (#2070)	2025-05-09 10:26:52 -07:00
copy.cpp	Move common gpu primitives to backend/gpu (#2145)	2025-05-05 13:45:29 -07:00
custom_kernel.cpp	Move common gpu primitives to backend/gpu (#2145)	2025-05-05 13:45:29 -07:00
device.cpp	Generalize gpu backend (#2138)	2025-04-30 09:08:17 -07:00
device.h	Add set_threadgroup_memory_length to CommandEncoder	2025-05-13 21:45:30 -04:00
distributed.cpp	Move common gpu primitives to backend/gpu (#2145)	2025-05-05 13:45:29 -07:00
eval.cpp	Generalize gpu backend (#2138)	2025-04-30 09:08:17 -07:00
event.cpp	Generalize gpu backend (#2138)	2025-04-30 09:08:17 -07:00
fence.cpp	fix input coherent kernel launch (#2153)	2025-05-05 17:30:50 -07:00
fft.cpp	Fix fft for integer overflow (#2161)	2025-05-09 14:25:12 -07:00
hadamard.cpp	Move common gpu primitives to backend/gpu (#2145)	2025-05-05 13:45:29 -07:00
indexing.cpp	Add remove_index utility (#2173)	2025-05-13 17:09:56 -07:00
jit_kernels.cpp	Gather qmm batched kernel and refactoring of quantized (#2078)	2025-04-17 13:53:11 -07:00
kernels.h	Gather qmm batched kernel and refactoring of quantized (#2078)	2025-04-17 13:53:11 -07:00
logsumexp.cpp	Move common gpu primitives to backend/gpu (#2145)	2025-05-05 13:45:29 -07:00
make_compiled_preamble.sh	Dispatch bf16 at run time when using the JIT (#1584)	2024-11-15 16:54:36 -08:00
matmul.cpp	Close a couple edge case bugs: hadamard and addmm on empty inputs (#2177)	2025-05-12 10:48:57 -07:00
matmul.h	Use int64 stride everywhere (#1671)	2024-12-09 11:09:02 -08:00
metal.cpp	Generalize gpu backend (#2138)	2025-04-30 09:08:17 -07:00
metal.h	Generalize gpu backend (#2138)	2025-04-30 09:08:17 -07:00
no_metal.cpp	Generalize gpu backend (#2138)	2025-04-30 09:08:17 -07:00
nojit_kernels.cpp	Gather qmm batched kernel and refactoring of quantized (#2078)	2025-04-17 13:53:11 -07:00
normalization.cpp	Move common gpu primitives to backend/gpu (#2145)	2025-05-05 13:45:29 -07:00
primitives.cpp	Move common gpu primitives to backend/gpu (#2145)	2025-05-05 13:45:29 -07:00
quantized.cpp	Move common gpu primitives to backend/gpu (#2145)	2025-05-05 13:45:29 -07:00
reduce.cpp	Move common gpu primitives to backend/gpu (#2145)	2025-05-05 13:45:29 -07:00
reduce.h	Reductions update (#1351)	2024-11-04 22:25:16 -08:00
resident.cpp	Generalize gpu backend (#2138)	2025-04-30 09:08:17 -07:00
resident.h	Wired (#1510)	2024-10-25 09:35:33 -07:00
rope.cpp	Move common gpu primitives to backend/gpu (#2145)	2025-05-05 13:45:29 -07:00
scaled_dot_product_attention.cpp	Move common gpu primitives to backend/gpu (#2145)	2025-05-05 13:45:29 -07:00
scan.cpp	Move common gpu primitives to backend/gpu (#2145)	2025-05-05 13:45:29 -07:00
slicing.cpp	Move common gpu primitives to backend/gpu (#2145)	2025-05-05 13:45:29 -07:00
softmax.cpp	Move common gpu primitives to backend/gpu (#2145)	2025-05-05 13:45:29 -07:00
sort.cpp	Move common gpu primitives to backend/gpu (#2145)	2025-05-05 13:45:29 -07:00
ternary.cpp	fix bw for elementwise ops (#2151)	2025-05-05 06:15:04 -07:00
ternary.h	Add some internal GPU apis (#1177)	2024-06-04 09:24:26 -07:00
unary.cpp	fix bw for elementwise ops (#2151)	2025-05-05 06:15:04 -07:00
unary.h	Add some internal GPU apis (#1177)	2024-06-04 09:24:26 -07:00
utils.cpp	Fp64 on the CPU (#1843)	2025-02-07 15:52:22 -08:00
utils.h	fix bw for elementwise ops (#2151)	2025-05-05 06:15:04 -07:00