mlx/mlx/backend/metal/kernels
2025-08-12 00:05:33 -07:00
fft Fix fft for integer overflow (#2161) 2025-05-09 14:25:12 -07:00
jit Dispatch bf16 at run time when using the JIT (#1584) 2024-11-15 16:54:36 -08:00
metal_3_0 Dispatch bf16 at run time when using the JIT (#1584) 2024-11-15 16:54:36 -08:00
metal_3_1 Dispatch bf16 at run time when using the JIT (#1584) 2024-11-15 16:54:36 -08:00
reduction Align mlx::core::min op nan propagation with NumPy (#2346) 2025-07-10 06:20:43 -07:00
steel MoE backward improvements (#2335) 2025-07-07 17:59:53 -07:00
arange.h More jitting (#1132) 2024-05-23 16:23:44 -07:00
arange.metal Custom logsumexp (#2028) 2025-03-31 07:36:55 -07:00
arg_reduce.metal fix large arg reduce (#2206) 2025-05-19 13:10:44 -07:00
atomic.h Refactor reductions and fix scatter atomics for large sizes (#1300) 2024-08-22 16:03:31 -07:00
bf16_math.h Dispatch bf16 at run time when using the JIT (#1584) 2024-11-15 16:54:36 -08:00
binary_ops.h Fix complex power and print (#2286) 2025-06-13 11:13:00 -07:00
binary_two.h Improve metal elementwise kernels (#2247) 2025-06-06 11:37:40 -07:00
binary_two.metal Improve metal elementwise kernels (#2247) 2025-06-06 11:37:40 -07:00
binary.h Improve metal elementwise kernels (#2247) 2025-06-06 11:37:40 -07:00
binary.metal Improve metal elementwise kernels (#2247) 2025-06-06 11:37:40 -07:00
cexpf.h [CUDA] Implement Scan kernel (#2347) 2025-07-10 16:54:12 -07:00
CMakeLists.txt MoE backward improvements (#2335) 2025-07-07 17:59:53 -07:00
complex.h Add more complex unary ops (#2101) 2025-04-21 13:04:54 -07:00
conv.metal Depthwise Conv2D optimization (#2036) 2025-04-03 09:42:04 -07:00
copy.h Improve metal elementwise kernels (#2247) 2025-06-06 11:37:40 -07:00
copy.metal Improve metal elementwise kernels (#2247) 2025-06-06 11:37:40 -07:00
defines.h Refactor reductions and fix scatter atomics for large sizes (#1300) 2024-08-22 16:03:31 -07:00
erf.h JIT compile option for binary minimization (#1091) 2024-05-22 12:57:13 -07:00
expm1f.h Fix overflow / underflow handling for expm1f (#1278) 2024-07-23 07:29:06 -07:00
fence.metal Faster synchronization Fence primitive (#1773) 2025-01-17 18:42:19 -08:00
fft.h fix fft bug (#2062) 2025-04-10 19:41:27 -07:00
fft.metal Add Quantized Ops to the JIT (#1204) 2024-06-12 09:47:12 -07:00
gather_axis.h scatter axis + gather axis primitives (#1813) 2025-01-31 20:48:08 -08:00
gather.h Use int64 stride everywhere (#1671) 2024-12-09 11:09:02 -08:00
gemv_masked.h Use same accumulation precision in gemv as gemm (#1962) 2025-03-16 07:13:24 -07:00
gemv_masked.metal Use int64 stride everywhere (#1671) 2024-12-09 11:09:02 -08:00
gemv.metal enable complex gemm (#2017) 2025-03-28 10:45:13 -07:00
hadamard.h GPU Hadamard for large N (#1879) 2025-05-01 17:19:17 -07:00
indexing.h Use int64 stride everywhere (#1671) 2024-12-09 11:09:02 -08:00
layer_norm.metal Fix layernorm race condition (#2340) 2025-07-07 06:06:01 -07:00
logsumexp.h Fix out-of-bounds default value in logsumexp/softmax (#2213) 2025-05-21 07:25:16 -07:00
logsumexp.metal Custom logsumexp (#2028) 2025-03-31 07:36:55 -07:00
quantized.h Fix edge check in qmm_n QuantizedLoader (#2355) 2025-07-10 16:28:50 -07:00
quantized.metal 5bit quants (#2226) 2025-05-30 12:12:10 -07:00
random.metal Use int64 stride everywhere (#1671) 2024-12-09 11:09:02 -08:00
reduce_utils.h More jitting (#1132) 2024-05-23 16:23:44 -07:00
reduce.h Fix JIT reductions (#1373) 2024-08-28 16:39:11 -07:00
reduce.metal Fix reduce sum/prod overflow (#2477) 2025-08-12 00:05:33 -07:00
rms_norm.metal Custom logsumexp (#2028) 2025-03-31 07:36:55 -07:00
rope.metal Dispatch bf16 at run time when using the JIT (#1584) 2024-11-15 16:54:36 -08:00
scaled_dot_product_attention.metal Add float mask to sdpa vector (#2068) 2025-04-11 17:29:40 -07:00
scan.h LogCumSumExp (#2069) 2025-04-13 01:27:29 -07:00
scan.metal Complex scan (#2094) 2025-04-22 18:56:28 -07:00
scatter_axis.h scatter axis + gather axis primitives (#1813) 2025-01-31 20:48:08 -08:00
scatter.h Use int64 stride everywhere (#1671) 2024-12-09 11:09:02 -08:00
sdpa_vector.h fix batched vector sdpa (#2152) 2025-05-05 13:13:03 -07:00
softmax.h Fix out-of-bounds default value in logsumexp/softmax (#2213) 2025-05-21 07:25:16 -07:00
softmax.metal Custom logsumexp (#2028) 2025-03-31 07:36:55 -07:00
sort.h faster sort (#1831) 2025-02-05 06:10:22 -08:00
sort.metal faster sort (#1831) 2025-02-05 06:10:22 -08:00
ternary_ops.h JIT compile option for binary minimization (#1091) 2024-05-22 12:57:13 -07:00
ternary.h Improve metal elementwise kernels (#2247) 2025-06-06 11:37:40 -07:00
ternary.metal Improve metal elementwise kernels (#2247) 2025-06-06 11:37:40 -07:00
unary_ops.h [CUDA] Implement Scan kernel (#2347) 2025-07-10 16:54:12 -07:00
unary.h Improve metal elementwise kernels (#2247) 2025-06-06 11:37:40 -07:00
unary.metal Improve metal elementwise kernels (#2247) 2025-06-06 11:37:40 -07:00
utils.h fix bw for elementwise ops (#2151) 2025-05-05 06:15:04 -07:00