mlx/mlx/backend/metal/kernels at 13b26775f1b0f37c565a6d55791e552f7d99102d

fft

Feature complete Metal FFT (#1102)

2024-06-06 12:57:25 -07:00

jit

Dispatch bf16 at run time when using the JIT (#1584)

2024-11-15 16:54:36 -08:00

metal_3_0

Dispatch bf16 at run time when using the JIT (#1584)

2024-11-15 16:54:36 -08:00

metal_3_1

Dispatch bf16 at run time when using the JIT (#1584)

2024-11-15 16:54:36 -08:00

reduction

Use int64 stride everywhere (#1671)

2024-12-09 11:09:02 -08:00

steel

Fix looping limit in causal attention (#1999)

2025-03-24 12:28:00 -07:00

arange.h

More jitting (#1132)

2024-05-23 16:23:44 -07:00

arange.metal

Dispatch bf16 at run time when using the JIT (#1584)

2024-11-15 16:54:36 -08:00

arg_reduce.metal

Use int64 stride everywhere (#1671)

2024-12-09 11:09:02 -08:00

atomic.h

Refactor reductions and fix scatter atomics for large sizes (#1300)

2024-08-22 16:03:31 -07:00

bf16_math.h

Dispatch bf16 at run time when using the JIT (#1584)

2024-11-15 16:54:36 -08:00

binary_ops.h

Fix complex power on Metal (#1460)

2024-10-06 19:58:30 -07:00

binary_two.h

Use int64 stride everywhere (#1671)

2024-12-09 11:09:02 -08:00

binary_two.metal

Allow no copy negative strides in as_strided and slice (#1688)

2024-12-12 08:59:45 -08:00

binary.h

Use int64 stride everywhere (#1671)

2024-12-09 11:09:02 -08:00

binary.metal

Allow no copy negative strides in as_strided and slice (#1688)

2024-12-12 08:59:45 -08:00

CMakeLists.txt

use minimum deployment target (#2016)

2025-03-28 14:31:53 -07:00

complex.h

Refactor reductions and fix scatter atomics for large sizes (#1300)

2024-08-22 16:03:31 -07:00

conv.metal

Fix convs by reverting #1803 (#1882)

2025-02-18 14:36:34 -08:00

copy.h

Dynamic slicing (#1741)

2025-01-07 14:02:16 -08:00

copy.metal

Dynamic slicing (#1741)

2025-01-07 14:02:16 -08:00

defines.h

Refactor reductions and fix scatter atomics for large sizes (#1300)

2024-08-22 16:03:31 -07:00

erf.h

JIT compile option for binary minimization (#1091)

2024-05-22 12:57:13 -07:00

expm1f.h

Fix overflow / underflow handling for expm1f (#1278)

2024-07-23 07:29:06 -07:00

fence.metal

Faster synchronization Fence primitive (#1773)

2025-01-17 18:42:19 -08:00

fft.h

Feature complete Metal FFT (#1102)

2024-06-06 12:57:25 -07:00

fft.metal

Add Quantized Ops to the JIT (#1204)

2024-06-12 09:47:12 -07:00

gather_axis.h

scatter axis + gather axis primitives (#1813)

2025-01-31 20:48:08 -08:00

gather.h

Use int64 stride everywhere (#1671)

2024-12-09 11:09:02 -08:00

gemv_masked.h

Use same accumulation precision in gemv as gemm (#1962)

2025-03-16 07:13:24 -07:00

gemv_masked.metal

Use int64 stride everywhere (#1671)

2024-12-09 11:09:02 -08:00

gemv.metal

enable complex gemm (#2017)

2025-03-28 10:45:13 -07:00

hadamard.h

Fix bfloat16 Hadamard (#1283)

2024-07-23 14:54:43 -07:00

indexing.h

Use int64 stride everywhere (#1671)

2024-12-09 11:09:02 -08:00

layer_norm.metal

RMS norm without scaling (#1915)

2025-02-28 20:26:57 -08:00

quantized.h

Affine quant always in fp32 (#1925)

2025-03-04 17:50:19 -08:00

quantized.metal

3 and 6 bit quantization (#1613)

2024-11-22 10:22:13 -08:00

random.metal

Use int64 stride everywhere (#1671)

2024-12-09 11:09:02 -08:00

reduce_utils.h

More jitting (#1132)

2024-05-23 16:23:44 -07:00

reduce.h

Fix JIT reductions (#1373)

2024-08-28 16:39:11 -07:00

reduce.metal

Allow no copy negative strides in as_strided and slice (#1688)

2024-12-12 08:59:45 -08:00

rms_norm.metal

RMS norm without scaling (#1915)

2025-02-28 20:26:57 -08:00

rope.metal

Dispatch bf16 at run time when using the JIT (#1584)

2024-11-15 16:54:36 -08:00

scaled_dot_product_attention.metal

sdpa specialization for head dim 256 (#2007)

2025-03-27 19:31:25 -07:00

scan.h

Working 64-bit scans (#1506)

2024-10-24 11:05:46 -07:00

scan.metal

Working 64-bit scans (#1506)

2024-10-24 11:05:46 -07:00

scatter_axis.h

scatter axis + gather axis primitives (#1813)

2025-01-31 20:48:08 -08:00

scatter.h

Use int64 stride everywhere (#1671)

2024-12-09 11:09:02 -08:00

sdpa_vector.h

causal vector sdpa (#2018)

2025-03-28 12:36:13 -07:00

softmax.h

consistently handle all -inf in softmax (#1470)

2024-10-08 09:54:02 -07:00

softmax.metal

Dispatch bf16 at run time when using the JIT (#1584)

2024-11-15 16:54:36 -08:00

sort.h

faster sort (#1831)

2025-02-05 06:10:22 -08:00

sort.metal

faster sort (#1831)

2025-02-05 06:10:22 -08:00

ternary_ops.h

JIT compile option for binary minimization (#1091)

2024-05-22 12:57:13 -07:00

ternary.h

Use int64 stride everywhere (#1671)

2024-12-09 11:09:02 -08:00

ternary.metal

Fix nd ternary on GPU (#1746)

2025-01-03 11:52:17 -08:00

unary_ops.h

Bitwise Inverse (#1862)

2025-02-13 08:44:14 -08:00

unary.h

Use int64 stride everywhere (#1671)

2024-12-09 11:09:02 -08:00

unary.metal

Bitwise Inverse (#1862)

2025-02-13 08:44:14 -08:00

utils.h

Allow no copy negative strides in as_strided and slice (#1688)

2024-12-12 08:59:45 -08:00