mlx/mlx/backend/metal at 0751263dec5a210eb2ba097c108e8d78aa58124c - mlx

jit

Gather mm new kernel and small refactoring (#2040)

2025-04-14 16:37:36 -07:00

kernels

Fix typo in row_reduce_small (#2179)

2025-05-13 20:19:54 -07:00

allocator.cpp

Generalize gpu backend (#2138)

2025-04-30 09:08:17 -07:00

allocator.h

wire cache (#2006)

2025-03-25 18:54:01 -07:00

binary.cpp

fix bw for elementwise ops (#2151)

2025-05-05 06:15:04 -07:00

binary.h

Fixes for large arrays with a few ops (#1299)

2024-07-30 17:18:39 -07:00

CMakeLists.txt

Generalize gpu backend (#2138)

2025-04-30 09:08:17 -07:00

compiled.cpp

fix bw for elementwise ops (#2151)

2025-05-05 06:15:04 -07:00

conv.cpp

fix: conv_general differences between gpu, cpu (#2070)

2025-05-09 10:26:52 -07:00

copy.cpp

Move common gpu primitives to backend/gpu (#2145)

2025-05-05 13:45:29 -07:00

custom_kernel.cpp

Move common gpu primitives to backend/gpu (#2145)

2025-05-05 13:45:29 -07:00

device.cpp

Generalize gpu backend (#2138)

2025-04-30 09:08:17 -07:00

device.h

Generalize gpu backend (#2138)

2025-04-30 09:08:17 -07:00

distributed.cpp

Move common gpu primitives to backend/gpu (#2145)

2025-05-05 13:45:29 -07:00

eval.cpp

Generalize gpu backend (#2138)

2025-04-30 09:08:17 -07:00

event.cpp

Generalize gpu backend (#2138)

2025-04-30 09:08:17 -07:00

fence.cpp

fix input coherent kernel launch (#2153)

2025-05-05 17:30:50 -07:00

fft.cpp

Fix fft for integer overflow (#2161)

2025-05-09 14:25:12 -07:00

hadamard.cpp

Move common gpu primitives to backend/gpu (#2145)

2025-05-05 13:45:29 -07:00

indexing.cpp

Add remove_index utility (#2173)

2025-05-13 17:09:56 -07:00

jit_kernels.cpp

Gather qmm batched kernel and refactoring of quantized (#2078)

2025-04-17 13:53:11 -07:00

kernels.h

Gather qmm batched kernel and refactoring of quantized (#2078)

2025-04-17 13:53:11 -07:00

logsumexp.cpp

Move common gpu primitives to backend/gpu (#2145)

2025-05-05 13:45:29 -07:00

make_compiled_preamble.sh

Dispatch bf16 at run time when using the JIT (#1584)

2024-11-15 16:54:36 -08:00

matmul.cpp

Close a couple edge case bugs: hadamard and addmm on empty inputs (#2177)

2025-05-12 10:48:57 -07:00

matmul.h

Use int64 stride everywhere (#1671)

2024-12-09 11:09:02 -08:00

metal.cpp

Generalize gpu backend (#2138)

2025-04-30 09:08:17 -07:00

metal.h

Generalize gpu backend (#2138)

2025-04-30 09:08:17 -07:00

no_metal.cpp

Generalize gpu backend (#2138)

2025-04-30 09:08:17 -07:00

nojit_kernels.cpp

Gather qmm batched kernel and refactoring of quantized (#2078)

2025-04-17 13:53:11 -07:00

normalization.cpp

Move common gpu primitives to backend/gpu (#2145)

2025-05-05 13:45:29 -07:00

primitives.cpp

Move common gpu primitives to backend/gpu (#2145)

2025-05-05 13:45:29 -07:00

quantized.cpp

Move common gpu primitives to backend/gpu (#2145)

2025-05-05 13:45:29 -07:00

reduce.cpp

Move common gpu primitives to backend/gpu (#2145)

2025-05-05 13:45:29 -07:00

reduce.h

Reductions update (#1351)

2024-11-04 22:25:16 -08:00

resident.cpp

Generalize gpu backend (#2138)

2025-04-30 09:08:17 -07:00

resident.h

Wired (#1510)

2024-10-25 09:35:33 -07:00

rope.cpp

Move common gpu primitives to backend/gpu (#2145)

2025-05-05 13:45:29 -07:00

scaled_dot_product_attention.cpp

Move common gpu primitives to backend/gpu (#2145)

2025-05-05 13:45:29 -07:00

scan.cpp

Move common gpu primitives to backend/gpu (#2145)

2025-05-05 13:45:29 -07:00

slicing.cpp

Move common gpu primitives to backend/gpu (#2145)

2025-05-05 13:45:29 -07:00

softmax.cpp

Move common gpu primitives to backend/gpu (#2145)

2025-05-05 13:45:29 -07:00

sort.cpp

Move common gpu primitives to backend/gpu (#2145)

2025-05-05 13:45:29 -07:00

ternary.cpp

fix bw for elementwise ops (#2151)

2025-05-05 06:15:04 -07:00

ternary.h

Add some internal GPU apis (#1177)

2024-06-04 09:24:26 -07:00

unary.cpp

fix bw for elementwise ops (#2151)

2025-05-05 06:15:04 -07:00

unary.h

Add some internal GPU apis (#1177)

2024-06-04 09:24:26 -07:00

utils.cpp

Fp64 on the CPU (#1843)

2025-02-07 15:52:22 -08:00

utils.h

fix bw for elementwise ops (#2151)

2025-05-05 06:15:04 -07:00