mlx/metal at a4a4b46b8d4137be5f439e7fe242b4bfed1543e5 - mlx

Awni Hannun a4a4b46b8d fix jit	2025-06-06 11:08:22 -07:00
..
jit	Gather mm new kernel and small refactoring (#2040)	2025-04-14 16:37:36 -07:00
kernels	compile and copy	2025-06-06 10:43:52 -07:00
allocator.cpp	Add memory cache to CUDA backend (#2221)	2025-05-30 12:12:54 -07:00
allocator.h	Add memory cache to CUDA backend (#2221)	2025-05-30 12:12:54 -07:00
binary.cpp	improve metal elementwise kernels	2025-06-06 10:43:52 -07:00
binary.h	Fixes for large arrays with a few ops (#1299)	2024-07-30 17:18:39 -07:00
CMakeLists.txt	Generalize gpu backend (#2138)	2025-04-30 09:08:17 -07:00
compiled.cpp	compile and copy	2025-06-06 10:43:52 -07:00
conv.cpp	fix conv2d bug + faster conv 1d (#2195)	2025-05-18 06:05:11 -07:00
copy.cpp	compile and copy	2025-06-06 10:43:52 -07:00
custom_kernel.cpp	Move common gpu primitives to backend/gpu (#2145)	2025-05-05 13:45:29 -07:00
device.cpp	Generalize gpu backend (#2138)	2025-04-30 09:08:17 -07:00
device.h	Add set_threadgroup_memory_length to CommandEncoder (#2183)	2025-05-16 00:28:03 -07:00
distributed.cpp	Move common gpu primitives to backend/gpu (#2145)	2025-05-05 13:45:29 -07:00
eval.cpp	Generalize gpu backend (#2138)	2025-04-30 09:08:17 -07:00
event.cpp	Generalize gpu backend (#2138)	2025-04-30 09:08:17 -07:00
fence.cpp	fix input coherent kernel launch (#2153)	2025-05-05 17:30:50 -07:00
fft.cpp	Fix fft for integer overflow (#2161)	2025-05-09 14:25:12 -07:00
hadamard.cpp	Move common gpu primitives to backend/gpu (#2145)	2025-05-05 13:45:29 -07:00
indexing.cpp	Add remove_index utility (#2173)	2025-05-13 17:09:56 -07:00
jit_kernels.cpp	fix jit	2025-06-06 11:08:22 -07:00
kernels.h	Gather qmm batched kernel and refactoring of quantized (#2078)	2025-04-17 13:53:11 -07:00
logsumexp.cpp	Move common gpu primitives to backend/gpu (#2145)	2025-05-05 13:45:29 -07:00
make_compiled_preamble.sh	Dispatch bf16 at run time when using the JIT (#1584)	2024-11-15 16:54:36 -08:00
matmul.cpp	Close a couple edge case bugs: hadamard and addmm on empty inputs (#2177)	2025-05-12 10:48:57 -07:00
matmul.h	Use int64 stride everywhere (#1671)	2024-12-09 11:09:02 -08:00
metal.cpp	Generalize gpu backend (#2138)	2025-04-30 09:08:17 -07:00
metal.h	Generalize gpu backend (#2138)	2025-04-30 09:08:17 -07:00
no_metal.cpp	Generalize gpu backend (#2138)	2025-04-30 09:08:17 -07:00
nojit_kernels.cpp	Gather qmm batched kernel and refactoring of quantized (#2078)	2025-04-17 13:53:11 -07:00
normalization.cpp	Fast primitives decide when to use the fallback (#2216)	2025-06-02 13:26:37 -07:00
primitives.cpp	fix large arg reduce (#2206)	2025-05-19 13:10:44 -07:00
quantized.cpp	5bit quants (#2226)	2025-05-30 12:12:10 -07:00
reduce.cpp	Move common gpu primitives to backend/gpu (#2145)	2025-05-05 13:45:29 -07:00
reduce.h	Reductions update (#1351)	2024-11-04 22:25:16 -08:00
resident.cpp	Generalize gpu backend (#2138)	2025-04-30 09:08:17 -07:00
resident.h	Wired (#1510)	2024-10-25 09:35:33 -07:00
rope.cpp	Fast primitives decide when to use the fallback (#2216)	2025-06-02 13:26:37 -07:00
scaled_dot_product_attention.cpp	Fast primitives decide when to use the fallback (#2216)	2025-06-02 13:26:37 -07:00
scan.cpp	Move common gpu primitives to backend/gpu (#2145)	2025-05-05 13:45:29 -07:00
slicing.cpp	Move common gpu primitives to backend/gpu (#2145)	2025-05-05 13:45:29 -07:00
softmax.cpp	Move common gpu primitives to backend/gpu (#2145)	2025-05-05 13:45:29 -07:00
sort.cpp	Move common gpu primitives to backend/gpu (#2145)	2025-05-05 13:45:29 -07:00
ternary.cpp	improve metal elementwise kernels	2025-06-06 10:43:52 -07:00
ternary.h	Add some internal GPU apis (#1177)	2024-06-04 09:24:26 -07:00
unary.cpp	improve metal elementwise kernels	2025-06-06 10:43:52 -07:00
unary.h	Add some internal GPU apis (#1177)	2024-06-04 09:24:26 -07:00
utils.cpp	Move some dims utils to common (#2223)	2025-05-29 06:48:30 -07:00
utils.h	improve metal elementwise kernels	2025-06-06 10:43:52 -07:00
