fft | Fix fft for integer overflow (#2161) | 2025-05-09 14:25:12 -07:00 |
jit | Dispatch bf16 at run time when using the JIT (#1584) | 2024-11-15 16:54:36 -08:00 |
metal_3_0 | Dispatch bf16 at run time when using the JIT (#1584) | 2024-11-15 16:54:36 -08:00 |
metal_3_1 | Dispatch bf16 at run time when using the JIT (#1584) | 2024-11-15 16:54:36 -08:00 |
reduction | Align mlx::core::min op nan propagation with NumPy (#2346) | 2025-07-10 06:20:43 -07:00 |
steel | MoE backward improvements (#2335) | 2025-07-07 17:59:53 -07:00 |
arange.h | More jitting (#1132) | 2024-05-23 16:23:44 -07:00 |
arange.metal | Custom logsumexp (#2028) | 2025-03-31 07:36:55 -07:00 |
arg_reduce.metal | fix large arg reduce (#2206) | 2025-05-19 13:10:44 -07:00 |
atomic.h | Refactor reductions and fix scatter atomics for large sizes (#1300) | 2024-08-22 16:03:31 -07:00 |
bf16_math.h | Dispatch bf16 at run time when using the JIT (#1584) | 2024-11-15 16:54:36 -08:00 |
binary_ops.h | Fix complex power and print (#2286) | 2025-06-13 11:13:00 -07:00 |
binary_two.h | Improve metal elementwise kernels (#2247) | 2025-06-06 11:37:40 -07:00 |
binary_two.metal | Improve metal elementwise kernels (#2247) | 2025-06-06 11:37:40 -07:00 |
binary.h | Improve metal elementwise kernels (#2247) | 2025-06-06 11:37:40 -07:00 |
binary.metal | Improve metal elementwise kernels (#2247) | 2025-06-06 11:37:40 -07:00 |
cexpf.h | [CUDA] Implement Scan kernel (#2347) | 2025-07-10 16:54:12 -07:00 |
CMakeLists.txt | MoE backward improvements (#2335) | 2025-07-07 17:59:53 -07:00 |
complex.h | Add more complex unary ops (#2101) | 2025-04-21 13:04:54 -07:00 |
conv.metal | Depthwise Conv2D optimization (#2036) | 2025-04-03 09:42:04 -07:00 |
copy.h | Improve metal elementwise kernels (#2247) | 2025-06-06 11:37:40 -07:00 |
copy.metal | Improve metal elementwise kernels (#2247) | 2025-06-06 11:37:40 -07:00 |
defines.h | Refactor reductions and fix scatter atomics for large sizes (#1300) | 2024-08-22 16:03:31 -07:00 |
erf.h | JIT compile option for binary minimization (#1091) | 2024-05-22 12:57:13 -07:00 |
expm1f.h | Fix overflow / underflow handling for expm1f (#1278) | 2024-07-23 07:29:06 -07:00 |
fence.metal | Faster synchronization Fence primitive (#1773) | 2025-01-17 18:42:19 -08:00 |
fft.h | fix fft bug (#2062) | 2025-04-10 19:41:27 -07:00 |
fft.metal | Add Quantized Ops to the JIT (#1204) | 2024-06-12 09:47:12 -07:00 |
gather_axis.h | scatter axis + gather axis primitives (#1813) | 2025-01-31 20:48:08 -08:00 |
gather.h | Use int64 stride everywhere (#1671) | 2024-12-09 11:09:02 -08:00 |
gemv_masked.h | Use same accumulation precision in gemv as gemm (#1962) | 2025-03-16 07:13:24 -07:00 |
gemv_masked.metal | Use int64 stride everywhere (#1671) | 2024-12-09 11:09:02 -08:00 |
gemv.metal | enable complex gemm (#2017) | 2025-03-28 10:45:13 -07:00 |
hadamard.h | GPU Hadamard for large N (#1879) | 2025-05-01 17:19:17 -07:00 |
indexing.h | Use int64 stride everywhere (#1671) | 2024-12-09 11:09:02 -08:00 |
layer_norm.metal | Fix layernorm race condition (#2340) | 2025-07-07 06:06:01 -07:00 |
logsumexp.h | Fix out-of-bounds default value in logsumexp/softmax (#2213) | 2025-05-21 07:25:16 -07:00 |
logsumexp.metal | Custom logsumexp (#2028) | 2025-03-31 07:36:55 -07:00 |
quantized.h | Fix edge check in qmm_n QuantizedLoader (#2355) | 2025-07-10 16:28:50 -07:00 |
quantized.metal | 5bit quants (#2226) | 2025-05-30 12:12:10 -07:00 |
random.metal | Use int64 stride everywhere (#1671) | 2024-12-09 11:09:02 -08:00 |
reduce_utils.h | More jitting (#1132) | 2024-05-23 16:23:44 -07:00 |
reduce.h | Fix JIT reductions (#1373) | 2024-08-28 16:39:11 -07:00 |
reduce.metal | Fix reduce sum/prod overflow (#2477) | 2025-08-12 00:05:33 -07:00 |
rms_norm.metal | Custom logsumexp (#2028) | 2025-03-31 07:36:55 -07:00 |
rope.metal | Dispatch bf16 at run time when using the JIT (#1584) | 2024-11-15 16:54:36 -08:00 |
scaled_dot_product_attention.metal | Add float mask to sdpa vector (#2068) | 2025-04-11 17:29:40 -07:00 |
scan.h | LogCumSumExp (#2069) | 2025-04-13 01:27:29 -07:00 |
scan.metal | Complex scan (#2094) | 2025-04-22 18:56:28 -07:00 |
scatter_axis.h | scatter axis + gather axis primitives (#1813) | 2025-01-31 20:48:08 -08:00 |
scatter.h | Use int64 stride everywhere (#1671) | 2024-12-09 11:09:02 -08:00 |
sdpa_vector.h | fix batched vector sdpa (#2152) | 2025-05-05 13:13:03 -07:00 |
softmax.h | Fix out-of-bounds default value in logsumexp/softmax (#2213) | 2025-05-21 07:25:16 -07:00 |
softmax.metal | Custom logsumexp (#2028) | 2025-03-31 07:36:55 -07:00 |
sort.h | faster sort (#1831) | 2025-02-05 06:10:22 -08:00 |
sort.metal | faster sort (#1831) | 2025-02-05 06:10:22 -08:00 |
ternary_ops.h | JIT compile option for binary minimization (#1091) | 2024-05-22 12:57:13 -07:00 |
ternary.h | Improve metal elementwise kernels (#2247) | 2025-06-06 11:37:40 -07:00 |
ternary.metal | Improve metal elementwise kernels (#2247) | 2025-06-06 11:37:40 -07:00 |
unary_ops.h | [CUDA] Implement Scan kernel (#2347) | 2025-07-10 16:54:12 -07:00 |
unary.h | Improve metal elementwise kernels (#2247) | 2025-06-06 11:37:40 -07:00 |
unary.metal | Improve metal elementwise kernels (#2247) | 2025-06-06 11:37:40 -07:00 |
utils.h | fix bw for elementwise ops (#2151) | 2025-05-05 06:15:04 -07:00 |