mlx/mlx/backend/metal
Awni Hannun f1951d6cce
Use fewer barriers (#1561)
* use fewer barriers

* comment
2024-11-04 10:26:49 -08:00
jit improvements to scatter / gather (#1541) 2024-10-30 19:30:54 -07:00
kernels Sdpa fix (#1558) 2024-11-02 21:25:46 -07:00
allocator.cpp Wired (#1510) 2024-10-25 09:35:33 -07:00
allocator.h Wired (#1510) 2024-10-25 09:35:33 -07:00
binary.cpp Fix thread group for large arrays (#1543) 2024-10-30 16:25:12 -07:00
binary.h Fixes for large arrays with a few ops (#1299) 2024-07-30 17:18:39 -07:00
CMakeLists.txt improvements to scatter / gather (#1541) 2024-10-30 19:30:54 -07:00
compiled.cpp Fix thread group for large arrays (#1543) 2024-10-30 16:25:12 -07:00
conv.cpp change wino dispatch condition (#1534) 2024-10-28 11:13:44 -07:00
copy.cpp Fix thread group for large arrays (#1543) 2024-10-30 16:25:12 -07:00
copy.h Fix copying scalars by adding fill_gpu (#1402) 2024-09-09 15:54:08 -07:00
custom_kernel.cpp Remove use of vector<const T> (#1514) 2024-10-22 16:31:52 -07:00
device.cpp Use fewer barriers (#1561) 2024-11-04 10:26:49 -08:00
device.h Use fewer barriers (#1561) 2024-11-04 10:26:49 -08:00
distributed.cpp Adds send/recv ops in distributed (#1366) 2024-08-26 23:01:37 -07:00
event.cpp Fix array is_available race cases (#1468) 2024-10-07 19:13:50 -07:00
fft.cpp Remove Hazard tracking with Fences (#1509) 2024-10-21 19:33:32 -07:00
hadamard.cpp Remove Hazard tracking with Fences (#1509) 2024-10-21 19:33:32 -07:00
indexing.cpp improvements to scatter / gather (#1541) 2024-10-30 19:30:54 -07:00
jit_kernels.cpp Working 64-bit scans (#1506) 2024-10-24 11:05:46 -07:00
kernels.h C++20 compatibility for fmt (#1519) 2024-10-24 08:54:51 -07:00
make_compiled_preamble.sh fix compiling with space in paths (#1332) 2024-08-15 16:39:24 -07:00
matmul.cpp Gemm update (#1518) 2024-10-30 19:30:28 -07:00
matmul.h Wired (#1510) 2024-10-25 09:35:33 -07:00
metal_impl.h Add synchronize function (#1006) 2024-04-22 08:25:46 -07:00
metal.cpp Fix array is_available race cases (#1468) 2024-10-07 19:13:50 -07:00
metal.h Wired (#1510) 2024-10-25 09:35:33 -07:00
nojit_kernels.cpp Real and Imag (#1490) 2024-10-15 16:23:15 -07:00
normalization.cpp Remove Hazard tracking with Fences (#1509) 2024-10-21 19:33:32 -07:00
primitives.cpp Faster bits and bernoulli (#1535) 2024-10-28 11:11:00 -07:00
quantized.cpp Remove Hazard tracking with Fences (#1509) 2024-10-21 19:33:32 -07:00
reduce.cpp Remove Hazard tracking with Fences (#1509) 2024-10-21 19:33:32 -07:00
reduce.h Further reduction tuning (#1349) 2024-08-23 10:35:25 -07:00
resident.cpp Skip using Residency sets in VMs (#1537) 2024-10-29 19:37:23 -07:00
resident.h Wired (#1510) 2024-10-25 09:35:33 -07:00
rope.cpp Xcode 160 (#1384) 2024-09-10 15:15:17 -07:00
scaled_dot_product_attention.cpp Sdpa fix (#1558) 2024-11-02 21:25:46 -07:00
scan.cpp Working 64-bit scans (#1506) 2024-10-24 11:05:46 -07:00
slicing.cpp Fix copying scalars by adding fill_gpu (#1402) 2024-09-09 15:54:08 -07:00
slicing.h Fix slice data size (#1394) 2024-09-04 19:10:43 -07:00
softmax.cpp Remove Hazard tracking with Fences (#1509) 2024-10-21 19:33:32 -07:00
sort.cpp Remove Hazard tracking with Fences (#1509) 2024-10-21 19:33:32 -07:00
ternary.cpp Fix thread group for large arrays (#1543) 2024-10-30 16:25:12 -07:00
ternary.h Add some internal GPU apis (#1177) 2024-06-04 09:24:26 -07:00
unary.cpp Fix thread group for large arrays (#1543) 2024-10-30 16:25:12 -07:00
unary.h Add some internal GPU apis (#1177) 2024-06-04 09:24:26 -07:00
utils.cpp Fix thread group for large arrays (#1543) 2024-10-30 16:25:12 -07:00
utils.h Working 64-bit scans (#1506) 2024-10-24 11:05:46 -07:00