Files
mlx/benchmarks/python
Jagrit Digani 02bec0bb6d Matrix Attention kernel (#1610)
* Rough INIT

* [WIP]: Loading and Matmuls added

* [WIP]: Reductions and minimal working aligned kernel at headdim = 64

* [WIP] Added headdim 80 for testing

* [WIP] Update dispatch params for testing

* [WIP] Add support for unaligned seq lengths - still looks messy

* Update sdpa_benchmarks

* Update sdpa_benchmarks

* Update sdpa_benchmarks

* Enable gqa support

* Update benchmark and switch off 128 headdim

* Update headdim 128 tuning

* Remove older fast attention code. Write out O strided

* Disable hd=128 until further optimizations

* Enable bf16

* Fix data size bug

* Enable attn build outside of jit
2024-11-22 10:34:05 -08:00
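
The commit history above tracks an SDPA benchmark being updated alongside the kernel work (GQA support, bf16, headdim tuning). The following is a minimal sketch of how such a benchmark might exercise `mlx.core.fast.scaled_dot_product_attention`; it is not the repository's benchmark script, and the shapes, iteration count, and helper name are illustrative assumptions.

```python
# Hypothetical SDPA micro-benchmark sketch; not the script from benchmarks/python.
import time

import mlx.core as mx


def bench_sdpa(B=1, n_q_heads=32, n_kv_heads=8, seq_len=1024, head_dim=64,
               dtype=mx.bfloat16):
    # GQA: queries carry more heads than keys/values.
    q = mx.random.normal((B, n_q_heads, seq_len, head_dim), dtype=dtype)
    k = mx.random.normal((B, n_kv_heads, seq_len, head_dim), dtype=dtype)
    v = mx.random.normal((B, n_kv_heads, seq_len, head_dim), dtype=dtype)
    scale = head_dim ** -0.5

    # Warm-up so one-time compilation is not timed.
    mx.eval(mx.fast.scaled_dot_product_attention(q, k, v, scale=scale))

    n_iters = 20
    start = time.perf_counter()
    for _ in range(n_iters):
        out = mx.fast.scaled_dot_product_attention(q, k, v, scale=scale)
        mx.eval(out)  # force lazy evaluation so the kernel actually runs
    return (time.perf_counter() - start) / n_iters


if __name__ == "__main__":
    print(f"avg SDPA time: {bench_sdpa() * 1e3:.3f} ms")
```

Varying `head_dim` (64, 80, 128), `dtype`, and the query/key-value head ratio in a sketch like this would cover the configurations the commits mention.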