mlx/python/tests
Brian Keene 1865299a30 Metal shaders for memory efficient self attention on large sequences (#964)
* Metal shaders for efficient self attention on large sequences

Updated fast attention: GEMM-ified with Steel primitives
Uses flash attention 1 for scale correction

* More compiler silencing

* Address rebase issues

* Templatize kernel instantiation, revise cpu bindings

* Safer writes to output

* Permit batch size > 1

* Numerical fixes for sdpa self attention

* Re-enable test, remove unused variable

* Add benchmarking script

* Disable sdpa prior to perf tuning, and simplify tests for per-patch CI
2024-06-03 09:16:19 -07:00