Commit Graph

7 Commits

Author          SHA1         Message                                       Date
Jagrit Digani   d571366250   Update headdim 128 tuning                     2024-11-22 10:12:20 -08:00
Jagrit Digani   791f50d9f3   Update benchmark and switch off 128 headdim   2024-11-22 10:12:20 -08:00
Jagrit Digani   0c22440c75   Update sdpa_benchmarks                        2024-11-22 10:12:20 -08:00
Jagrit Digani   c9ab537b9a   Update sdpa_benchmarks                        2024-11-22 10:12:20 -08:00
Jagrit Digani   f1d87a2d3e   Update sdpa_benchmarks                        2024-11-22 10:12:20 -08:00
Nikhil Mehta    0b7d71fd2f   Add softmin, hardshrink, hardtanh (#1180)     2024-06-04 15:48:18 -07:00

    Co-authored-by: Nikhil Mehta <nikmehta@tesla.com>
Brian Keene     1865299a30   Metal shaders for memory efficient self attention on large sequences (#964)   2024-06-03 09:16:19 -07:00

    * Metal shaders for efficient self attention on large sequences

      Updated fast attention: GEMM-ified with Steel primitives.
      Uses flash attention 1 for scale correction.

    * more compiler silencing
    * Address rebase issues
    * Templatize kernel instantiation, revise cpu bindings
    * Safer writes to output
    * Permit batch size > 1
    * Numerical fixes for sdpa self attention
    * Re-enable test, remove unused variable
    * add benchmarking script
    * Disable sdpa prior to perf tuning, and simplify tests for per-patch CI
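The "flash attention 1 for scale correction" mentioned in commit 1865299a30 refers to computing softmax over key blocks with a running row maximum and denominator, rescaling the partially accumulated output whenever the maximum changes. A minimal NumPy sketch of that idea (names, block size, and shapes are illustrative, not taken from the MLX Metal kernels):

```python
import numpy as np

def attention_blockwise(q, k, v, block=64):
    """Single-head attention computed one key/value block at a time,
    using the flash-attention-1 running-max scale correction."""
    d = q.shape[-1]
    scale = 1.0 / np.sqrt(d)
    m = np.full(q.shape[0], -np.inf)           # running row max of scores
    l = np.zeros(q.shape[0])                   # running softmax denominator
    o = np.zeros((q.shape[0], v.shape[-1]))    # unnormalized output accumulator
    for s in range(0, k.shape[0], block):
        kb, vb = k[s:s + block], v[s:s + block]
        scores = (q @ kb.T) * scale
        m_new = np.maximum(m, scores.max(axis=-1))
        corr = np.exp(m - m_new)               # scale correction for prior state
        p = np.exp(scores - m_new[:, None])    # block-local exponentials
        l = l * corr + p.sum(axis=-1)
        o = o * corr[:, None] + p @ vb
        m = m_new
    return o / l[:, None]                      # normalize once at the end
```

Because each block only rescales the already-accumulated output, the full score matrix is never materialized, which is what makes the kernel memory-efficient for large sequences.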