mlx/mlx/backend
Jagrit Digani 9adcd1a650
Support fused masking in Attention (#1924)
* Update API to allow mask='causal' in fast::sdpa

* Add fallback

* Update steel::AttnParams

* Fix typo

* WIP, basic causal

* Update tests

* Update benchmarking

* Update masking loop limits

* Add bool masking and update tests

* Update additive mask

* Update benchmarks

* Update benchmarks

* Update tests

* Update for bfloat error

* Update early exit

* Add random seed to tests
2025-03-20 11:01:32 -07:00
common    redesign for faster cpu/gpu synch (#1869)      2025-03-06 19:23:38 -08:00
cpu       Fix multistream GPU deadlock (#1969)           2025-03-20 07:19:47 -07:00
metal     Support fused masking in Attention (#1924)     2025-03-20 11:01:32 -07:00
no_cpu    redesign for faster cpu/gpu synch (#1869)      2025-03-06 19:23:38 -08:00
no_metal  Guard nullptr dereference (#1972)              2025-03-19 16:24:10 -07:00
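
For readers unfamiliar with what `mask='causal'` in the latest commit refers to, the following is a minimal reference sketch of scaled dot-product attention with an optional causal mask. It is plain Python over nested lists, not the fused Metal kernel and not MLX code; the function name `sdpa_reference` and its signature are hypothetical. It also assumes that when there are fewer queries than keys, the last query lines up with the last key, which is a common convention but is not stated in the commit message.

```python
import math


def sdpa_reference(q, k, v, scale, causal=False):
    """Reference attention for one head (hypothetical helper, not MLX API).

    q: list of Lq query vectors, k: list of Lk key vectors,
    v: list of Lk value vectors. Returns a list of Lq output vectors.
    """
    q_len, k_len = len(q), len(k)
    if q_len > k_len:
        raise ValueError("this sketch assumes q_len <= k_len")

    # Assumption: the last query is aligned with the last key.
    offset = k_len - q_len
    out = []
    for i, q_row in enumerate(q):
        # With a causal mask, query i only sees keys 0 .. i + offset.
        visible = i + offset + 1 if causal else k_len
        scores = [
            scale * sum(a * b for a, b in zip(q_row, k[j]))
            for j in range(visible)
        ]
        # Numerically stable softmax over the visible keys only.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * v[j][d] for j, w in enumerate(weights)) / total
            for d in range(len(v[0]))
        ])
    return out


q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
causal_out = sdpa_reference(q, k, v, scale=1.0, causal=True)
full_out = sdpa_reference(q, k, v, scale=1.0, causal=False)
```

In this sketch the masked keys are skipped rather than filled with a large negative score. The two approaches give the same result, and skipping masked positions is the general idea behind fusing the mask into the kernel instead of materializing it as a separate array.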