mlx/python
Awni Hannun a54f06b16f
Fast RMS Norm (#862)
* fast rmsnorm

* no rms gpu

* kernel

* fix shared mem

* looped rms and donation in softmax

* Make the squaring in float32 to avoid underflow

* Fix the default StreamOrDevice for rope and rms_norm in fast

* nits

---------

Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
2024-03-21 07:20:54 -07:00
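The note about doing the squaring in float32 to avoid underflow can be illustrated with a small standalone sketch. It uses only the Python standard library and does not reproduce the MLX kernel. The helper names (`to_f16`, `to_f32`, `mean_square_half`, `mean_square_single`, `rms_norm`) and the sample input are assumptions made for illustration. Half precision cannot represent positive values much below about 6e-8, so the square of a small input rounds to zero unless the intermediate is kept in a wider type.

```python
import math
import struct


def to_f16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def to_f32(x: float) -> float:
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def mean_square_half(xs):
    """Mean of squares with every intermediate rounded to float16."""
    acc = 0.0
    for v in xs:
        h = to_f16(v)
        acc = to_f16(acc + to_f16(h * h))  # square is rounded to float16
    return to_f16(acc / len(xs))


def mean_square_single(xs):
    """Mean of squares of float16 inputs, squared and summed in float32."""
    acc = 0.0
    for v in xs:
        h = to_f16(v)
        acc = to_f32(acc + to_f32(h * h))  # square is kept in float32
    return to_f32(acc / len(xs))


def rms_norm(xs, weight, eps=1e-5):
    """Textbook RMS norm: w * x / sqrt(mean(x^2) + eps), float32 squaring."""
    scale = 1.0 / math.sqrt(mean_square_single(xs) + eps)
    return [w * to_f16(v) * scale for v, w in zip(xs, weight)]


if __name__ == "__main__":
    small = [1e-4] * 8  # hypothetical input; each square is about 1e-8
    print(mean_square_half(small))    # squares underflow in float16
    print(mean_square_single(small))  # squares survive in float32
    print(rms_norm([3.0, 4.0], [1.0, 1.0], eps=0.0))
```

In this sketch the half-precision path returns exactly zero for the sample input, which would leave only `eps` under the square root, while the float32 path keeps a value close to 1e-8.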
mlx Fast RMS Norm (#862) 2024-03-21 07:20:54 -07:00
src Fast RMS Norm (#862) 2024-03-21 07:20:54 -07:00
tests Fast RMS Norm (#862) 2024-03-21 07:20:54 -07:00