mlx/python/tests
Latest commit: a54f06b16f "Fast RMS Norm (#862)"
Author: Awni Hannun
Date: 2024-03-21 07:20:54 -07:00

* fast rmsnorm
* no rms gpu
* kernel
* fix shared mem
* looped rms and donation in softmax
* Make the squaring in float32 to avoid underflow
* Fix the default StreamOrDevice for rope and rms_norm in fast
* nits

Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>