Awni Hannun
6120a5f376
Faster DSv2/3 expert score computation (#1257)
* fix deepseek sharding (#1242)
* compile and use put along axis in deep seek routing function
2025-02-07 10:24:57 -08:00
Awni Hannun
21d0ab6e8a
fix deepseek sharding (#1242)
2025-02-03 16:59:50 -08:00
Awni Hannun
9c2ef38d4d
only download local shard (#1240)
2025-02-02 13:58:44 -08:00
Awni Hannun
e8afb59de4
better overflow correction (#1229)
2025-01-28 14:37:30 -08:00
Awni Hannun
9a3ddc3e65
some fixes for pipeline parallel deep seek r1 (#1216)
2025-01-21 19:40:29 -08:00
Awni Hannun
5cae0a60e6
deepseek v3 model with pipeline parallelism (#1191)
* deepseekv3
* use upload_large_file instead of deprecated multi commit
* add pipeline generation and example
* comment
* get fp16 working
* use mlx==0.22
2025-01-09 15:55:53 -08:00