1514 Commits

Author SHA1 Message Date
Awni Hannun
2bc2b00a93 docs update 2025-10-17 19:14:55 +00:00
Awni Hannun
d8f3a8a834 docs update 2025-10-17 19:14:55 +00:00
Awni Hannun
3ee92b5d61 docs update 2025-10-17 19:14:55 +00:00
Awni Hannun
b8691a1637 remove unneeded files in docs 2025-10-17 19:14:55 +00:00
Awni Hannun
cfb54b960a update docs 2025-10-17 19:14:55 +00:00
Awni Hannun
03a66f24b0 docs update 2025-10-17 19:14:54 +00:00
Awni Hannun
cc06c8bc0e docs up 2025-10-17 19:14:54 +00:00
Awni Hannun
f5cfadc3d7 docs up 2025-10-17 19:14:54 +00:00
Awni Hannun
bd5f469bac docs update 2025-10-17 19:14:54 +00:00
Awni Hannun
6f71a74c87 docs 2025-10-17 19:14:54 +00:00
Awni Hannun
20a3e22ff0 docs 2025-10-17 19:14:54 +00:00
Awni Hannun
0f04ebb557 update docs 2025-10-17 19:14:54 +00:00
Awni Hannun
f6ae46f713 docs 2025-10-17 19:14:54 +00:00
Awni Hannun
eb64c60144 docs 2025-10-17 19:14:54 +00:00
Awni Hannun
86bd60c849 docs 2025-10-17 19:14:54 +00:00
Awni Hannun
7a235ce49e docs 2025-10-17 19:14:54 +00:00
Awni Hannun
13e8d87a88 docs 2025-10-17 19:14:54 +00:00
Awni Hannun
e09a97e24e docs 2025-10-17 19:14:54 +00:00
Awni Hannun
aa0e5e9049 docs 2025-10-17 19:14:54 +00:00
Awni Hannun
1c70552af7 docs 2025-10-17 19:14:53 +00:00
Awni Hannun
a372a3844d docs 2025-10-17 19:14:53 +00:00
Awni Hannun
4bce5f9b2d suppress gcc 10.1 warnings (#2679)
* suppress gcc 10.1 warnings

* suppress gcc 10.1 warnings
v0.29.3
2025-10-17 12:09:21 -07:00
Anastasiia Filippova
e9eab527eb Nccl timeout (#2673)
* print the error & delete nccl group

* timeout for nccl binding

* typo

* revert error

* fixed a typo
2025-10-14 12:29:54 -07:00
Awni Hannun
36ca62dba8 remove unused unary file (#2672) 2025-10-13 19:36:26 -07:00
Manuel Villanueva
9cbb1b0148 Modified sort behavior when running CPU or Metal to match NumPy/JAX (#2667)
* Modified sort behavior when running CPU or Metal to match NumPy/JAX sorting behavior.

* Modified sort behavior when running CPU or Metal to match NumPy/JAX

* nits

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2025-10-13 14:36:45 -07:00
Fabrizio Milo
9bfc476d72 Normalize README bullet formatting (#2671) 2025-10-13 12:13:30 -07:00
Awni Hannun
25e2356316 speed up scalars (#2669) 2025-10-13 12:10:15 -07:00
Awni Hannun
226a1d24e0 Debug cuda conv (#2662)
* use t4

* use t4
2025-10-10 16:12:47 -07:00
Awni Hannun
630350ad3e Precise sigmoid (#2659)
* bump patch

* Sigmoid matches PyTorch and is more precise on tails
2025-10-10 10:05:23 -07:00
Awni Hannun
380aeb58ae enable admm low-precision cpu (#2661) 2025-10-10 09:50:54 -07:00
Awni Hannun
f37389d100 bump patch (#2658) 2025-10-10 08:36:41 -07:00
Awni Hannun
e89e8b4272 Export with callback (#2612)
* export with callback

* export with callback

* Add types, fix kwarg ordering bug + test

* cleanup, test, fix

* typos
2025-10-08 19:24:33 -07:00
AN Long
85a8824a8c Fix cumulative operations when axis=None (#2653) 2025-10-08 15:25:38 -07:00
Awni Hannun
f5d4397e5c Fix fast synch when fence is waited before a command buffer is created (#2657) 2025-10-08 11:23:46 -07:00
Awni Hannun
343e33b6d5 fix all_gather vjp (#2654) 2025-10-07 06:05:23 -07:00
Angelos Katharopoulos
0073096dd1 Split name into directories for cuda jit (#2656) 2025-10-07 01:52:58 -07:00
Angelos Katharopoulos
e3d004fed9 Fix and refactor row-reduce (#2650) 2025-10-07 01:51:08 -07:00
Awni Hannun
a393435d28 Speed up compile for node with many parents (#2649) 2025-10-03 19:30:36 -07:00
Awni Hannun
a7a94b29d7 Fix compile when outputs change (#2648) 2025-10-03 08:40:57 -07:00
Daniel Yeh
22a5da76c8 Faster complex matmul (#2571) 2025-10-02 23:33:15 -07:00
Andrey Portnoy
287c63a093 Configure CMake to export compile_commands.json (#2645)
This helps enable LSP for code navigation using clangd.
2025-10-02 15:40:32 -07:00
Awni Hannun
1c9ae1eaa1 cuda fix flaky test (#2646) 2025-10-02 15:40:04 -07:00
Angelos Katharopoulos
c2c3e0b0a2 [CUDA] Add a small column specialization to reduce (#2642) 2025-10-02 14:41:05 -07:00
Awni Hannun
b0cc71ae71 Faster triu, tril, where with scalar (#2644) 2025-10-02 12:21:27 -07:00
Awni Hannun
e88f2d4a8e fix cross entropy axis param (#2641)
* fix cross entropy axis param

* faster grad clipping
2025-10-01 16:49:55 -07:00
Angelos Katharopoulos
9cee557423 Fix status message (#2638) 2025-10-01 16:43:45 -07:00
Awni Hannun
bbf1423953 wait for tasks in cuda (#2636) 2025-09-30 16:08:46 -07:00
Angelos Katharopoulos
eb24267b56 Compile now can attach arbitrary data to an entry (#2634) 2025-09-30 13:33:27 -07:00
Awni Hannun
dc371ae7a5 fix for max block dim (#2631) 2025-09-29 08:59:25 -07:00
AN Long
e76a8dd5c5 Fix incorrect path and typos (#2630) 2025-09-28 06:03:04 -07:00