Awni Hannun
b8691a1637
remove unneeded files in docs
2025-10-17 19:14:55 +00:00
Awni Hannun
cfb54b960a
update docs
2025-10-17 19:14:55 +00:00
Awni Hannun
03a66f24b0
docs update
2025-10-17 19:14:54 +00:00
Awni Hannun
cc06c8bc0e
docs up
2025-10-17 19:14:54 +00:00
Awni Hannun
f5cfadc3d7
docs up
2025-10-17 19:14:54 +00:00
Awni Hannun
bd5f469bac
docs update
2025-10-17 19:14:54 +00:00
Awni Hannun
6f71a74c87
docs
2025-10-17 19:14:54 +00:00
Awni Hannun
20a3e22ff0
docs
2025-10-17 19:14:54 +00:00
Awni Hannun
0f04ebb557
update docs
2025-10-17 19:14:54 +00:00
Awni Hannun
f6ae46f713
docs
2025-10-17 19:14:54 +00:00
Awni Hannun
eb64c60144
docs
2025-10-17 19:14:54 +00:00
Awni Hannun
86bd60c849
docs
2025-10-17 19:14:54 +00:00
Awni Hannun
7a235ce49e
docs
2025-10-17 19:14:54 +00:00
Awni Hannun
13e8d87a88
docs
2025-10-17 19:14:54 +00:00
Awni Hannun
e09a97e24e
docs
2025-10-17 19:14:54 +00:00
Awni Hannun
aa0e5e9049
docs
2025-10-17 19:14:54 +00:00
Awni Hannun
1c70552af7
docs
2025-10-17 19:14:53 +00:00
Awni Hannun
a372a3844d
docs
2025-10-17 19:14:53 +00:00
Awni Hannun
4bce5f9b2d
suppress gcc 10.1 warnings ( #2679 )
* suppress gcc 10.1 warnings
* suppress gcc 10.1 warnings
v0.29.3
2025-10-17 12:09:21 -07:00
Anastasiia Filippova
e9eab527eb
Nccl timeout ( #2673 )
* print the error & delete nccl group
* timeout for nccl binding
* typo
* revert error
* fixed a typo
2025-10-14 12:29:54 -07:00
Awni Hannun
36ca62dba8
remove unused unary file ( #2672 )
2025-10-13 19:36:26 -07:00
Manuel Villanueva
9cbb1b0148
Modified sort behavior when running CPU or Metal to match NumPy/JAX ( #2667 )
* Modified sort behavior when running CPU or Metal to match NumPy/JAX sorting behavior.
* Modified sort behavior when running CPU or Metal to match NumPy/JAX
* nits
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2025-10-13 14:36:45 -07:00
Fabrizio Milo
9bfc476d72
Normalize README bullet formatting ( #2671 )
2025-10-13 12:13:30 -07:00
Awni Hannun
25e2356316
speed up scalars ( #2669 )
2025-10-13 12:10:15 -07:00
Awni Hannun
226a1d24e0
Debug cuda conv ( #2662 )
* use t4
* use t4
2025-10-10 16:12:47 -07:00
Awni Hannun
630350ad3e
Precise sigmoid ( #2659 )
* bump patch
* Sigmoid matches PyTorch and is more precise on tails
2025-10-10 10:05:23 -07:00
Awni Hannun
380aeb58ae
enable admm low-precision cpu ( #2661 )
2025-10-10 09:50:54 -07:00
Awni Hannun
f37389d100
bump patch ( #2658 )
2025-10-10 08:36:41 -07:00
Awni Hannun
e89e8b4272
Export with callback ( #2612 )
* export with callback
* export with callback
* Add types, fix kwarg ordering bug + test
* cleanup, test, fix
* typos
2025-10-08 19:24:33 -07:00
AN Long
85a8824a8c
Fix cumulative operations when axis=None ( #2653 )
2025-10-08 15:25:38 -07:00
Awni Hannun
f5d4397e5c
Fix fast synch when fence is waited before a command buffer is created ( #2657 )
2025-10-08 11:23:46 -07:00
Awni Hannun
343e33b6d5
fix all_gather vjp ( #2654 )
2025-10-07 06:05:23 -07:00
Angelos Katharopoulos
0073096dd1
Split name into directories for cuda jit ( #2656 )
2025-10-07 01:52:58 -07:00
Angelos Katharopoulos
e3d004fed9
Fix and refactor row-reduce ( #2650 )
2025-10-07 01:51:08 -07:00
Awni Hannun
a393435d28
Speed up compile for node with many parents ( #2649 )
2025-10-03 19:30:36 -07:00
Awni Hannun
a7a94b29d7
Fix compile when outputs change ( #2648 )
2025-10-03 08:40:57 -07:00
Daniel Yeh
22a5da76c8
Faster complex matmul ( #2571 )
2025-10-02 23:33:15 -07:00
Andrey Portnoy
287c63a093
Configure CMake to export compile_commands.json ( #2645 )
This helps enable LSP for code navigation using clangd.
2025-10-02 15:40:32 -07:00
Awni Hannun
1c9ae1eaa1
cuda fix flaky test ( #2646 )
2025-10-02 15:40:04 -07:00
Angelos Katharopoulos
c2c3e0b0a2
[CUDA] Add a small column specialization to reduce ( #2642 )
2025-10-02 14:41:05 -07:00
Awni Hannun
b0cc71ae71
Faster triu, tril, where with scalar ( #2644 )
2025-10-02 12:21:27 -07:00
Awni Hannun
e88f2d4a8e
fix cross entropy axis param ( #2641 )
* fix cross entropy axis param
* faster grad clipping
2025-10-01 16:49:55 -07:00
Angelos Katharopoulos
9cee557423
Fix status message ( #2638 )
2025-10-01 16:43:45 -07:00
Awni Hannun
bbf1423953
wait for tasks in cuda ( #2636 )
2025-09-30 16:08:46 -07:00
Angelos Katharopoulos
eb24267b56
Compile now can attach arbitrary data to an entry ( #2634 )
2025-09-30 13:33:27 -07:00
Awni Hannun
dc371ae7a5
fix for max block dim ( #2631 )
2025-09-29 08:59:25 -07:00
AN Long
e76a8dd5c5
Fix incorrect path and typos ( #2630 )
2025-09-28 06:03:04 -07:00
Cheng
b466dea982
[CUDA] Make CudaEvent work with multi-device ( #2614 )
* Set current device when creating cuda event
* Separate cuda events by device
* Avoid race condition in pool
2025-09-27 11:27:17 +09:00
Angelos Katharopoulos
7a6adda1e6
Bump the version ( #2627 )
v0.29.2
2025-09-26 15:15:28 -07:00
Angelos Katharopoulos
1a9f820af6
Compiled should not end in broadcast ( #2622 )
2025-09-26 13:36:09 -07:00