Awni Hannun
eea1a5f6ec
update docs
2025-08-07 07:46:30 +00:00
Awni Hannun
1c7ebac536
docs update
2025-08-07 07:46:30 +00:00
Awni Hannun
c07de29458
docs up
2025-08-07 07:46:30 +00:00
Awni Hannun
c0dc02b8f4
docs up
2025-08-07 07:46:30 +00:00
Awni Hannun
e9e7ff09dd
docs update
2025-08-07 07:46:30 +00:00
Awni Hannun
b024ed7744
docs
2025-08-07 07:46:30 +00:00
Awni Hannun
beda33e1a9
docs
2025-08-07 07:46:30 +00:00
Awni Hannun
83d006dfdd
update docs
2025-08-07 07:46:30 +00:00
Awni Hannun
b906475af3
docs
2025-08-07 07:46:30 +00:00
Awni Hannun
0bb6edbe54
docs
2025-08-07 07:46:30 +00:00
Awni Hannun
b728f1fd66
docs
2025-08-07 07:46:30 +00:00
Awni Hannun
3bf234ff88
docs
2025-08-07 07:46:30 +00:00
Awni Hannun
082e939fc7
docs
2025-08-07 07:46:30 +00:00
Awni Hannun
75b55ffa19
docs
2025-08-07 07:46:29 +00:00
Awni Hannun
b8cf6bd778
docs
2025-08-07 07:46:29 +00:00
Awni Hannun
3e96be74e6
docs
2025-08-07 07:46:29 +00:00
Awni Hannun
312037d0f7
docs
2025-08-07 07:46:29 +00:00
Awni Hannun
56be773610
version ( #2470 )
2025-08-07 00:36:04 -07:00
Jagrit Digani
a9bdd67baa
Add CUDA sdpa vector ( #2468 )
2025-08-06 21:40:26 -07:00
Angelos Katharopoulos
f2adb5638d
Fix typo in metal command encoder ( #2471 )
2025-08-06 16:58:23 -07:00
Luca Vivona
728d4db582
Support destination arg in tree flatten/unflatten ( #2450 )
2025-08-06 15:34:59 -07:00
Awni Hannun
db5c7efcf6
revert default cuda install ( #2465 )
* revert default cuda install
2025-08-06 06:19:12 -07:00
Awni Hannun
7bb96e4249
fix cublas on h100 ( #2466 )
2025-08-06 06:18:58 -07:00
Awni Hannun
fa89f0b150
faster gather qmm sorted test ( #2463 )
2025-08-05 06:27:40 -07:00
Awni Hannun
ca973d1e83
fix install tags ( #2464 )
2025-08-04 20:01:23 -07:00
Cheng
828c5f1137
Use SmallVector for shapes and strides ( #2454 )
* Use SmallVector for shapes and strides
* Convert SmallVector to tuple
2025-08-05 09:41:03 +09:00
Gaétan Lepage
7d86a5c108
Feat: add USE_SYSTEM_FMT CMake option ( #2219 )
2025-08-04 16:36:11 -07:00
Awni Hannun
0b807893a7
fix wraps compile ( #2461 )
2025-08-04 16:14:18 -07:00
Awni Hannun
6ad0889c8a
default install cuda on linux ( #2462 )
2025-08-04 15:33:05 -07:00
Zamderax
737dd6d1ac
Add missing <algorithm> header to jit_compiler.cpp ( #2460 )
Fixes compilation error on Linux where std::find_if is used on line 121
but the <algorithm> header was not included. While this might work on
some platforms due to transitive includes, it's not guaranteed by the
C++ standard.
Resolves issue #2459
2025-08-04 14:00:46 -07:00
Cheng
aaf78f4c6b
Use LRU cache for cuda graph ( #2448 )
* Use LRU cache for cuda graph
* Remove unused destructor
2025-08-02 21:28:57 +09:00
Angelos Katharopoulos
8831064493
Fix arctan2 grads ( #2453 )
2025-08-01 21:06:04 -07:00
Angelos Katharopoulos
be9bc96da4
[CUDA] Matmul utils initial commit ( #2441 )
2025-08-01 14:22:25 -07:00
Angelos Katharopoulos
86258f292f
[CUDA] Vectorize generated kernels ( #2444 )
2025-07-31 18:18:57 -07:00
Cheng
b26d88591c
[CUDA] Save primitive inputs faster ( #2449 )
* Add more NVTX logging
* [CUDA] Saving primitive inputs faster
* Remove unneeded check
2025-08-01 10:16:06 +09:00
Cheng
86c6a15571
[CUDA] Backward convolution ( #2431 )
2025-08-01 09:54:05 +09:00
junpeiz
8b25ce62d5
Add tests for export including control flow models and quantized models ( #2430 )
* Add tests for export, including control flow export and quantized model export.
* Skip quantization related test for CUDA backend.
2025-07-31 11:06:26 -07:00
Awni Hannun
da5912e4f2
fix custom metal extension ( #2446 )
2025-07-31 06:25:36 -07:00
Cheng
daafee676f
Fix wrong graph key when using concurrent context ( #2447 )
2025-07-31 06:01:05 -07:00
Awni Hannun
d32519c8ee
fix gemv regression ( #2445 )
2025-07-30 14:23:01 -07:00
Awni Hannun
b405591249
fix circular reference ( #2443 )
2025-07-30 09:37:44 -07:00
Angelos Katharopoulos
3bf81ed1bd
[CUDA] Quantized refactoring ( #2442 )
2025-07-30 08:27:20 -07:00
Cheng
2204182bba
Make CI faster ( #2440 )
2025-07-30 02:26:36 -07:00
Cheng
3628e5d497
Use load_vector in arg_reduce ( #2439 )
2025-07-30 17:40:26 +09:00
Cheng
a0ae49d397
Move arange to its own file ( #2438 )
2025-07-30 13:05:51 +09:00
Cheng
254476718b
Remove the kernel arg from get_launch_args ( #2437 )
2025-07-30 11:43:02 +09:00
Awni Hannun
3adba92ebe
Cuda faster softmax ( #2435 )
* faster softmax and logsumexp
* format
2025-07-29 17:18:12 -07:00
Awni Hannun
ef631d63af
faster rms norm ( #2433 )
2025-07-29 13:12:00 -07:00
Cheng
970dbe8e25
Use ccache in CI ( #2414 )
* Detect ccache
* Use ccache in CI
* Separate cache for different images
* Test both 12.2 and 12.9 for PRs
2025-07-29 08:43:22 +09:00
Awni Hannun
641be9463b
Add more CUDA architectures for PyPI package ( #2427 )
* add cuda sm 90
* add more archs
2025-07-28 12:35:15 -07:00