Awni Hannun
e843c4d8d5
fix power (#2523)
2025-08-21 06:46:01 -07:00

Awni Hannun
6441c21a94
Faster general unary op (#2472)
* faster general unary op
* faster general ops + reorg
* fix + comment
* binary two
* copy general
2025-08-15 15:04:12 -07:00

Awni Hannun
d32519c8ee
fix gemv regression (#2445)
2025-07-30 14:23:01 -07:00

Cheng
3628e5d497
Use load_vector in arg_reduce (#2439)
2025-07-30 17:40:26 +09:00

Cheng
a0ae49d397
Move arange to its own file (#2438)
2025-07-30 13:05:51 +09:00

Awni Hannun
ef631d63af
faster rms norm (#2433)
2025-07-29 13:12:00 -07:00

Awni Hannun
641be9463b
Add more CUDA architectures for PyPI package (#2427)
* add cuda sm 90
* add more archs
2025-07-28 12:35:15 -07:00

Awni Hannun
d107d8d495
add cuda gemv (#2400)
2025-07-22 08:24:13 -07:00

Cheng
f55c4ed1d6
Remove thrust iterators (#2396)
2025-07-21 07:30:27 -07:00

Awni Hannun
d7734edd9f
fix complex reduce + nan propagation in min and max (#2377)
2025-07-15 18:19:47 -07:00

Cheng
cb349a291c
[CUDA] Use cuda::std::complex in place of cuComplex (#2372)
2025-07-15 00:36:13 -07:00

Cheng
6325f60d52
[CUDA] Bundle CCCL for JIT compilation (#2357)
* Ship CCCL for JIT compilation
* Remove cexpf
2025-07-11 18:45:37 -07:00

Cheng
8347575ba1
[CUDA] Implement Scan kernel (#2347)
* Contiguous scan
* Strided scan
* Enable tests
* Fix failing logaddexp test
* Use cexpf in Metal
2025-07-10 16:54:12 -07:00

Cheng
2ca533b279
Fix compilation with CUDA 11 (#2331)
2025-07-07 20:00:43 -07:00

Cheng
9d10239af7
[CUDA] Do vectorized store/load in binary ops (#2330)
2025-07-07 08:44:14 -07:00

Awni Hannun
dd4f53db63
use fp32 for testing, add more complex ops (#2322)
2025-07-01 07:30:00 -07:00

Awni Hannun
c9a9180584
Cuda perf tuning (#2307)
* perf tuning
* fix adding input arrays in matmul / sort
* format
* fix
2025-06-20 14:50:57 -07:00

Awni Hannun
b8022c578a
divmod, partition, sort fixes (#2302)
2025-06-16 18:49:32 -07:00

Awni Hannun
bc53f8293f
Cuda bug fixes 2 (#2298)
* more bug fixes
* more bug fixes
* format
2025-06-16 13:14:46 -07:00

Awni Hannun
c552ff2451
[CUDA] Fix back-end bugs and enable corresponding tests (#2296)
* Fix some cuda back-end bugs and enable corresponding tests
* more fixes
* enable more tests
* format
2025-06-16 08:45:40 -07:00

Awni Hannun
8402a2acf4
Fix complex power and print (#2286)
* fix complex power and print
* fix complex matmul shape
2025-06-13 11:13:00 -07:00

Cheng
c8b4787e4e
CUDA backend: indexing ops (#2277)
2025-06-12 21:44:19 -07:00

Awni Hannun
2188199ff8
[CUDA] ternary with select op (#2283)
* cuda ternary with select op
* comment + fix
* fix
2025-06-12 20:24:43 -07:00

Cheng
a4fc671d3e
CUDA backend: compile (#2276)
* CUDA backend: compile
* Rename kernels/ to device/
2025-06-12 17:08:39 -07:00