Daniel Yeh | 22a5da76c8 | Faster complex matmul (#2571) | 2025-10-02 23:33:15 -07:00
Cheng | 6a3acf2301 | [CUDA] Set bias as input when using bias epilogue (#2584) | 2025-09-11 15:31:09 +09:00
Cheng | 44cc5da4bc | [CUDA] Fix alpha not respected when using bias epilogue (#2578) | 2025-09-10 09:08:01 +09:00
Cheng | dde3682b69 | [CUDA] Use GEMM with epilogue instead of AddMM (#2569) | 2025-09-09 13:18:49 +09:00
Cheng | ac85ddfdb7 | [CUDA] Add GEMM-based fallback convolution kernels (#2511) * Add gemm_conv * Add gemm_grouped_conv | 2025-08-20 10:06:22 +09:00
Cheng | dfb5022eab | Rename cu::Matmul to CublasGemm (#2488) | 2025-08-13 09:37:40 +09:00
Awni Hannun | 7bb96e4249 | fix cublas on h100 (#2466) | 2025-08-06 06:18:58 -07:00
Awni Hannun | 9acec364c2 | [CUDA] Always use batched matmul (#2404) * cuda batched mm * addmm as well * comment | 2025-07-24 20:46:02 -07:00