gemms | add a half simd gemm fallback (#2046) | 2025-04-07 09:31:29 -07:00
simd | Complex scan (#2094) | 2025-04-22 18:56:28 -07:00
arange.h | redesign for faster cpu/gpu synch (#1869) | 2025-03-06 19:23:38 -08:00
arg_reduce.cpp | Add remove_index utility (#2173) | 2025-05-13 17:09:56 -07:00
available.cpp | Generalize gpu backend (#2138) | 2025-04-30 09:08:17 -07:00
available.h | Generalize gpu backend (#2138) | 2025-04-30 09:08:17 -07:00
binary_ops.h | Refactor common into cpu specific and truly common (#1817) | 2025-02-03 15:58:02 -08:00
binary_two.h | reduce binary size (#1952) | 2025-03-11 06:30:44 -07:00
binary.cpp | Complex scan (#2094) | 2025-04-22 18:56:28 -07:00
binary.h | reduce binary size (#1952) | 2025-03-11 06:30:44 -07:00
cholesky.cpp | redesign for faster cpu/gpu synch (#1869) | 2025-03-06 19:23:38 -08:00
CMakeLists.txt | non-symmetric eig and eigh (#2188) | 2025-05-15 13:01:44 -07:00
compiled_preamble.h | Refactor common into cpu specific and truly common (#1817) | 2025-02-03 15:58:02 -08:00
compiled.cpp | Share more common code in Compiled (#2240) | 2025-06-03 16:48:50 -07:00
conv.cpp | fix: conv_general differences between gpu, cpu (#2070) | 2025-05-09 10:26:52 -07:00
copy.cpp | reduce binary size (#1952) | 2025-03-11 06:30:44 -07:00
copy.h | redesign for faster cpu/gpu synch (#1869) | 2025-03-06 19:23:38 -08:00
distributed.cpp | Fix the test and add custom min/max reductions for uncommon MPI types (#2060) | 2025-04-10 17:01:17 -07:00
eig.cpp | non-symmetric eig and eigh (#2188) | 2025-05-15 13:01:44 -07:00
eigh.cpp | Add complex eigh (#2191) | 2025-05-18 00:18:43 -07:00
encoder.cpp | redesign for faster cpu/gpu synch (#1869) | 2025-03-06 19:23:38 -08:00
encoder.h | Fix multistream GPU deadlock (#1969) | 2025-03-20 07:19:47 -07:00
eval.cpp | Fix multistream GPU deadlock (#1969) | 2025-03-20 07:19:47 -07:00
eval.h | redesign for faster cpu/gpu synch (#1869) | 2025-03-06 19:23:38 -08:00
fft.cpp | fix malloc or wait deadlock (#1976) | 2025-03-20 16:48:43 -07:00
gemm.h | redesign for faster cpu/gpu synch (#1869) | 2025-03-06 19:23:38 -08:00
hadamard.cpp | redesign for faster cpu/gpu synch (#1869) | 2025-03-06 19:23:38 -08:00
indexing.cpp | Add remove_index utility (#2173) | 2025-05-13 17:09:56 -07:00
inverse.cpp | fix malloc or wait deadlock (#1976) | 2025-03-20 16:48:43 -07:00
jit_compiler.cpp | Fix compilation error on Windows (#1844) | 2025-02-10 19:53:05 -08:00
jit_compiler.h | Refactor common into cpu specific and truly common (#1817) | 2025-02-03 15:58:02 -08:00
lapack.h | Add complex eigh (#2191) | 2025-05-18 00:18:43 -07:00
logsumexp.cpp | Custom logsumexp (#2028) | 2025-03-31 07:36:55 -07:00
luf.cpp | fix malloc or wait deadlock (#1976) | 2025-03-20 16:48:43 -07:00
make_compiled_preamble.ps1 | Refactor common into cpu specific and truly common (#1817) | 2025-02-03 15:58:02 -08:00
make_compiled_preamble.sh | fix cpu compile (#1897) | 2025-02-24 14:10:30 -08:00
masked_mm.cpp | fix malloc or wait deadlock (#1976) | 2025-03-20 16:48:43 -07:00
matmul.cpp | Close a couple edge case bugs: hadamard and addmm on empty inputs (#2177) | 2025-05-12 10:48:57 -07:00
primitives.cpp | Distributed layers (#1270) | 2025-03-21 13:52:17 -07:00
qrf.cpp | fix malloc or wait deadlock (#1976) | 2025-03-20 16:48:43 -07:00
quantized.cpp | 5bit quants (#2226) | 2025-05-30 12:12:10 -07:00
reduce.cpp | fix malloc or wait deadlock (#1976) | 2025-03-20 16:48:43 -07:00
scan.cpp | Complex scan (#2094) | 2025-04-22 18:56:28 -07:00
select.cpp | reduce binary size (#1952) | 2025-03-11 06:30:44 -07:00
slicing.h | Refactor common into cpu specific and truly common (#1817) | 2025-02-03 15:58:02 -08:00
softmax.cpp | Custom logsumexp (#2028) | 2025-03-31 07:36:55 -07:00
sort.cpp | fix malloc or wait deadlock (#1976) | 2025-03-20 16:48:43 -07:00
svd.cpp | fix malloc or wait deadlock (#1976) | 2025-03-20 16:48:43 -07:00
ternary.h | reduce binary size (#1952) | 2025-03-11 06:30:44 -07:00
threefry.cpp | Refactor common into cpu specific and truly common (#1817) | 2025-02-03 15:58:02 -08:00
threefry.h | Refactor common into cpu specific and truly common (#1817) | 2025-02-03 15:58:02 -08:00
unary_ops.h | Fix CPU sign for unsigned ints (#2024) | 2025-03-30 17:56:59 -07:00
unary.cpp | Fix MSVC build due to use of M_LN2 (#2058) | 2025-04-10 07:41:41 -07:00
unary.h | CUDA backend: unary ops (#2158) | 2025-06-09 06:45:08 -07:00