Default Branch

ec72b44417 · Add quantize/dequantize for mxfp8 and nvfp4 (#2688) · Updated 2025-10-29 07:23:12 +08:00

Branches

21453281a4 · add pool threshold · Updated 2025-10-28 07:26:57 +08:00

8
2

7cfd0da856 · rebase · Updated 2025-10-18 03:16:27 +08:00

14
71

f8b6f8a3dc · add test · Updated 2025-10-16 22:41:22 +08:00

15
2

4987e7615a · Improve the cutlass gemm · Updated 2025-08-26 09:18:19 +08:00

108
8

400f8457ea · Experimenting with a gemm based on the cuda steel utils · Updated 2025-08-15 02:27:50 +08:00

126
1

a22d0bf273 · Add stricter condition to matrix sdpa · Updated 2025-08-07 10:51:14 +08:00

136
8

qmm

8269c9d02d · Support unaligned M · Updated 2025-07-23 15:40:27 +08:00

188
6

a9c720e8cd · Improve the ring backend initialization · Updated 2025-07-12 06:31:28 +08:00

212
1

870208eff5 · Start sdpa vector · Updated 2025-06-17 08:38:39 +08:00

247
1

fft

83762691ba · Fix four step fft · Updated 2025-05-09 05:14:59 +08:00    zhangyiss

325
6

7c99acb799 · split logsumexp · Updated 2025-05-07 08:10:14 +08:00    zhangyiss

326
1

998404ada4 · Get trellis to run · Updated 2025-04-26 22:02:20 +08:00    zhangyiss

365
3

11f73d6e89 · Double buffer keys for vector sdpa · Updated 2025-04-22 15:19:11 +08:00    zhangyiss

354
1

4c46e17a5d · Update benchmark output · Updated 2025-04-16 01:50:06 +08:00    zhangyiss

364
1

67ec27d515 · synch before reading memory in test · Updated 2025-04-08 05:37:32 +08:00    zhangyiss

378
4

066336b60e · load q4_k from gguf · Updated 2025-04-04 01:56:12 +08:00    zhangyiss

388
1

688e421184 · only interrupt during an eval · Updated 2025-03-19 22:56:26 +08:00    zhangyiss

435
2

127de8821e · Fix the sig_handler check · Updated 2025-03-08 09:31:06 +08:00    zhangyiss

449
2

c5073fc452 · Ensure we only have one copy of the fence · Updated 2025-03-05 15:37:15 +08:00    zhangyiss

458
3

4c1dfa58b7 · xor op on arrays (#1875) · Updated 2025-02-17 16:24:53 +08:00    zhangyiss

485
0
Included