Angelos Katharopoulos
|
2e8cf0b450
|
Change layernorms to two pass algorithm (#2246)
|
2025-06-06 13:34:56 -07:00 |
|
Cheng
|
24f89173d1
|
CUDA backend: matmul (#2241)
|
2025-06-06 12:24:04 -07:00 |
|
Awni Hannun
|
c6a20b427a
|
Improve metal elementwise kernels (#2247)
* improve metal elementwise kernels
* compile and copy
* fix jit
|
2025-06-06 11:37:40 -07:00 |
|
Cheng
|
0bb89e9e5f
|
Share more common code in Compiled (#2240)
* Share more common code in Compiled
* Remove build_lib_name
|
2025-06-03 16:48:50 -07:00 |
|
Cheng
|
1b021f6984
|
Fast primitives decide when to use the fallback (#2216)
|
2025-06-02 13:26:37 -07:00 |
|
Cheng
|
db5a7c6192
|
Add memory cache to CUDA backend (#2221)
* Move BufferCache out of allocator
* Add memory cache to cuda backend allocator
* Simplify BufferCache assuming buf can not be null
|
2025-05-30 12:12:54 -07:00 |
|
Awni Hannun
|
6ef2f67e7f
|
5bit quants (#2226)
* 5bit quants
* 5bit quants
|
2025-05-30 12:12:10 -07:00 |
|
Cheng
|
f76ee1ffd2
|
Move some dims utils to common (#2223)
|
2025-05-29 06:48:30 -07:00 |
|
Cheng
|
79071bfba4
|
Fix out-of-bounds default value in logsumexp/softmax (#2213)
|
2025-05-21 07:25:16 -07:00 |
|
Cheng
|
7774b87cbd
|
Remove redundant simd_sum in logsumexp (#2210)
|
2025-05-21 07:25:03 -07:00 |
|
Awni Hannun
|
eebe73001a
|
fix large arg reduce (#2206)
|
2025-05-19 13:10:44 -07:00 |
|
Awni Hannun
|
8576e6fe36
|
fix conv2d bug + faster conv 1d (#2195)
* fix conv2d bug + faster conv 1d
* revert sort + flaky test
|
2025-05-18 06:05:11 -07:00 |
|
Jack Wind
|
7ff5c41e06
|
Add set_threadgroup_memory_length to CommandEncoder (#2183)
|
2025-05-16 00:28:03 -07:00 |
|
Awni Hannun
|
c1eb9d05d9
|
non-symmetric eig and eigh (#2188)
|
2025-05-15 13:01:44 -07:00 |
|
Cheng
|
0751263dec
|
Fix typo in row_reduce_small (#2179)
|
2025-05-13 20:19:54 -07:00 |
|
Cheng
|
eca2f3eb97
|
Add remove_index utility (#2173)
|
2025-05-13 17:09:56 -07:00 |
|
Awni Hannun
|
8f3d208dce
|
Close a couple edge case bugs: hadamard and addmm on empty inputs (#2177)
* handle hadamard and addmm on empty inputs
* fix
|
2025-05-12 10:48:57 -07:00 |
|
Awni Hannun
|
6661387066
|
Fix fft for integer overflow (#2161)
|
2025-05-09 14:25:12 -07:00 |
|
ATurker
|
a7fae8a176
|
fix: conv_general differences between gpu, cpu (#2070)
* fix general_conv padding
* fix bugs
* add test
---------
Co-authored-by: Awni Hannun <awni@apple.com>
|
2025-05-09 10:26:52 -07:00 |
|
Awni Hannun
|
5a1a5d5ed1
|
fix input coherent kernel launch (#2153)
|
2025-05-05 17:30:50 -07:00 |
|
Cheng
|
1683975acf
|
Move common gpu primitives to backend/gpu (#2145)
|
2025-05-05 13:45:29 -07:00 |
|
Awni Hannun
|
af705590ac
|
fix batched vector sdpa (#2152)
|
2025-05-05 13:13:03 -07:00 |
|
Awni Hannun
|
825124af8f
|
fix bw for elementwise ops (#2151)
* fix bw for elementwise ops
* add compile
* fix
* fix
* fix
* fix
|
2025-05-05 06:15:04 -07:00 |
|
Angelos Katharopoulos
|
481349495b
|
GPU Hadamard for large N (#1879)
|
2025-05-01 17:19:17 -07:00 |
|
Awni Hannun
|
e496c5a4b4
|
fix integer overflow in qmm (#2143)
|
2025-04-30 09:28:56 -07:00 |
|
Awni Hannun
|
f1606486d2
|
Generalize gpu backend (#2138)
* generalize gpu backend
* fix no_gpu build
* fix no_gpu build
* generalize gpu backend
|
2025-04-30 09:08:17 -07:00 |
|
Alex Chi Z.
|
b36dd472bb
|
return library if it is successfully loaded (#2131)
|
2025-04-29 07:30:36 -07:00 |
|
hdeng-apple
|
167b759a38
|
Fix typos (#2136)
|
2025-04-29 07:26:05 -07:00 |
|
Angelos Katharopoulos
|
f0e70afff0
|
Fix swift pm load (#2117)
|
2025-04-24 10:58:29 -07:00 |
|
hdeng-apple
|
38c1e720c2
|
Search mlx.metallib in macOS framework "Resources" dir (#2061)
---------
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
|
2025-04-23 09:53:13 -07:00 |
|
Yury Popov
|
1d2c9d6a07
|
Complex scan (#2094)
|
2025-04-22 18:56:28 -07:00 |
|
Awni Hannun
|
fdadc4f22c
|
Add more complex unary ops (#2101)
|
2025-04-21 13:04:54 -07:00 |
|
Angelos Katharopoulos
|
3cde719eb7
|
Route to gather qmm only for many tokens per expert (#2082)
|
2025-04-17 14:53:08 -07:00 |
|
Angelos Katharopoulos
|
5de6d94a90
|
Gather qmm batched kernel and refactoring of quantized (#2078)
|
2025-04-17 13:53:11 -07:00 |
|
Angelos Katharopoulos
|
99eefd2ec0
|
Gather mm new kernel and small refactoring (#2040)
|
2025-04-14 16:37:36 -07:00 |
|
Yury Popov
|
e9e268336b
|
LogCumSumExp (#2069)
|
2025-04-13 01:27:29 -07:00 |
|
Angelos Katharopoulos
|
c4189a38e4
|
Add float mask to sdpa vector (#2068)
|
2025-04-11 17:29:40 -07:00 |
|
Awni Hannun
|
ef7ece9851
|
fix fft bug (#2062)
|
2025-04-10 19:41:27 -07:00 |
|
Angelos Katharopoulos
|
9ecefd56db
|
Do not load the default lib if another is requested (#2055)
|
2025-04-09 13:31:38 -07:00 |
|
Awni Hannun
|
00794c42bc
|
Fix causal mask sdpa vec (#2053)
* fix sdpa vector causal mask
* test
|
2025-04-08 09:11:23 -07:00 |
|
Cheng
|
08a1bf3f10
|
Remove Event::Signal() (#2052)
|
2025-04-08 06:20:27 -07:00 |
|
Awni Hannun
|
60c4154346
|
Only request residency once (#2051)
|
2025-04-07 10:47:51 -07:00 |
|
Awni Hannun
|
1a28b69ee2
|
only add to residency set once (#2049)
|
2025-04-06 17:38:25 -07:00 |
|
Jagrit Digani
|
8777fd104f
|
Depthwise Conv2D optimization (#2036)
- Add new specialized kernel for small-kernel (kernel size <= 7), small-stride (strides <= 2) depthwise 2D convolutions
- Add related tests
|
2025-04-03 09:42:04 -07:00 |
|
Awni Hannun
|
c41f7565ed
|
fix softmax / logsumexp (#2042)
|
2025-04-03 08:32:59 -07:00 |
|
Awni Hannun
|
9ba81e3da4
|
tune quant dispatch (#2031)
|
2025-04-02 20:05:54 -07:00 |
|
Awni Hannun
|
f98ce25ab9
|
fix residency set for real (#2032)
|
2025-04-01 12:59:48 -07:00 |
|
Awni Hannun
|
de5f38fd48
|
Custom logsumexp (#2028)
* initial custom logsumexp
* more tests
* comments + fix
|
2025-03-31 07:36:55 -07:00 |
|
Angelos Katharopoulos
|
ec2854b13a
|
Swap -inf for finite_minimum value (#2029)
|
2025-03-30 21:55:04 -07:00 |
|
Awni Hannun
|
28f39e9038
|
Log for complex numbers in Metal (#2025)
* Log for complex numbers in Metal
* fix log2
|
2025-03-30 17:04:38 -07:00 |
|