| Angelos Katharopoulos | 99eefd2ec0 | Gather mm new kernel and small refactoring (#2040) | 2025-04-14 16:37:36 -07:00 |
| Yury Popov | e9e268336b | LogCumSumExp (#2069) | 2025-04-13 01:27:29 -07:00 |
| Angelos Katharopoulos | c4189a38e4 | Add float mask to sdpa vector (#2068) | 2025-04-11 17:29:40 -07:00 |
| Awni Hannun | ef7ece9851 | fix fft bug (#2062) | 2025-04-10 19:41:27 -07:00 |
| Angelos Katharopoulos | ddaa4b7dcb | Fix the test and add custom min/max reductions for uncommon MPI types (#2060) | 2025-04-10 17:01:17 -07:00 |
| Anastasiia Filippova | 515f104926 | Min / max reductions (#2041) | 2025-04-09 23:22:20 -07:00 |
| Awni Hannun | 00794c42bc | Fix causal mask sdpa vec (#2053) * fix sdpa vector causal mask * test | 2025-04-08 09:11:23 -07:00 |
| Awni Hannun | f2c85308c1 | add a half simd gemm fallback (#2046) * add a half simd gemm fallback * nit | 2025-04-07 09:31:29 -07:00 |
| Jagrit Digani | 8777fd104f | Depthwise Conv2D optimization (#2036) - Add new specialized kernel for small kernel (kernels size <= 7), small strides (strides <= 2) depthwise 2d convolutions - Add related tests | 2025-04-03 09:42:04 -07:00 |
| Awni Hannun | de5f38fd48 | Custom logsumexp (#2028) * initial custom logsumexp * more tests * comments + fix | 2025-03-31 07:36:55 -07:00 |
| Angelos Katharopoulos | ec2854b13a | Swap -inf for finite_minimum value (#2029) | 2025-03-30 21:55:04 -07:00 |
| Awni Hannun | 28f39e9038 | Log for complex numbers in Metal (#2025) * Log for complex numbers in Metal * fix log2 | 2025-03-30 17:04:38 -07:00 |
| Awni Hannun | 05d7118561 | causal vector sdpa (#2018) * causal vector sdpa * get rid of memory threshold | 2025-03-28 12:36:13 -07:00 |
| Awni Hannun | 98b901ad66 | enable complex gemm (#2017) | 2025-03-28 10:45:13 -07:00 |
| Awni Hannun | 5580b47291 | iinfo and scalar overflow detection (#2009) | 2025-03-27 19:54:56 -07:00 |
| Awni Hannun | a84cc0123f | promote mask when needed (#1998) | 2025-03-23 19:58:28 -07:00 |
| Angelos Katharopoulos | 4eef8102c9 | Distributed layers (#1270) | 2025-03-21 13:52:17 -07:00 |
| Angelos Katharopoulos | 69e4dd506b | Add a ring all gather (#1985) | 2025-03-21 13:36:51 -07:00 |
| Awni Hannun | 2a980a76ce | Add stats and limit to common allocator and enable tests (#1988) * add stats to common allocator and enable tests * linux memory and default * fix | 2025-03-21 12:28:36 -07:00 |
| Awni Hannun | 4e1994e9d7 | move memory APIs into top level mlx.core (#1982) | 2025-03-21 07:25:12 -07:00 |
| Awni Hannun | 7b7e2352cd | fix malloc or wait deadlock (#1976) | 2025-03-20 16:48:43 -07:00 |
| Awni Hannun | 005e7efa64 | fix mask in sdpa (#1980) * fix mask in sdpa * fix attention mask * Re-enable routing for array mask; Co-authored-by: Jagrit Digani <digani@apple.com> | 2025-03-20 14:53:12 -07:00 |
| Jagrit Digani | b42d13ec84 | Update attention tests to show diff, disable array masks (#1978) | 2025-03-20 14:25:38 -07:00 |
| Jagrit Digani | 9adcd1a650 | Support fused masking in Attention (#1924) * Update API to allow mask='causal' in fast::sdpa * Add fallback * Update steel::AttnParams * Fix typo * WIP, basic causal * Update tests * Update benchmarking * Update masking loop limits * Add bool masking and update tests * Update additive mask * Update benchmarks * Update benchmarks * Update tests * Update for bfloat error * Update early exit * Add random seed to tests | 2025-03-20 11:01:32 -07:00 |
| Awni Hannun | 3c164fca8c | Fix multistream GPU deadlock (#1969) * fix multistream GPU deadlock * comments | 2025-03-20 07:19:47 -07:00 |
| Awni Hannun | c6ea2ba329 | Use same accumulation precision in gemv as gemm (#1962) * use same accumulation precision in gemv as gemm * faster * fix compile | 2025-03-16 07:13:24 -07:00 |
| Awni Hannun | 2770a10240 | fix grad with inplace updates (#1961) | 2025-03-13 19:13:09 -07:00 |
| Awni Hannun | 32da94507a | fix vmap for flatten (#1955) | 2025-03-11 10:42:22 -07:00 |
| Awni Hannun | 3c3e558c60 | Support transposed head/seq for kv (#1950) * support transposed head/seq for kv * fix flaky test * nit | 2025-03-10 10:53:45 -07:00 |
| Abe Leininger | 3835a428c5 | Adds nuclear norm support (#1894) * adjust norm unit test tolerance | 2025-03-04 13:26:02 -08:00 |
| Angelos Katharopoulos | 9680f72cca | Add a multi optimizer (#1916) | 2025-03-04 13:16:35 -08:00 |
| Awni Hannun | e613d0eaf0 | SDPA support for small batch (over sequence) queries (#1922) * batch query sdpa * batch sdpa for query | 2025-03-04 10:59:04 -08:00 |
| Awni Hannun | 6bcd6bcf70 | fix donation in scan (#1917) | 2025-03-03 11:30:59 -08:00 |
| Awni Hannun | 4e7cd31d12 | Fix slice data size (#1913) * fix slice data size * add test | 2025-03-02 21:50:42 -08:00 |
| Angelos Katharopoulos | 5e6c130d93 | RMS norm without scaling (#1915) | 2025-02-28 20:26:57 -08:00 |
| Awni Hannun | 7d042f17fe | Double for lapack (#1904) * double for lapack ops * add double support for lapack ops | 2025-02-25 11:39:36 -08:00 |
| Awni Hannun | 28b8079e30 | fix double type promotion (#1901) | 2025-02-25 06:00:53 -08:00 |
| Awni Hannun | 7face5d9fd | fix cpu compile (#1897) | 2025-02-24 14:10:30 -08:00 |
| Awni Hannun | 2d0f384b6f | fix simd erf_inv (#1896) | 2025-02-24 13:57:47 -08:00 |
| Angelos Katharopoulos | 10b271d963 | Ring update (#1885) | 2025-02-20 14:32:31 -08:00 |
| Awni Hannun | bbda0fdbdb | Allow non-square lu (#1889) | 2025-02-20 08:13:23 -08:00 |
| Awni Hannun | c707b2b0a6 | Limit compile buffers (#1887) * limit compile buffers * maybe not flaky test | 2025-02-19 20:28:13 -08:00 |
| Angelos Katharopoulos | 78ba24c37d | Raise an exception in the rope op if input is integer (#1884) | 2025-02-19 14:43:39 -08:00 |
| Angelos Katharopoulos | 1a2cb72030 | Ensure linspace always contains start and stop (#1883) | 2025-02-19 13:53:20 -08:00 |
| Abe Leininger | 344a29506e | Enforce triangular matrix form in tri_inv (#1876) * fix tri_inv bug * Revert "fix tri_inv bug" This reverts commit b74b290201. * Make sure that tri_inv returns a triangular matrix; Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com> | 2025-02-19 12:42:33 -08:00 |
| Angelos Katharopoulos | 71de73a668 | Fix convs by reverting #1803 (#1882) | 2025-02-18 14:36:34 -08:00 |
| Alex Barron | 4c1dfa58b7 | xor op on arrays (#1875) | 2025-02-17 00:24:53 -08:00 |
| Jagrit Digani | 2dc307f2e6 | Winograd Update for Small batches (#1803) * Build in padding to Winograd kernels * Add new fused Winograd kernel * Enable weight flipping in Winograd kernels | 2025-02-14 13:08:13 -08:00 |
| Alex Barron | 7f2d1024f3 | add f8_e4m3 loading (#1859) | 2025-02-13 17:10:03 -08:00 |
| Awni Hannun | 428f589364 | Revert "More buffer donation in some cases (#1858)" (#1863) This reverts commit d274ae77f2. | 2025-02-13 14:21:44 -08:00 |