| Jagrit Digani | 8777fd104f | Depthwise Conv2D optimization (#2036) - Add new specialized kernel for small kernel (kernels size <= 7), small strides (strides <= 2) depthwise 2d convolutions - Add related tests | 2025-04-03 09:42:04 -07:00 |
| Awni Hannun | de5f38fd48 | Custom logsumexp (#2028) * initial custom logsumexp * more tests * comments + fix | 2025-03-31 07:36:55 -07:00 |
| Angelos Katharopoulos | ec2854b13a | Swap -inf for finite_minimum value (#2029) | 2025-03-30 21:55:04 -07:00 |
| Awni Hannun | 28f39e9038 | Log for complex numbers in Metal (#2025) * Log for complex numbers in Metal * fix log2 | 2025-03-30 17:04:38 -07:00 |
| Awni Hannun | 05d7118561 | causal vector sdpa (#2018) * causal vector sdpa * get rid of memory threshold | 2025-03-28 12:36:13 -07:00 |
| Awni Hannun | 98b901ad66 | enable complex gemm (#2017) | 2025-03-28 10:45:13 -07:00 |
| Awni Hannun | 5580b47291 | iinfo and scalar overflow detection (#2009) | 2025-03-27 19:54:56 -07:00 |
| Awni Hannun | a84cc0123f | promote mask when needed (#1998) | 2025-03-23 19:58:28 -07:00 |
| Angelos Katharopoulos | 4eef8102c9 | Distributed layers (#1270) | 2025-03-21 13:52:17 -07:00 |
| Angelos Katharopoulos | 69e4dd506b | Add a ring all gather (#1985) | 2025-03-21 13:36:51 -07:00 |
| Awni Hannun | 2a980a76ce | Add stats and limit to common allocator and enable tests (#1988) * add stats to common allocator and enable tests * linux memory and default * fix | 2025-03-21 12:28:36 -07:00 |
| Awni Hannun | 4e1994e9d7 | move memory APIs into top level mlx.core (#1982) | 2025-03-21 07:25:12 -07:00 |
| Awni Hannun | 7b7e2352cd | fix malloc or wait deadlock (#1976) | 2025-03-20 16:48:43 -07:00 |
| Awni Hannun | 005e7efa64 | fix mask in sdpa (#1980) * fix mask in sdpa * fix attention mask * Re-enable routing for array mask. Co-authored-by: Jagrit Digani <digani@apple.com> | 2025-03-20 14:53:12 -07:00 |
| Jagrit Digani | b42d13ec84 | Update attention tests to show diff, disable array masks (#1978) | 2025-03-20 14:25:38 -07:00 |
| Jagrit Digani | 9adcd1a650 | Support fused masking in Attention (#1924) * Update API to allow mask='causal' in fast::sdpa * Add fallback * Update steel::AttnParams * Fix typo * WIP, basic causal * Update tests * Update benchmarking * Update masking loop limits * Add bool masking and update tests * Update additive mask * Update benchmarks * Update benchmarks * Update tests * Update for bfloat error * Update early exit * Add random seed to tests | 2025-03-20 11:01:32 -07:00 |
| Awni Hannun | 3c164fca8c | Fix multistream GPU deadlock (#1969) * fix multistream GPU deadlock * comments | 2025-03-20 07:19:47 -07:00 |
| Awni Hannun | c6ea2ba329 | Use same accumulation precision in gemv as gemm (#1962) * use same accumulation precision in gemv as gemm * faster * fix compile | 2025-03-16 07:13:24 -07:00 |
| Awni Hannun | 2770a10240 | fix grad with inplace updates (#1961) | 2025-03-13 19:13:09 -07:00 |
| Awni Hannun | 32da94507a | fix vmap for flatten (#1955) | 2025-03-11 10:42:22 -07:00 |
| Awni Hannun | 3c3e558c60 | Support transposed head/seq for kv (#1950) * support transposed head/seq for kv * fix flaky test * nit | 2025-03-10 10:53:45 -07:00 |
| Abe Leininger | 3835a428c5 | Adds nuclear norm support (#1894) * adjust norm unit test tolerance | 2025-03-04 13:26:02 -08:00 |
| Angelos Katharopoulos | 9680f72cca | Add a multi optimizer (#1916) | 2025-03-04 13:16:35 -08:00 |
| Awni Hannun | e613d0eaf0 | SDPA support for small batch (over sequence) queries (#1922) * batch query sdpa * batch sdpa for query | 2025-03-04 10:59:04 -08:00 |
| Awni Hannun | 6bcd6bcf70 | fix donation in scan (#1917) | 2025-03-03 11:30:59 -08:00 |
| Awni Hannun | 4e7cd31d12 | Fix slice data size (#1913) * fix slice data size * add test | 2025-03-02 21:50:42 -08:00 |
| Angelos Katharopoulos | 5e6c130d93 | RMS norm without scaling (#1915) | 2025-02-28 20:26:57 -08:00 |
| Awni Hannun | 7d042f17fe | Double for lapack (#1904) * double for lapack ops * add double support for lapack ops | 2025-02-25 11:39:36 -08:00 |
| Awni Hannun | 28b8079e30 | fix double type promotion (#1901) | 2025-02-25 06:00:53 -08:00 |
| Awni Hannun | 7face5d9fd | fix cpu compile (#1897) | 2025-02-24 14:10:30 -08:00 |
| Awni Hannun | 2d0f384b6f | fix simd erf_inv (#1896) | 2025-02-24 13:57:47 -08:00 |
| Angelos Katharopoulos | 10b271d963 | Ring update (#1885) | 2025-02-20 14:32:31 -08:00 |
| Awni Hannun | bbda0fdbdb | Allow non-square lu (#1889) | 2025-02-20 08:13:23 -08:00 |
| Awni Hannun | c707b2b0a6 | Limit compile buffers (#1887) * limit compile buffers * maybe not flaky test | 2025-02-19 20:28:13 -08:00 |
| Angelos Katharopoulos | 78ba24c37d | Raise an exception in the rope op if input is integer (#1884) | 2025-02-19 14:43:39 -08:00 |
| Angelos Katharopoulos | 1a2cb72030 | Ensure linspace always contains start and stop (#1883) | 2025-02-19 13:53:20 -08:00 |
| Abe Leininger | 344a29506e | Enforce triangular matrix form in tri_inv (#1876) * fix tri_inv bug * Revert "fix tri_inv bug" This reverts commit b74b290201. * Make sure that tri_inv returns a triangular matrix. Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com> | 2025-02-19 12:42:33 -08:00 |
| Angelos Katharopoulos | 71de73a668 | Fix convs by reverting #1803 (#1882) | 2025-02-18 14:36:34 -08:00 |
| Alex Barron | 4c1dfa58b7 | xor op on arrays (#1875) | 2025-02-17 00:24:53 -08:00 |
| Jagrit Digani | 2dc307f2e6 | Winograd Update for Small batches (#1803) * Build in padding to Winograd kernels * Add new fused Winograd kernel * Enable weight flipping in Winograd kernels | 2025-02-14 13:08:13 -08:00 |
| Alex Barron | 7f2d1024f3 | add f8_e4m3 loading (#1859) | 2025-02-13 17:10:03 -08:00 |
| Awni Hannun | 428f589364 | Revert "More buffer donation in some cases (#1858)" (#1863) This reverts commit d274ae77f2. | 2025-02-13 14:21:44 -08:00 |
| Alex Barron | 5cd97f7ffe | Bitwise Inverse (#1862) * add bitwise inverse * add vmap + fix nojit * inverse -> invert * add to compile + remove unused | 2025-02-13 08:44:14 -08:00 |
| Awni Hannun | d274ae77f2 | More buffer donation in some cases (#1858) * more donation * fix * add test | 2025-02-12 19:41:37 -08:00 |
| Alex Barron | 55c5ac7820 | fix int64 bug (#1860) | 2025-02-12 19:23:46 -08:00 |
| Angelos Katharopoulos | 0145911bea | Fixes output donation for IO ops on the GPU (#1857) | 2025-02-12 10:52:30 -08:00 |
| Awni Hannun | 0a5215693e | Fix grad copies (#1854) * fix grad with copies * add test * add test | 2025-02-11 15:26:42 -08:00 |
| Awni Hannun | 2a45056ba8 | Cycle leak break (#1856) * detect and break leaks in custom function * detect and break leaks in custom function | 2025-02-11 14:45:02 -08:00 |
| Abe Leininger | a5ededf1c3 | CPU LU factorization and linear solvers (#1451) * linalg solve backend * nits * more nits + fix * luf primitive and lu, solve, and solve_triangular backends * changes / nits. Co-authored-by: Awni Hannun <awni@apple.com> | 2025-02-10 12:32:24 -08:00 |
| Angelos Katharopoulos | 9eb7d7362f | Fix Split::vmap (#1845) | 2025-02-08 09:22:13 -08:00 |