| Cheng | 52dc8c8cd5 | Add profiler annotations in common primitives for CUDA backend (#2244) | 2025-06-04 19:55:12 -07:00 |
| Angelos Katharopoulos | aede70e81d | Perf regression fix (#2243) (tag: v0.26.1) | 2025-06-03 17:55:12 -07:00 |
| Cheng | 85a8beb5e4 | Avoid atomic updates across CPU/GPU in CUDA event (#2231) | 2025-06-03 16:49:06 -07:00 |
| Cheng | 0bb89e9e5f | Share more common code in Compiled (#2240) * Share more common code in Compiled * Remove build_lib_name | 2025-06-03 16:48:50 -07:00 |
| Cheng | 5685ceb3c7 | Avoid invoking allocator::malloc when creating CUDA event (#2232) | 2025-06-03 16:48:40 -07:00 |
| Suryash Malviya | 0408ba0a76 | Optimizing Complex Matrix Multiplication using Karatsuba’s Algorithm (#2220) * Implementing Complex Matmul using Karatsuba Algorithm * Implemented Karatsuba's Algorithm for complex matmul and pre-commit them * fix; Co-authored-by: Awni Hannun <awni@apple.com> (tag: v0.26.0) | 2025-06-02 15:58:46 -07:00 |
| Awni Hannun | cbad6c3093 | version (#2237) | 2025-06-02 15:58:33 -07:00 |
| Cheng | 1b021f6984 | Fast primitives decide when to use the fallback (#2216) | 2025-06-02 13:26:37 -07:00 |
| Cheng | 95b7551d65 | Do not check event.is_signaled() in eval_impl (#2230) | 2025-06-02 13:23:34 -07:00 |
| Cheng | db5a7c6192 | Add memory cache to CUDA backend (#2221) * Move BufferCache out of allocator * Add memory cache to cuda backend allocator * Simplify BufferCache assuming buf can not be null | 2025-05-30 12:12:54 -07:00 |
| Awni Hannun | 6ef2f67e7f | 5bit quants (#2226) * 5bit quants * 5bit quants | 2025-05-30 12:12:10 -07:00 |
| Cheng | f76ee1ffd2 | Move some dims utils to common (#2223) | 2025-05-29 06:48:30 -07:00 |
| Cheng | 54a71f270a | Remove unused defines (#2217) | 2025-05-23 06:14:58 -07:00 |
| Awni Hannun | 55b4062dd8 | copyright in docs (#2214) | 2025-05-21 17:13:04 -07:00 |
| Cheng | 79071bfba4 | Fix out-of-bounds default value in logsumexp/softmax (#2213) | 2025-05-21 07:25:16 -07:00 |
| Cheng | 7774b87cbd | Remove redundant simd_sum in logsumexp (#2210) | 2025-05-21 07:25:03 -07:00 |
| Cheng | 35c87741cf | Build for compute capability 70 instead of 75 (#2209) | 2025-05-20 19:42:48 -07:00 |
| Jack Wind | 4cbe605214 | Feat: Allow per-target Metal debug flags (#2201) * feat: allow per-target Metal debug flags * formatting fix | 2025-05-20 10:22:26 -07:00 |
| Clement Liaw | ab8883dd55 | include mlx::core::version() symbols in the mlx static library (#2207) | 2025-05-20 07:39:11 -07:00 |
| Awni Hannun | eebe73001a | fix large arg reduce (#2206) | 2025-05-19 13:10:44 -07:00 |
| Angelos Katharopoulos | 0359bf02c9 | Nearest upsample (#2202) | 2025-05-19 11:23:38 -07:00 |
| Cheng | 237f9e58a8 | Fix BEFORE keyword in target_include_directories (#2204) | 2025-05-19 06:10:44 -07:00 |
| Awni Hannun | 8576e6fe36 | fix conv2d bug + faster conv 1d (#2195) * fix conv2d bug + faster conv 1d * revert sort + flaky test | 2025-05-18 06:05:11 -07:00 |
| Angelos Katharopoulos | 0654543dcc | Add complex eigh (#2191) | 2025-05-18 00:18:43 -07:00 |
| Awni Hannun | 48ef3e74e2 | reduce vjp for all and any (#2193) | 2025-05-16 08:38:49 -07:00 |
| Cheng | 7d4b378952 | Include cuda_bf16.h for bfloat16 overloads (#2192) * Include cuda_bf16.h for bfloat16 overloads * Add NO_GPU_MULTI(Eig) in cuda backend | 2025-05-16 06:44:42 -07:00 |
| Jack Wind | 7ff5c41e06 | Add set_threadgroup_memory_length to CommandEncoder (#2183) | 2025-05-16 00:28:03 -07:00 |
| Awni Hannun | 602f43e3d1 | fix conv grad (#2187) | 2025-05-15 19:20:36 -07:00 |
| Awni Hannun | a2cadb8218 | real and imag properties (#2189) | 2025-05-15 18:17:50 -07:00 |
| Awni Hannun | c1eb9d05d9 | non-symmetric eig and eigh (#2188) | 2025-05-15 13:01:44 -07:00 |
| Angelos Katharopoulos | cf6c939e86 | Fix some complex vjps (#2178) | 2025-05-14 23:37:12 -07:00 |
| Angelos Katharopoulos | 130df35e1b | Add random normal distribution for complex numbers (#2182) | 2025-05-13 22:43:45 -07:00 |
| Cheng | 0751263dec | Fix typo in row_reduce_small (#2179) | 2025-05-13 20:19:54 -07:00 |
| Cheng | eca2f3eb97 | Add remove_index utility (#2173) | 2025-05-13 17:09:56 -07:00 |
| Angelos Katharopoulos | 3aa9cf3f9e | Fix put_along_axis for empty arrays (#2181) | 2025-05-13 14:27:53 -07:00 |
| Awni Hannun | 8f3d208dce | Close a couple edge case bugs: hadamard and addmm on empty inputs (#2177) * handle hadamard and addmm on empty inputs * fix | 2025-05-12 10:48:57 -07:00 |
| Ivan Fioravanti | caaa3f1f8c | Small typos in mx.metal deprecations (#2176) | 2025-05-11 06:03:47 -07:00 |
| Awni Hannun | 659a51919f | patch bump (#2162) (tag: v0.25.2) | 2025-05-09 14:35:14 -07:00 |
| Awni Hannun | 6661387066 | Fix fft for integer overflow (#2161) | 2025-05-09 14:25:12 -07:00 |
| ATurker | a7fae8a176 | fix: conv_general differences between gpu, cpu (#2070) * fix general_conv padding * fix bugs * add test; Co-authored-by: Awni Hannun <awni@apple.com> | 2025-05-09 10:26:52 -07:00 |
| Cheng | 0cae0bdac8 | CUDA backend: backbone (#2075) | 2025-05-06 21:26:46 -07:00 |
| Awni Hannun | 5a1a5d5ed1 | fix input coherent kernel launch (#2153) | 2025-05-05 17:30:50 -07:00 |
| Cheng | 1683975acf | Move common gpu primitives to backend/gpu (#2145) | 2025-05-05 13:45:29 -07:00 |
| Awni Hannun | af705590ac | fix batched vector sdpa (#2152) | 2025-05-05 13:13:03 -07:00 |
| Awni Hannun | 825124af8f | fix bw for elementwise ops (#2151) * fix bw for elementwise ops * add compile * fix * fix * fix * fix | 2025-05-05 06:15:04 -07:00 |
| Awni Hannun | 9c5e7da507 | fix compile merging (#2150) | 2025-05-02 15:08:50 -07:00 |
| Angelos Katharopoulos | 481349495b | GPU Hadamard for large N (#1879) | 2025-05-01 17:19:17 -07:00 |
| Awni Hannun | 9daa6b003f | fix shapeless export (#2148) | 2025-05-01 15:02:02 -07:00 |
| Angelos Katharopoulos | a3a632d567 | Fix the launcher when ran locally (#2147) | 2025-05-01 12:56:09 -07:00 |
| Awni Hannun | e496c5a4b4 | fix integer overflow in qmm (#2143) | 2025-04-30 09:28:56 -07:00 |