| 
							
							
								 Nripesh Niketan | 3bb5b4a302 | Chore: Add default language in pre-commit and bump hooks (#1652) | 2024-12-06 07:54:29 -08:00 |  | 
			
				
					| 
							
							
								 Awni Hannun | fc88fd9097 | Shape and Strides 1 / N (#1645) * shape and stride type def
* more shape | 2024-12-05 12:53:43 -08:00 |  | 
			
				
					| 
							
							
								 Awni Hannun | c5b0928c1f | fix fallback (#1646) | 2024-12-05 11:59:53 -08:00 |  | 
			
				
					| 
							
							
								 Awni Hannun | e047fd977d | compile changes if stream changes (#1644) | 2024-12-03 14:37:44 -08:00 |  | 
			
				
					| 
							
							
								 Jagrit Digani | 9d40e521d7 | Stop matrix copies with new attention kernel (#1639) | 2024-12-02 14:12:38 -08:00 |  | 
			
				
					| 
							
							
								 Alex Barron | 1445dcaa60 | let class predicate specify quantization parameters (#1638) | 2024-12-02 14:09:28 -08:00 |  | 
			
				
					| 
							
							
								 Jesper Stemann Andersen | e4eeb4e910 | Added missing unordered_map includes (#1635) * Added missing includes in mlx/io.h and mlx/backend/metal/metal.h
* Added additional missing unordered_map includes that fixes build on FreeBSD | 2024-12-02 07:03:03 -08:00 |  | 
			
				
					| 
							
							
								 Awni Hannun | aa86876813 | fix transformer decoder post norm LN (#1637) | 2024-12-02 07:02:17 -08:00 |  | 
			
				
					| 
							
							
								 Jesper Stemann Andersen | 974bb54ab2 | CMake: Enabled using Accelerate on x86_64 / x64 (#1625) * CMake: Enabled using Accelerate on x86_64 / x64
Cf. https://github.com/JuliaPackaging/Yggdrasil/pull/9761
* CMake: Removed superfluous MLX_BUILD_ARM | 2024-11-28 10:55:45 -08:00 |  | 
			
				
					| 
							
							
								 Ikko Eltociear Ashimine | 9bc2183a31 | docs: update device.cpp (#1632) unecessary -> unnecessary | 2024-11-27 20:58:26 -08:00 |  | 
			
				
					| 
							
							
								 Awni Hannun | d4b222b6d3 | Fix some leaks and races (#1629) * fix leak and fix potential race
* more leak fixes
* fix one more | 2024-11-27 20:01:20 -08:00 |  | 
			
				
					| 
							
							
								 Jesper Stemann Andersen | af2af818a6 | Enables build for *-linux-musl (#1627) Also contributes to being able to build for *-w64-mingw32.
Cf. https://github.com/JuliaPackaging/Yggdrasil/pull/9761 | 2024-11-27 13:14:24 -08:00 |  | 
			
				
					| 
							
							
								 Jesper Stemann Andersen | 698e63a608 | CMake: Build with dlfcn-win32 to have dlopen etc. on win32 (#1628) Cf. https://github.com/JuliaPackaging/Yggdrasil/pull/9761 | 2024-11-27 13:14:13 -08:00 |  | 
			
				
					| 
							
							
								 Awni Hannun | 211411faf2 | fix large ops (#1620) | 2024-11-24 09:17:10 -08:00 |  | 
			
				
					| 
							
							
								 Awni Hannun | bb303c45a5 | version (#1617) v0.21.0 | 2024-11-22 12:00:03 -08:00 |  | 
			
				
					| 
							
							
								 Alex Barron | 6f7986d592 | Cleaner qmv/qvm (#1616) | 2024-11-22 11:14:08 -08:00 |  | 
			
				
					| 
							
							
								 Awni Hannun | 7cbb4aef17 | Doc fix (#1615) | 2024-11-22 11:12:25 -08:00 |  | 
			
				
					| 
							
							
								 Jagrit Digani | 02bec0bb6d | Matrix Attention kernel (#1610) * Rough INIT
* [WIP]: Loading and Matmuls added
* [WIP]: Reductions and min working aligned kernel at headdim = 64
* [WIP] Added headdim 80 for testing
* [WIP] Update dispatch params for testing
* [WIP] Add support for unaligned seq lengths - still looks messy
* Update sdpa_benchmarks
* Update sdpa_benchmarks
* Update sdpa_benchmarks
* Enable gqa support
* Update benchmark and switch off 128 headdim
* Update headdim 128 tuning
* Remove older fast attention code. Write out O strided
* Disable hd=128 until further optimizations
* Enable bf16
* Fix data size bug
* Enable attn build outside of jit | 2024-11-22 10:34:05 -08:00 |  | 
			
				
					| 
							
							
								 Alex Barron | c79f6a4a8c | 3 and 6 bit quantization (#1613) * Support 3 and 6 bit quantization | 2024-11-22 10:22:13 -08:00 |  | 
			
				
					| 
							
							
								 Awni Hannun | 0c5eea226b | Reduce specializations (#1607) * start of reduce specializations
* fix all reduce
* fix many dims
* fix
* non-jit tests clear
* cleanup instantiations
* cpu merges
* change dim specializations
* optimize
* fix jit
* fix jit
* use higher precision for integer sum+prod
* fixes | 2024-11-21 19:53:00 -08:00 |  | 
			
				
					| 
							
							
								 Awni Hannun | dcca0d7477 | contiguous op / prim (#1612) | 2024-11-21 19:51:49 -08:00 |  | 
			
				
					| 
							
							
								 Cocoa | 0d5e7716ad | fix typo: accross -> across (#1609) Signed-off-by: Cocoa <i@uwucocoa.moe> | 2024-11-20 15:30:51 -08:00 |  | 
			
				
					| 
							
							
								 Angelos Katharopoulos | d8c824c594 | Formatting fixes (#1606) | 2024-11-20 15:30:36 -08:00 |  | 
			
				
					| 
							
							
								 Saanidhya | cb431dfc9f | Adds 3D pooling (#1526) | 2024-11-19 16:45:24 -08:00 |  | 
			
				
					| 
							
							
								 Awni Hannun | 61d787726a | Fix view scalar bug segfault (#1603) * fix view scalar bug
* fix view scalar bug
* one more fix | 2024-11-19 10:54:05 -08:00 |  | 
			
				
					| 
							
							
								 Angelos Katharopoulos | 5e89aace9b | Fix concatenate vmap (#1600) | 2024-11-19 10:44:04 -08:00 |  | 
			
				
					| 
							
							
								 Awni Hannun | 2af7e8a9a6 | fix cmake version (#1601) | 2024-11-19 08:45:05 -08:00 |  | 
			
				
					| 
							
							
								 Awni Hannun | 2419edd5b2 | Faster indexing math in a few kernels (#1589) * wip: faster compiled kernels
* faster general unary with uint specialization
* index type in compiled, unary, binary, ternary, copy
* fix jit
* jit fix
* specialize gather + scatter
* nit in docs | 2024-11-18 19:52:00 -08:00 |  | 
			
				
					| 
							
							
								 Awni Hannun | bf481e8e5d | Fix sibling leak (#1590) * add test
* fix + test
* fix fix | 2024-11-18 19:17:01 -08:00 |  | 
			
				
					| 
							
							
								 Awni Hannun | 9d7fa6b8e6 | Use osx deployment target to pick Metal version (#1595) * choose metal based on deployment target rather than system version
* nit
* unused compile def | 2024-11-18 19:16:49 -08:00 |  | 
			
				
					| 
							
							
								 Angelos Katharopoulos | 073076ac7d | 2-Pass Sdpa Inference Kernel (#1597) | 2024-11-18 17:31:53 -08:00 |  | 
			
				
					| 
							
							
								 Awni Hannun | 9bd03dd9b4 | More buffer donation with no-ops (#1591) * more donation
* fix test
* fix build | 2024-11-18 08:35:41 -08:00 |  | 
			
				
					| 
							
							
								 Awni Hannun | 6931f84412 | fix dispatch threads for a few kernels (#1594) | 2024-11-18 08:35:25 -08:00 |  | 
			
				
					| 
							
							
								 xnorai | 16ec0556a0 | Allocate raw JSON metadata buffer on the heap, and limit its size (#1596) * Allocate raw JSON metadata buffer on the heap, and limit its size to 1GiB
* Set the upper size limit for the header to 100K as in Rust safetensors | 2024-11-18 07:22:51 -08:00 |  | 
			
				
					| 
							
							
								 Awni Hannun | 610af352d4 | Dispatch bf16 at run time when using the JIT (#1584) * Dispatch bf16 at run time when using the JIT
* fix extension
* fix extension build
* fix extension build
* Update utils.h | 2024-11-15 16:54:36 -08:00 |  | 
			
				
					| 
							
							
								 Awni Hannun | b35f1e3c9c | fix donation in sdpa (#1587) | 2024-11-13 17:21:13 -08:00 |  | 
			
				
					| 
							
							
								 Awni Hannun | dfa0b9aab4 | Cpu fast quantize (#1578) * cpu quantize
* fix | 2024-11-08 20:10:39 -08:00 |  | 
			
				
					| 
							
							
								 Alex Barron | a4c47b0276 | OOB QMV fix (#1579) * fix oob access in qmv
* skip more
* fix small case | 2024-11-08 17:59:45 -08:00 |  | 
			
				
					| 
							
							
								 Alex Barron | 111fefd5e9 | Fix OOB access in qmv (#1577) * fix oob access in qmv
* skip more | 2024-11-08 15:41:30 -08:00 |  | 
			
				
					| 
							
							
								 Awni Hannun | c1fe1ef081 | Bfs width limit (#1568) * width limit
* fix
* large limit
* put env vars in env namespace | 2024-11-08 15:00:46 -08:00 |  | 
			
				
					| 
							
							
								 Awni Hannun | 8c34c9dac4 | throw for invalid case and remove test (#1575) | 2024-11-08 12:04:03 -08:00 |  | 
			
				
					| 
							
							
								 Awni Hannun | 91c0277356 | fix per-example mask + docs in sdpa (#1574) | 2024-11-08 11:51:15 -08:00 |  | 
			
				
					| 
							
							
								 Awni Hannun | 9f0d5c12fc | Fully wrap the command encoder (#1572) * fully wrap the command encoder
* use consistent style + fix extensions | 2024-11-08 11:50:21 -08:00 |  | 
			
				
					| 
							
							
								 Awni Hannun | 59247c2b62 | add groups in conv2d (#1569) | 2024-11-07 13:57:53 -08:00 |  | 
			
				
					| 
							
							
								 Awni Hannun | 9a3842a2d9 | fix (#1566) | 2024-11-06 17:10:33 -08:00 |  | 
			
				
					| 
							
							
								 Alex Barron | 726dbd9267 | v0.20.0 (#1565) v0.20.0 | 2024-11-05 12:37:57 -08:00 |  | 
			
				
					| 
							
							
								 Awni Hannun | 54f05e7195 | Fix gather vmap (#1563) * fix gather
* fix | 2024-11-05 11:29:20 -08:00 |  | 
			
				
					| 
							
							
								 Alex Barron | 26be608470 | Add split_k qvm for long context (#1564) * Add splitk qvm
* configurable splitk
* tuning
* remove extra instantiation
* remove refactor
* separate test
* cpu tolerance | 2024-11-05 11:25:19 -08:00 |  | 
			
				
					| 
							
							
								 Angelos Katharopoulos | 248431eb3c | Reductions update (#1351) | 2024-11-04 22:25:16 -08:00 |  | 
			
				
					| 
							
							
								 Awni Hannun | 76f275b4df | error in rms for wrong size (#1562) | 2024-11-04 13:24:02 -08:00 |  |