| 
							
							
								 Jesper Stemann Andersen | af2af818a6 | Enables build for *-linux-musl (#1627) Also contributes to being able to build for *-w64-mingw32.
Cf. https://github.com/JuliaPackaging/Yggdrasil/pull/9761 | 2024-11-27 13:14:24 -08:00 |  | 
			
				
					| 
							
							
								 Jesper Stemann Andersen | 698e63a608 | CMake: Build with dlfcn-win32 to have dlopen etc. on win32 (#1628) Cf. https://github.com/JuliaPackaging/Yggdrasil/pull/9761 | 2024-11-27 13:14:13 -08:00 |  | 
			
				
					| 
							
							
								 Awni Hannun | 211411faf2 | fix large ops (#1620) | 2024-11-24 09:17:10 -08:00 |  | 
			
				
					| 
							
							
								 Awni Hannun | bb303c45a5 | version (#1617) v0.21.0 | 2024-11-22 12:00:03 -08:00 |  | 
			
				
					| 
							
							
								 Alex Barron | 6f7986d592 | Cleaner qmv/qvm (#1616) | 2024-11-22 11:14:08 -08:00 |  | 
			
				
					| 
							
							
								 Awni Hannun | 7cbb4aef17 | Doc fix (#1615) | 2024-11-22 11:12:25 -08:00 |  | 
			
				
					| 
							
							
								 Jagrit Digani | 02bec0bb6d | Matrix Attention kernel (#1610) * Rough INIT
* [WIP]: Loading and Matmuls added
* [WIP]: Reductions and min working aligned kernel at headdim = 64
* [WIP] Added headdim 80 for testing
* [WIP] Update dispatch params for testing
* [WIP] Add support for unaligned seq lengths - still looks messy
* Update sdpa_benchmarks
* Update sdpa_benchmarks
* Update sdpa_benchmarks
* Enable gqa support
* Update benchmark and switch off 128 headdim
* Update headdim 128 tuning
* Remove older fast attention code. Write out O strided
* Disable hd=128 until further optimizations
* Enable bf16
* Fix data size bug
* Enable attn build outside of jit | 2024-11-22 10:34:05 -08:00 |  | 
			
				
					| 
							
							
								 Alex Barron | c79f6a4a8c | 3 and 6 bit quantization (#1613) * Support 3 and 6 bit quantization | 2024-11-22 10:22:13 -08:00 |  | 
			
				
					| 
							
							
								 Awni Hannun | 0c5eea226b | Reduce specializations (#1607) * start of reduce specializations
* fix all reduce
* fix many dims
* fix
* non-jit tests clear
* cleanup instantiations
* cpu merges
* change dim specializations
* optimize
* fix jit
* fix jit
* use higher precision for integer sum+prod
* fixes | 2024-11-21 19:53:00 -08:00 |  | 
			
				
					| 
							
							
								 Awni Hannun | dcca0d7477 | contiguous op / prim (#1612) | 2024-11-21 19:51:49 -08:00 |  | 
			
				
					| 
							
							
								 Cocoa | 0d5e7716ad | fix typo: accross -> across (#1609) Signed-off-by: Cocoa <i@uwucocoa.moe> | 2024-11-20 15:30:51 -08:00 |  | 
			
				
					| 
							
							
								 Angelos Katharopoulos | d8c824c594 | Formatting fixes (#1606) | 2024-11-20 15:30:36 -08:00 |  | 
			
				
					| 
							
							
								 Saanidhya | cb431dfc9f | Adds 3D pooling (#1526) | 2024-11-19 16:45:24 -08:00 |  | 
			
				
					| 
							
							
								 Awni Hannun | 61d787726a | Fix view scalar bug segfault (#1603) * fix view scalar bug
* fix view scalar bug
* one more fix | 2024-11-19 10:54:05 -08:00 |  | 
			
				
					| 
							
							
								 Angelos Katharopoulos | 5e89aace9b | Fix concatenate vmap (#1600) | 2024-11-19 10:44:04 -08:00 |  | 
			
				
					| 
							
							
								 Awni Hannun | 2af7e8a9a6 | fix cmake version (#1601) | 2024-11-19 08:45:05 -08:00 |  | 
			
				
					| 
							
							
								 Awni Hannun | 2419edd5b2 | Faster indexing math in a few kernels (#1589) * wip: faster compiled kernels
* faster general unary with uint specialization
* index type in compiled, unary, binary, ternary, copy
* fix jit
* jit fix
* specialize gather + scatter
* nit in docs | 2024-11-18 19:52:00 -08:00 |  | 
			
				
					| 
							
							
								 Awni Hannun | bf481e8e5d | Fix sibling leak (#1590) * add test
* fix + test
* fix fix | 2024-11-18 19:17:01 -08:00 |  | 
			
				
					| 
							
							
								 Awni Hannun | 9d7fa6b8e6 | Use osx deployment target to pick Metal version (#1595) * choose metal based on deployment target rather than system version
* nit
* unused compile def | 2024-11-18 19:16:49 -08:00 |  | 
			
				
					| 
							
							
								 Angelos Katharopoulos | 073076ac7d | 2-Pass Sdpa Inference Kernel (#1597) | 2024-11-18 17:31:53 -08:00 |  | 
			
				
					| 
							
							
								 Awni Hannun | 9bd03dd9b4 | More buffer donation with no-ops (#1591) * more donation
* fix test
* fix build | 2024-11-18 08:35:41 -08:00 |  | 
			
				
					| 
							
							
								 Awni Hannun | 6931f84412 | fix dispatch threads for a few kernels (#1594) | 2024-11-18 08:35:25 -08:00 |  | 
			
				
					| 
							
							
								 xnorai | 16ec0556a0 | Allocate raw JSON metadata buffer on the heap, and limit its size (#1596) * Allocate raw JSON metadata buffer on the heap, and limit its size to 1GiB
* Set the upper size limit for the header to 100K as in Rust safetensors | 2024-11-18 07:22:51 -08:00 |  | 
			
				
					| 
							
							
								 Awni Hannun | 610af352d4 | Dispatch bf16 at run time when using the JIT (#1584) * Dispatch bf16 at run time when using the JIT
* fix extension
* fix extension build
* fix extension build
* Update utils.h | 2024-11-15 16:54:36 -08:00 |  | 
			
				
					| 
							
							
								 Awni Hannun | b35f1e3c9c | fix donation in sdpa (#1587) | 2024-11-13 17:21:13 -08:00 |  | 
			
				
					| 
							
							
								 Awni Hannun | dfa0b9aab4 | Cpu fast quantize (#1578) * cpu quantize
* fix | 2024-11-08 20:10:39 -08:00 |  | 
			
				
					| 
							
							
								 Alex Barron | a4c47b0276 | OOB QMV fix (#1579) * fix oob access in qmv
* skip more
* fix small case | 2024-11-08 17:59:45 -08:00 |  | 
			
				
					| 
							
							
								 Alex Barron | 111fefd5e9 | Fix OOB access in qmv (#1577) * fix oob access in qmv
* skip more | 2024-11-08 15:41:30 -08:00 |  | 
			
				
					| 
							
							
								 Awni Hannun | c1fe1ef081 | Bfs width limit (#1568) * width limit
* fix
* large limit
* put env vars in env namespace | 2024-11-08 15:00:46 -08:00 |  | 
			
				
					| 
							
							
								 Awni Hannun | 8c34c9dac4 | throw for invalid case and remove test (#1575) | 2024-11-08 12:04:03 -08:00 |  | 
			
				
					| 
							
							
								 Awni Hannun | 91c0277356 | fix per-example mask + docs in sdpa (#1574) | 2024-11-08 11:51:15 -08:00 |  | 
			
				
					| 
							
							
								 Awni Hannun | 9f0d5c12fc | Fully wrap the command encoder (#1572) * fully wrap the command encoder
* use consistent style + fix extensions | 2024-11-08 11:50:21 -08:00 |  | 
			
				
					| 
							
							
								 Awni Hannun | 59247c2b62 | add groups in conv2d (#1569) | 2024-11-07 13:57:53 -08:00 |  | 
			
				
					| 
							
							
								 Awni Hannun | 9a3842a2d9 | fix (#1566) | 2024-11-06 17:10:33 -08:00 |  | 
			
				
					| 
							
							
								 Alex Barron | 726dbd9267 | v0.20.0 (#1565) v0.20.0 | 2024-11-05 12:37:57 -08:00 |  | 
			
				
					| 
							
							
								 Awni Hannun | 54f05e7195 | Fix gather vmap (#1563) * fix gather
* fix | 2024-11-05 11:29:20 -08:00 |  | 
			
				
					| 
							
							
								 Alex Barron | 26be608470 | Add split_k qvm for long context (#1564) * Add splitk qvm
* configurable splitk
* tuning
* remove extra instantiation
* remove refactor
* separate test
* cpu tolerance | 2024-11-05 11:25:19 -08:00 |  | 
			
				
					| 
							
							
								 Angelos Katharopoulos | 248431eb3c | Reductions update (#1351) | 2024-11-04 22:25:16 -08:00 |  | 
			
				
					| 
							
							
								 Awni Hannun | 76f275b4df | error in rms for wrong size (#1562) | 2024-11-04 13:24:02 -08:00 |  | 
			
				
					| 
							
							
								 Awni Hannun | f1951d6cce | Use fewer barriers (#1561) * use fewer barriers
* comment | 2024-11-04 10:26:49 -08:00 |  | 
			
				
					| 
							
							
								 Angelos Katharopoulos | 62f297b51d | Sdpa fix (#1558) | 2024-11-02 21:25:46 -07:00 |  | 
			
				
					| 
							
							
								 Awni Hannun | 09bc32f62f | No extra reshape (#1557) * no extra reshape
* lint | 2024-11-02 19:07:20 -07:00 |  | 
			
				
					| 
							
							
								 Chris Offner | 46d8b16ab4 | Fix vmap example in docs (#1556) | 2024-11-02 17:44:14 -07:00 |  | 
			
				
					| 
							
							
								 Chris Offner | 42533931fa | Fix typo "it's" -> "its" (#1555) | 2024-11-02 06:06:34 -07:00 |  | 
			
				
					| 
							
							
								 Awni Hannun | 9bd3a7102f | add python 3.13 to circle (#1553) | 2024-11-01 20:55:35 -07:00 |  | 
			
				
					| 
							
							
								 Alex Barron | 9e516b71ea | Add dispatchThreads to custom kernel doc (#1551) * add dispatchThreads info
* update
* add link | 2024-11-01 13:07:48 -07:00 |  | 
			
				
					| 
							
							
								 Awni Hannun | eac961ddb1 | patch (#1550) v0.19.3 | 2024-10-31 16:10:14 -07:00 |  | 
			
				
					| 
							
							
								 Awni Hannun | 57c6aa7188 | fix multi output leak (#1548) | 2024-10-31 09:32:01 -07:00 |  | 
			
				
					| 
							
							
								 Awni Hannun | cde5b4ad80 | patch (#1546) v0.19.2 | 2024-10-30 19:31:22 -07:00 |  | 
			
				
					| 
							
							
								 Awni Hannun | 4f72c66911 | improvements to scatter / gather (#1541) | 2024-10-30 19:30:54 -07:00 |  |