Commit Graph

  • 0bb89e9e5f Share more common code in Compiled (#2240) Cheng 2025-06-04 08:48:50 +09:00
  • 5685ceb3c7 Avoid invoking allocator::malloc when creating CUDA event (#2232) Cheng 2025-06-04 08:48:40 +09:00
  • 0408ba0a76 Optimizing Complex Matrix Multiplication using Karatsuba’s Algorithm (#2220) [v0.26.0] Suryash Malviya 2025-06-02 18:58:46 -04:00
  • cbad6c3093 version (#2237) Awni Hannun 2025-06-02 15:58:33 -07:00
  • 1b021f6984 Fast primitives decide when to use the fallback (#2216) Cheng 2025-06-03 05:26:37 +09:00
  • 95b7551d65 Do not check event.is_signaled() in eval_impl (#2230) Cheng 2025-06-03 05:23:34 +09:00
  • db5a7c6192 Add memory cache to CUDA backend (#2221) Cheng 2025-05-31 04:12:54 +09:00
  • 6ef2f67e7f 5bit quants (#2226) Awni Hannun 2025-05-30 12:12:10 -07:00
  • f76ee1ffd2 Move some dims utils to common (#2223) Cheng 2025-05-29 22:48:30 +09:00
  • 54a71f270a Remove unused defines (#2217) Cheng 2025-05-23 22:14:58 +09:00
  • 55b4062dd8 copyright in docs (#2214) Awni Hannun 2025-05-21 17:13:04 -07:00
  • 79071bfba4 Fix out-of-bounds default value in logsumexp/softmax (#2213) Cheng 2025-05-21 23:25:16 +09:00
  • 7774b87cbd Remove redundant simd_sum in logsumexp (#2210) Cheng 2025-05-21 23:25:03 +09:00
  • 35c87741cf Build for compute capability 70 instead of 75 (#2209) Cheng 2025-05-21 11:42:48 +09:00
  • 4cbe605214 Feat: Allow per-target Metal debug flags (#2201) Jack Wind 2025-05-20 13:22:26 -04:00
  • ab8883dd55 include mlx::core::version() symbols in the mlx static library (#2207) Clement Liaw 2025-05-20 07:39:11 -07:00
  • eebe73001a fix large arg reduce (#2206) Awni Hannun 2025-05-19 13:10:44 -07:00
  • 0359bf02c9 Nearest upsample (#2202) Angelos Katharopoulos 2025-05-19 11:23:38 -07:00
  • 237f9e58a8 Fix BEFORE keyword in target_include_directories (#2204) Cheng 2025-05-19 22:10:44 +09:00
  • 8576e6fe36 fix conv2d bug + faster conv 1d (#2195) Awni Hannun 2025-05-18 06:05:11 -07:00
  • 0654543dcc Add complex eigh (#2191) Angelos Katharopoulos 2025-05-18 00:18:43 -07:00
  • 48ef3e74e2 reduce vjp for all and any (#2193) Awni Hannun 2025-05-16 08:38:49 -07:00
  • 7d4b378952 Include cuda_bf16.h for bfloat16 overloads (#2192) Cheng 2025-05-16 22:44:42 +09:00
  • 7ff5c41e06 Add set_threadgroup_memory_length to CommandEncoder (#2183) Jack Wind 2025-05-16 03:28:03 -04:00
  • 602f43e3d1 fix conv grad (#2187) Awni Hannun 2025-05-15 19:20:36 -07:00
  • a2cadb8218 real and imag properties (#2189) Awni Hannun 2025-05-15 18:17:50 -07:00
  • c1eb9d05d9 non-symmetric eig and eigh (#2188) Awni Hannun 2025-05-15 13:01:44 -07:00
  • cf6c939e86 Fix some complex vjps (#2178) Angelos Katharopoulos 2025-05-14 23:37:12 -07:00
  • 130df35e1b Add random normal distribution for complex numbers (#2182) Angelos Katharopoulos 2025-05-13 22:43:45 -07:00
  • 0751263dec Fix typo in row_reduce_small (#2179) Cheng 2025-05-14 12:19:54 +09:00
  • eca2f3eb97 Add remove_index utility (#2173) Cheng 2025-05-14 09:09:56 +09:00
  • 3aa9cf3f9e Fix put_along_axis for empty arrays (#2181) Angelos Katharopoulos 2025-05-13 14:27:53 -07:00
  • 8f3d208dce Close a couple edge case bugs: hadamard and addmm on empty inputs (#2177) Awni Hannun 2025-05-12 10:48:57 -07:00
  • caaa3f1f8c Small typos in mx.metal deprecations (#2176) Ivan Fioravanti 2025-05-11 15:03:47 +02:00
  • 659a51919f patch bump (#2162) [v0.25.2] Awni Hannun 2025-05-09 14:35:14 -07:00
  • 6661387066 Fix fft for integer overflow (#2161) Awni Hannun 2025-05-09 14:25:12 -07:00
  • a7fae8a176 fix: conv_general differences between gpu, cpu (#2070) ATurker 2025-05-09 20:26:52 +03:00
  • 83762691ba Fix four step fft [fft] Angelos Katharopoulos 2025-05-08 14:14:59 -07:00
  • 2a41caa00e Add single kernel bluestein Angelos Katharopoulos 2025-05-08 13:15:20 -07:00
  • 6593281d25 Refactored four-step Angelos Katharopoulos 2025-05-08 00:25:38 -07:00
  • da98e8bce8 Refactored stockham Angelos Katharopoulos 2025-05-06 21:46:21 -07:00
  • be57a16a80 More tmp fft changes Angelos Katharopoulos 2025-04-30 22:29:22 -07:00
  • 1704809f29 Tmp FFT commit Angelos Katharopoulos 2025-04-30 15:12:39 -07:00
  • 0cae0bdac8 CUDA backend: backbone (#2075) Cheng 2025-05-07 13:26:46 +09:00
  • 7c99acb799 split logsumexp [split_logsumexp] Awni Hannun 2025-05-06 17:10:14 -07:00
  • 5a1a5d5ed1 fix input coherent kernel launch (#2153) Awni Hannun 2025-05-05 17:30:50 -07:00
  • 1683975acf Move common gpu primitives to backend/gpu (#2145) Cheng 2025-05-06 05:45:29 +09:00
  • af705590ac fix batched vector sdpa (#2152) Awni Hannun 2025-05-05 13:13:03 -07:00
  • 825124af8f fix bw for elementwise ops (#2151) Awni Hannun 2025-05-05 06:15:04 -07:00
  • 9c5e7da507 fix compile merging (#2150) Awni Hannun 2025-05-02 15:08:50 -07:00
  • 481349495b GPU Hadamard for large N (#1879) Angelos Katharopoulos 2025-02-18 13:43:09 -08:00
  • 9daa6b003f fix shapeless export (#2148) Awni Hannun 2025-05-01 15:02:02 -07:00
  • a3a632d567 Fix the launcher when ran locally (#2147) Angelos Katharopoulos 2025-05-01 12:56:09 -07:00
  • e496c5a4b4 fix integer overflow in qmm (#2143) Awni Hannun 2025-04-30 09:28:56 -07:00
  • ea890d8710 Remove metal-only tests (#2139) Cheng 2025-05-01 01:08:39 +09:00
  • aa5d84f102 Allow quant layer to be unfrozen (#2142) Awni Hannun 2025-04-30 09:08:29 -07:00
  • f1606486d2 Generalize gpu backend (#2138) Awni Hannun 2025-04-30 09:08:17 -07:00
  • 87720a8908 Fix building with uv (#2141) Cheng 2025-04-30 22:04:07 +09:00
  • bb6565ef14 add fftshift and ifftshift fft helpers (#2135) Aashiq Dheeraj 2025-04-30 01:13:45 -04:00
  • 7bb063bcb3 Enable vjp for quantized scale and bias (#2129) Awni Hannun 2025-04-29 13:03:09 -07:00
  • b36dd472bb return library if it is successfully loaded (#2131) Alex Chi Z. 2025-04-29 10:30:36 -04:00
  • 167b759a38 Fix typos (#2136) hdeng-apple 2025-04-29 22:26:05 +08:00
  • 998404ada4 Get trellis to run [trellis-quants] Awni Hannun 2025-04-26 07:02:20 -07:00
  • 99b9868859 Clarify dimension notation in conv1d, conv2d, and conv3d docstrings (#2123) charan-003 2025-04-25 13:18:30 -06:00
  • 6b2d5448f2 Fix the error message in mx.right_shift and mx.left_shift (#2121) 1ndig0 2025-04-26 00:14:28 +08:00
  • eaf709b83e patch (#2119) [v0.25.1] Awni Hannun 2025-04-24 16:11:07 -07:00
  • f0e70afff0 Fix swift pm load (#2117) Angelos Katharopoulos 2025-04-24 10:58:29 -07:00
  • 86984cad68 Remove static initializers (#2059) hdeng-apple 2025-04-24 21:14:49 +08:00
  • fbc89e3ced fix pinv (#2110) Awni Hannun 2025-04-23 13:08:28 -07:00
  • 38c1e720c2 Search mlx.metallib in macOS framework "Resources" dir (#2061) hdeng-apple 2025-04-24 00:53:13 +08:00
  • 600e87e03c Added output_padding parameters in conv_transpose (#2092) Param Thakkar 2025-04-23 21:56:33 +05:30
  • 3836445241 Add broadcast_shapes in python API (#2091) Hyunsung Lee 2025-04-23 10:57:39 +09:00
  • 1d2c9d6a07 Complex scan (#2094) Yury Popov 2025-04-23 04:56:28 +03:00
  • e8ac6bd2f5 irfft throws instead of segfaults on scalars (#2109) Awni Hannun 2025-04-22 10:25:55 -07:00
  • 11f73d6e89 Double buffer keys for vector sdpa [sdpa-test] Angelos Katharopoulos 2025-04-22 00:19:11 -07:00
  • fdadc4f22c Add more complex unary ops (#2101) Awni Hannun 2025-04-21 13:04:54 -07:00
  • 79b527f45f conv vmap (#2102) Awni Hannun 2025-04-21 13:04:39 -07:00
  • dc4eada7f0 Use unordered map for kwargs in export/import (#2087) Awni Hannun 2025-04-21 07:17:22 -07:00
  • 70ebc3b598 Return const ref in array::data_shared_ptr (#2100) Cheng 2025-04-21 22:17:09 +08:00
  • b13f2aed16 Introduce macros for dispatching dynamic dtypes as static types (#2073) Cheng 2025-04-19 21:16:30 +08:00
  • 5f04c0f818 Fixed shift operations issue (#2080) Param Thakkar 2025-04-19 02:58:33 +05:30
  • 55935ccae7 fix py gc edge case (#2079) Awni Hannun 2025-04-18 12:46:53 -07:00
  • b529515eb1 minor bump (#2081) [v0.25.0] Awni Hannun 2025-04-17 14:57:11 -07:00
  • 3cde719eb7 Route to gather qmm only for many tokens per expert (#2082) Angelos Katharopoulos 2025-04-17 14:53:08 -07:00
  • 5de6d94a90 Gather qmm batched kernel and refactoring of quantized (#2078) Angelos Katharopoulos 2025-04-17 13:53:11 -07:00
  • 4c46e17a5d Update benchmark output [steel-refactor] Jagrit Digani 2025-04-15 10:50:06 -07:00
  • 99eefd2ec0 Gather mm new kernel and small refactoring (#2040) Angelos Katharopoulos 2025-04-14 16:37:36 -07:00
  • e3d275bc49 rebase on main Alex Barron 2025-04-14 16:37:23 -07:00
  • d7acf59fd0 add trellis quant mode Alex Barron 2025-03-18 18:52:22 -07:00
  • e9e268336b LogCumSumExp (#2069) Yury Popov 2025-04-13 11:27:29 +03:00
  • 7275ac7523 Fix release build (#2072) Awni Hannun 2025-04-12 20:41:58 -07:00
  • c4189a38e4 Add float mask to sdpa vector (#2068) Angelos Katharopoulos 2025-04-11 17:29:40 -07:00
  • 68d1b3256b nit: fix exception handling (#2066) Awni Hannun 2025-04-11 14:12:08 -07:00
  • 9c6953bda7 Fix stubgen (#2065) Awni Hannun 2025-04-11 12:02:54 -07:00
  • ef7ece9851 fix fft bug (#2062) Awni Hannun 2025-04-10 19:41:27 -07:00
  • ddaa4b7dcb Fix the test and add custom min/max reductions for uncommon MPI types (#2060) Angelos Katharopoulos 2025-04-10 17:01:17 -07:00
  • dfae2c6989 Fix MSVC build due to use of M_LN2 (#2058) Cheng 2025-04-10 23:41:41 +09:00
  • 515f104926 Min / max reductions (#2041) Anastasiia Filippova 2025-04-10 08:22:20 +02:00
  • 9ecefd56db Do not load the default lib if another is requested (#2055) Angelos Katharopoulos 2025-04-09 13:31:38 -07:00
  • e5d35aa187 no sdpa in grad (#2054) Awni Hannun 2025-04-08 19:13:54 -07:00