Commit Graph

  • fb4e8b896b patch bump (#2343) v0.26.3 Awni Hannun 2025-07-08 14:26:07 -07:00
  • 2ca533b279 Fix compilation with CUDA 11 (#2331) Cheng 2025-07-08 12:00:43 +09:00
  • 4a9b29a875 MoE backward improvements (#2335) Angelos Katharopoulos 2025-07-07 17:59:53 -07:00
  • a4fcc893cd auto build linux release (#2341) Awni Hannun 2025-07-07 09:29:23 -07:00
  • 9d10239af7 [CUDA] Do vectorized store/load in binary ops (#2330) Cheng 2025-07-08 00:44:14 +09:00
  • 19facd4b20 Build with all cpu cores by default (#2336) Cheng 2025-07-07 22:06:45 +09:00
  • f5299f72cd Fix layernorm race condition (#2340) Angelos Katharopoulos 2025-07-07 06:06:01 -07:00
  • 0e0d9ac522 [CUDA] Add MLX_CUDA_GRAPH_CACHE_SIZE env for setting graph cache size (#2329) Cheng 2025-07-06 00:33:29 +09:00
  • 8917022deb fix graphs for older cuda (#2328) Awni Hannun 2025-07-02 19:37:58 -07:00
  • ec0d5db67b [CUDA] Switch to CUDA graphs (#2317) Awni Hannun 2025-07-02 15:59:13 -07:00
  • e76e9b87f0 Fix compilation error from integral_constant (#2326) Cheng 2025-07-02 22:04:38 +09:00
  • cfb6a244ea allow parameters to be deleted (#2325) Awni Hannun 2025-07-01 21:27:23 -07:00
  • 58f3860306 patch bump (#2324) v0.26.2 Awni Hannun 2025-07-01 12:12:16 -07:00
  • dd4f53db63 use fp32 for testing, add more complex ops (#2322) Awni Hannun 2025-07-01 07:30:00 -07:00
  • 3d5e17e507 MLX_SWITCH macros to templates (#2320) Angelos Katharopoulos 2025-07-01 01:33:44 -07:00
  • 33bf1a244b Fix module update in strict mode (#2321) Awni Hannun 2025-06-29 11:12:29 -07:00
  • 772f471ff2 [CUDA] Fix reductions (#2314) Angelos Katharopoulos 2025-06-27 12:59:20 -07:00
  • 2c11d10f8d Split broadcast so it is always fused in compile (#2318) Angelos Katharopoulos 2025-06-26 22:08:18 -07:00
  • 656ed7f780 Fix get 2d grid dims (#2316) Angelos Katharopoulos 2025-06-25 13:03:09 -07:00
  • 81bb9a2a9e Compile float64 functions on CPU (#2311) Awni Hannun 2025-06-24 10:18:52 -07:00
  • 5adf185f86 Fix update_modules() when providing a subset (#2308) Angelos Katharopoulos 2025-06-20 17:19:46 -07:00
  • c9a9180584 Cuda perf tuning (#2307) Awni Hannun 2025-06-20 14:50:57 -07:00
  • 76831ed83d Build CUDA release in Circle (#2306) Awni Hannun 2025-06-19 15:26:36 -07:00
  • b3d7b85376 Make ptx cache settable by environment variable (#2304) Angelos Katharopoulos 2025-06-17 23:55:56 -07:00
  • cad5c0241c [CUDA] synch properly waits for all tasks to finish and clear (#2303) Awni Hannun 2025-06-17 12:03:25 -07:00
  • b8022c578a divmod, partition, sort fixes (#2302) Awni Hannun 2025-06-16 18:49:32 -07:00
  • 870208eff5 Start sdpa vector cuda-sdpa-vector Angelos Katharopoulos 2025-06-15 21:58:34 -07:00
  • bc53f8293f Cuda bug fixes 2 (#2298) Awni Hannun 2025-06-16 13:14:46 -07:00
  • c552ff2451 [CUDA] Fix back-end bugs and enable corresponding tests (#2296) Awni Hannun 2025-06-16 08:45:40 -07:00
  • 4fda5fbdf9 add python testing for cuda with ability to skip list of tests (#2295) Awni Hannun 2025-06-15 10:56:48 -07:00
  • 580776559b RoPE for CUDA (#2293) Angelos Katharopoulos 2025-06-15 06:08:07 -07:00
  • a14aaa7c9d Fix cuda arg reduce (#2291) Awni Hannun 2025-06-14 17:54:00 -07:00
  • a6d780154f fix cuda gemm for bf16 (#2288) Awni Hannun 2025-06-13 22:10:46 -07:00
  • 6871e2eeb7 fix cuda jit (#2287) Awni Hannun 2025-06-13 19:21:46 -07:00
  • 8402a2acf4 Fix complex power and print (#2286) Awni Hannun 2025-06-13 11:13:00 -07:00
  • fddb6933e1 Collection of refactors (#2274) Jagrit Digani 2025-06-13 10:44:56 -07:00
  • c8b4787e4e CUDA backend: indexing ops (#2277) Cheng 2025-06-13 13:44:19 +09:00
  • 2188199ff8 [CUDA] ternary with select op (#2283) Awni Hannun 2025-06-12 20:24:43 -07:00
  • aa07429bad Fix cuda build (#2284) Awni Hannun 2025-06-12 17:48:05 -07:00
  • 918761a25a [CUDA] RMSNorm and VJP (#2280) Awni Hannun 2025-06-12 17:09:49 -07:00
  • a4fc671d3e CUDA backend: compile (#2276) Cheng 2025-06-13 09:08:39 +09:00
  • f5f65ef48c Make sliceUpdate general (#2282) Awni Hannun 2025-06-12 16:48:54 -07:00
  • c2dd81a8aa Fix warnings from latest CUDA toolkit (#2275) Cheng 2025-06-12 22:03:01 +09:00
  • d7e680ffe4 CUDA backend: layernorm (#2271) Cheng 2025-06-12 07:48:32 +09:00
  • c371baf53a CUDA backend: softmax (#2272) Cheng 2025-06-12 05:55:22 +09:00
  • ccf78f566c CUDA backend: argreduce (#2270) Cheng 2025-06-12 05:26:17 +09:00
  • c9fa68664a CUDA backend: reduce (#2269) Cheng 2025-06-12 03:22:25 +09:00
  • c35f4d089a start cuda circle config (#2256) Awni Hannun 2025-06-10 21:19:47 -07:00
  • 8590c0941e Add load_safe to the general conv loaders (#2258) Angelos Katharopoulos 2025-06-10 20:58:16 -07:00
  • 095163b8d1 Fix building cpp benchmarks on Linux (#2268) Cheng 2025-06-11 09:10:24 +09:00
  • 99c33d011d rebase + nit (#2260) Cheng 2025-06-11 02:51:51 +09:00
  • 62fecf3e13 fix conv export (#2265) Awni Hannun 2025-06-10 09:34:01 -07:00
  • 7c4eb5d03e CUDA backend: random (#2261) Cheng 2025-06-11 00:59:56 +09:00
  • bae9a6b404 CUDA backend: sort (#2262) Cheng 2025-06-11 00:59:47 +09:00
  • 004c1d8ef2 Report number of missing parameters (#2264) Christopher Fleetwood 2025-06-10 14:37:50 +01:00
  • 7ebb2e0193 CUDA backend: binary ops (#2259) Cheng 2025-06-10 22:37:40 +09:00
  • 9ce77798b1 fix export to work with gather/scatter axis (#2263) Awni Hannun 2025-06-09 20:37:27 -07:00
  • f8bad60609 CUDA backend: unary ops (#2158) Cheng 2025-06-09 22:45:08 +09:00
  • 5866b3857b Refactor the lu test (#2250) Emmanuel Ferdman 2025-06-07 16:12:08 +03:00
  • 1ca616844b Fix unintuitive metal kernel caching (#2242) Awni Hannun 2025-06-06 20:08:15 -07:00
  • 2e8cf0b450 Change layernorms to two pass algorithm (#2246) Angelos Katharopoulos 2025-06-06 13:34:56 -07:00
  • 24f89173d1 CUDA backend: matmul (#2241) Cheng 2025-06-07 04:24:04 +09:00
  • c6a20b427a Improve metal elementwise kernels (#2247) Awni Hannun 2025-06-06 11:37:40 -07:00
  • a5ac9244c4 fix linux linking error (#2248) Awni Hannun 2025-06-06 10:41:51 -07:00
  • c763fe1be0 default strict mode for module update and update_modules (#2239) Awni Hannun 2025-06-05 15:27:02 -07:00
  • 52dc8c8cd5 Add profiler annotations in common primitives for CUDA backend (#2244) Cheng 2025-06-05 11:55:12 +09:00
  • aede70e81d Perf regression fix (#2243) v0.26.1 Angelos Katharopoulos 2025-06-03 17:55:12 -07:00
  • 85a8beb5e4 Avoid atomic updates across CPU/GPU in CUDA event (#2231) Cheng 2025-06-04 08:49:06 +09:00
  • 0bb89e9e5f Share more common code in Compiled (#2240) Cheng 2025-06-04 08:48:50 +09:00
  • 5685ceb3c7 Avoid invoking allocator::malloc when creating CUDA event (#2232) Cheng 2025-06-04 08:48:40 +09:00
  • 0408ba0a76 Optimizing Complex Matrix Multiplication using Karatsuba’s Algorithm (#2220) v0.26.0 Suryash Malviya 2025-06-02 18:58:46 -04:00
  • cbad6c3093 version (#2237) Awni Hannun 2025-06-02 15:58:33 -07:00
  • 1b021f6984 Fast primitives decide when to use the fallback (#2216) Cheng 2025-06-03 05:26:37 +09:00
  • 95b7551d65 Do not check event.is_signaled() in eval_impl (#2230) Cheng 2025-06-03 05:23:34 +09:00
  • db5a7c6192 Add memory cache to CUDA backend (#2221) Cheng 2025-05-31 04:12:54 +09:00
  • 6ef2f67e7f 5bit quants (#2226) Awni Hannun 2025-05-30 12:12:10 -07:00
  • f76ee1ffd2 Move some dims utils to common (#2223) Cheng 2025-05-29 22:48:30 +09:00
  • 54a71f270a Remove unused defines (#2217) Cheng 2025-05-23 22:14:58 +09:00
  • 55b4062dd8 copyright in docs (#2214) Awni Hannun 2025-05-21 17:13:04 -07:00
  • 79071bfba4 Fix out-of-bounds default value in logsumexp/softmax (#2213) Cheng 2025-05-21 23:25:16 +09:00
  • 7774b87cbd Remove redundant simd_sum in logsumexp (#2210) Cheng 2025-05-21 23:25:03 +09:00
  • 35c87741cf Build for compute capability 70 instead of 75 (#2209) Cheng 2025-05-21 11:42:48 +09:00
  • 4cbe605214 Feat: Allow per-target Metal debug flags (#2201) Jack Wind 2025-05-20 13:22:26 -04:00
  • ab8883dd55 include mlx::core::version() symbols in the mlx static library (#2207) Clement Liaw 2025-05-20 07:39:11 -07:00
  • eebe73001a fix large arg reduce (#2206) Awni Hannun 2025-05-19 13:10:44 -07:00
  • 0359bf02c9 Nearest upsample (#2202) Angelos Katharopoulos 2025-05-19 11:23:38 -07:00
  • 237f9e58a8 Fix BEFORE keyword in target_include_directories (#2204) Cheng 2025-05-19 22:10:44 +09:00
  • 8576e6fe36 fix conv2d bug + faster conv 1d (#2195) Awni Hannun 2025-05-18 06:05:11 -07:00
  • 0654543dcc Add complex eigh (#2191) Angelos Katharopoulos 2025-05-18 00:18:43 -07:00
  • 48ef3e74e2 reduce vjp for all and any (#2193) Awni Hannun 2025-05-16 08:38:49 -07:00
  • 7d4b378952 Include cuda_bf16.h for bfloat16 overloads (#2192) Cheng 2025-05-16 22:44:42 +09:00
  • 7ff5c41e06 Add set_threadgroup_memory_length to CommandEncoder (#2183) Jack Wind 2025-05-16 03:28:03 -04:00
  • 602f43e3d1 fix conv grad (#2187) Awni Hannun 2025-05-15 19:20:36 -07:00
  • a2cadb8218 real and imag properties (#2189) Awni Hannun 2025-05-15 18:17:50 -07:00
  • c1eb9d05d9 non-symmetric eig and eigh (#2188) Awni Hannun 2025-05-15 13:01:44 -07:00
  • cf6c939e86 Fix some complex vjps (#2178) Angelos Katharopoulos 2025-05-14 23:37:12 -07:00
  • 130df35e1b Add random normal distribution for complex numbers (#2182) Angelos Katharopoulos 2025-05-13 22:43:45 -07:00
  • 0751263dec Fix typo in row_reduce_small (#2179) Cheng 2025-05-14 12:19:54 +09:00
  • eca2f3eb97 Add remove_index utility (#2173) Cheng 2025-05-14 09:09:56 +09:00
  • 3aa9cf3f9e Fix put_along_axis for empty arrays (#2181) Angelos Katharopoulos 2025-05-13 14:27:53 -07:00