Commit Graph

  • 1a9f820af6 Compiled should not end in broadcast (#2622) Angelos Katharopoulos 2025-09-26 13:36:09 -07:00
  • d4f4ff3c5e Allow None input to compiled functions (#2621) Awni Hannun 2025-09-25 08:42:23 -07:00
  • 7c7e48dbd1 New tuning for small K gemv (#2620) Jagrit Digani 2025-09-23 12:28:35 -07:00
  • fbbf3b9b3e Support pickling array for bfloat16 (#2586) Daniel Yeh 2025-09-23 05:12:15 +02:00
  • bf01ad9367 fix (#2613) Daniel Yeh 2025-09-23 05:12:04 +02:00
  • ae438d05fa [CUDA] Recycle CUDA events (#2604) Cheng 2025-09-23 10:42:03 +09:00
  • 711a645807 avoid producing NaN in attention (#2608) Awni Hannun 2025-09-22 13:10:43 -07:00
  • aa9d44b3d4 implement Convolution::output_shape (#2601) Josh Bleecher Snyder 2025-09-22 10:09:45 -07:00
  • ec2ab42888 Lower sorted QMM gather threshold (#2609) Awni Hannun 2025-09-19 18:22:55 -07:00
  • 787c0d90cd Detect cache thrashing in LRUCache (#2600) Cheng 2025-09-19 09:12:14 +09:00
  • e8b604a6a3 fix: library loading for swift dynamic frameworks (#2568) Oleksandr Bilous 2025-09-18 23:54:59 +03:00
  • 50cc09887f expose depends (#2606) Awni Hannun 2025-09-18 10:06:15 -07:00
  • 3f730e77aa Update export function example for array input (#2598) Umberto Mignozzetti 2025-09-16 14:38:05 -07:00
  • caecbe876a no copy batch rope (#2595) Awni Hannun 2025-09-15 14:23:48 -07:00
  • 8afb6d62f2 Fix typo in average_gradients function call (#2594) Umberto Mignozzetti 2025-09-15 11:29:21 -07:00
  • 6ccfa603cd fix metal scan (#2591) Awni Hannun 2025-09-15 11:01:57 -07:00
  • 36cad99a11 Refactor code examples to use 'gelu' (#2592) Umberto Mignozzetti 2025-09-15 09:47:02 -07:00
  • ee18e1cbf0 patch bump (#2588) (tag: v0.29.1) Awni Hannun 2025-09-11 17:10:09 -07:00
  • af120c2bc0 set nccl ABI version (#2587) Awni Hannun 2025-09-11 16:55:53 -07:00
  • 6a3acf2301 [CUDA] Set bias as input when using bias epilogue (#2584) Cheng 2025-09-11 15:31:09 +09:00
  • d6977f2a57 Add sdpa with sinks (#2558) Awni Hannun 2025-09-10 14:53:00 -07:00
  • db5443e831 Adding Relu2 (#2582) Gökdeniz Gülmez 2025-09-10 16:24:30 +02:00
  • 52b8384d10 Fix flaky addmm tests (#2581) Cheng 2025-09-10 14:22:22 +09:00
  • 44cc5da4bc [CUDA] Fix alpha not respected when using bias epilogue (#2578) Cheng 2025-09-10 09:08:01 +09:00
  • dde3682b69 [CUDA] Use GEMM with epilogue instead of AddMM (#2569) Cheng 2025-09-09 13:18:49 +09:00
  • 17310d91a6 Add batch offsets for mx.fast.rope (#2564) Awni Hannun 2025-09-08 17:35:07 -07:00
  • b194d65a6a Some tweaks in cmake files (#2574) Cheng 2025-09-09 08:27:18 +09:00
  • a44b27f5f8 Fix a few ccache cache miss (#2573) Cheng 2025-09-09 07:41:05 +09:00
  • e5a33f2223 faster depthwise 1D conv (#2567) Awni Hannun 2025-09-08 11:37:23 -07:00
  • c1e3340b23 Set ccache size before building (#2570) Cheng 2025-09-07 09:00:31 +09:00
  • 8f163a367d typing: add type hints to mlx.core.array, linalg, distributed, and random (#2565) XXXXRT666 2025-09-05 00:08:11 +08:00
  • 89a3df9014 Fixed several type annotations in the MLX stubs which degraded to Unknown/Any (#2560) Manuel Villanueva 2025-09-03 14:52:08 -05:00
  • c5d2937aa5 chore: Update Docs With Slice Copy Example (#2559) Krishi Saripalli 2025-09-02 22:07:02 -07:00
  • b61a65e313 fix copies in sdpa (#2563) Awni Hannun 2025-09-02 11:00:36 -07:00
  • 04cbb4191c Fix dequantize python sig (#2562) wrmsr 2025-09-01 11:50:20 -07:00
  • c5460762e7 Fix AdamW weight_decay default value in docstring (#2557) Artur Antonov 2025-09-01 07:29:30 +03:00
  • 8ce49cd39e fix quantized vjp for mxfp4 (#2555) (tag: v0.29.0) Awni Hannun 2025-08-29 10:06:15 -07:00
  • 9c68b50853 version bump (#2554) Awni Hannun 2025-08-29 06:54:17 -07:00
  • 111f1e71af Faster contiguous gather for indices in the first axis (#2552) Awni Hannun 2025-08-28 21:26:30 -07:00
  • 827003d568 fix METAL quantization in JIT (#2553) Awni Hannun 2025-08-28 18:26:25 -07:00
  • d363a76aa4 Bump xcode in circle (#2551) Awni Hannun 2025-08-28 13:13:34 -07:00
  • 70560b6bd5 Add mode parameter for quantization (#2499) Awni Hannun 2025-08-28 06:45:26 -07:00
  • 7ef8a6f2d5 [CUDA] fix sort (#2550) Awni Hannun 2025-08-27 19:48:43 -07:00
  • 31c6f6e33f [CUDA] Use ConcurrentContext in concatenate_gpu (#2549) Cheng 2025-08-28 09:30:08 +09:00
  • 584d48458e link with nccl (#2546) Awni Hannun 2025-08-27 10:01:07 -07:00
  • 5cf984ca87 Separate cpu compilation cache by versions (#2548) Cheng 2025-08-27 11:25:15 +09:00
  • a9bac3d9e5 Run CPP tests for CUDA build in CI (#2544) Cheng 2025-08-27 08:06:46 +09:00
  • 5458d43247 add load with path tests (#2543) Awni Hannun 2025-08-26 14:24:47 -07:00
  • a4dba65220 Enable cuda graph toggle (#2545) Awni Hannun 2025-08-26 12:50:38 -07:00
  • 4987e7615a Improve the cutlass gemm (simple-gemm) Angelos Katharopoulos 2025-08-25 18:18:19 -07:00
  • 3dcb286baf Remove stream from average grads so it uses default (#2532) Awni Hannun 2025-08-25 15:56:29 -07:00
  • 4822c3dbe9 [CUDA] Implement DynamicSlice/DynamicSliceUpdate (#2533) Cheng 2025-08-26 07:31:39 +09:00
  • 2ca75bb529 Remove nccl install in release (#2542) Awni Hannun 2025-08-25 15:20:18 -07:00
  • db14e29a0b allow pathlib.Path to save/load functions (#2541) Awni Hannun 2025-08-25 14:58:49 -07:00
  • d2f540f4e0 Use nccl header only when nccl is not present (#2539) Awni Hannun 2025-08-25 14:17:25 -07:00
  • 333ffea273 [CUDA] Remove thrust in arange (#2535) Cheng 2025-08-24 16:22:36 +09:00
  • f55b6f1f2f Enable COMPILE_WARNING_AS_ERROR for linux builds in CI (#2534) Cheng 2025-08-24 15:33:08 +09:00
  • 30561229c7 Fix allocation bug in NCCL (#2530) Awni Hannun 2025-08-22 14:39:43 -07:00
  • 068a4612e9 nccl default for backend=any (#2528) Awni Hannun 2025-08-22 12:24:27 -07:00
  • 5722c147de [CUDA] Update calls to cudaMemAdvise and cudaGraphAddDependencies for CUDA 13 (#2525) Andrey Portnoy 2025-08-21 22:57:20 -04:00
  • f6819a1f26 Fix warning 186-D from nvcc (#2527) Cheng 2025-08-22 10:29:55 +09:00
  • f93f87c802 nccl dep + default for cuda (#2526) Awni Hannun 2025-08-21 17:57:49 -07:00
  • 9392fc3f88 NCCL backend (#2476) Anastasiia Filippova 2025-08-21 20:56:15 +02:00
  • e843c4d8d5 fix power (#2523) Awni Hannun 2025-08-21 06:46:01 -07:00
  • e1303f6160 Reset cutlass gemm to working state again Angelos Katharopoulos 2025-08-21 01:29:43 -07:00
  • cf5eef095d tmp Angelos Katharopoulos 2025-08-14 12:29:53 -07:00
  • 395d582719 Add a cutlass gemm Angelos Katharopoulos 2025-08-09 22:47:14 -07:00
  • 05583bcd10 More pipelining for the sm_80 gemm Angelos Katharopoulos 2025-08-09 22:46:31 -07:00
  • 6fce01593a Improve gemm Angelos Katharopoulos 2025-08-07 16:13:18 -07:00
  • 97afe40b7b Remove duplicate register tile Angelos Katharopoulos 2025-08-07 00:55:08 -07:00
  • f70c62d69c Simple gemm example Angelos Katharopoulos 2025-07-29 18:23:40 -07:00
  • 0c5fc63a36 Fix docs omission (#2524) Angelos Katharopoulos 2025-08-20 17:56:06 -07:00
  • e397177f6e Custom cuda kernel (#2517) Angelos Katharopoulos 2025-08-20 17:20:22 -07:00
  • f4c8888cbe [CUDA] Fix stride of singleton dims before passing to cuDNN (#2521) Cheng 2025-08-21 08:55:26 +09:00
  • 25c1e03205 Fix overflow in large filter small channels (#2520) Angelos Katharopoulos 2025-08-20 08:03:29 -07:00
  • 512281781c Remove state return from function example in compile documentation (#2518) russellizadi 2025-08-20 03:45:05 -04:00
  • ac85ddfdb7 [CUDA] Add GEMM-based fallback convolution kernels (#2511) Cheng 2025-08-20 10:06:22 +09:00
  • 65d0d40232 Split cuDNN helpers into a separate header (#2491) Cheng 2025-08-20 09:29:28 +09:00
  • cea9369610 fix lapack svd (#2515) Awni Hannun 2025-08-18 15:07:59 -07:00
  • e7c6e1db82 no segfault with uninitialized array.at (#2514) Awni Hannun 2025-08-18 08:33:38 -07:00
  • c5fcd5b61b fix custom kernel test (#2510) Awni Hannun 2025-08-18 06:45:59 -07:00
  • 1df9887998 Ensure no oob read in gemv_masked (#2508) Angelos Katharopoulos 2025-08-17 08:42:33 -07:00
  • 73f22d6226 Ensure small sort doesn't use indices if not argsort (#2506) Angelos Katharopoulos 2025-08-17 08:42:20 -07:00
  • c422050ca7 Update cuDNN Frontend to v1.14 (#2505) Cheng 2025-08-17 19:13:01 +09:00
  • 1ba18ff7d9 [CUDA] Fix conv grads with groups (#2495) Cheng 2025-08-16 10:09:18 +09:00
  • 37b440faa8 Clean up code handling both std::vector and SmallVector (#2493) Cheng 2025-08-16 09:01:10 +09:00
  • 888b13ed63 Remove the hack around SmallVector in cpu compile (#2494) Cheng 2025-08-16 08:17:24 +09:00
  • 4abb218d21 The naive_conv_2d is no longer used (#2496) Cheng 2025-08-16 07:57:30 +09:00
  • 6441c21a94 Faster general unary op (#2472) Awni Hannun 2025-08-15 15:04:12 -07:00
  • 400f8457ea Experimenting with a gemm based on the cuda steel utils (jagrit06/cuda-gemm-experiment) Jagrit Digani 2025-08-14 11:27:50 -07:00
  • dfb5022eab Rename cu::Matmul to CublasGemm (#2488) Cheng 2025-08-13 09:37:40 +09:00
  • ac207ce7aa make code blocks copyable (#2480) Daniel Yeh 2025-08-12 21:29:02 +02:00
  • fce53b61d6 Fix reduce sum/prod overflow (#2477) Abe Leininger 2025-08-12 02:05:33 -05:00
  • 8ae4a76308 Use CMake <4.1 to avoid the nvpl error (#2489) Angelos Katharopoulos 2025-08-12 00:03:42 -07:00
  • 7fde1b6a1e Fix logsumexp/softmax not fused for some cases (#2474) Cheng 2025-08-09 06:07:17 +09:00
  • aa7b47481a [CUDA] Optimize set_mm_device_pointers for small ndim (#2473) Cheng 2025-08-08 15:23:30 +09:00
  • 56be773610 version (#2470) (tag: v0.28.0) Awni Hannun 2025-08-07 00:36:04 -07:00
  • a9bdd67baa Add CUDA sdpa vector (#2468) Jagrit Digani 2025-08-06 21:40:26 -07:00
  • a22d0bf273 Add stricter condition to matrix sdpa (sdpav-backup) Angelos Katharopoulos 2025-08-06 19:51:14 -07:00
  • f2adb5638d Fix typo in metal command encoder (#2471) Angelos Katharopoulos 2025-08-06 16:58:23 -07:00