Commit Graph

  • 5b6b22cc60 Test on cuda 12.2 and 12.9 Awni Hannun 2025-07-23 20:12:38 -0700
  • 4e504039f5 [Metal] Release metal events (#2412) Awni Hannun 2025-07-23 19:53:42 -0700
  • 94b0f83e19 Skip TestConv.test_torch_conv_2D test Cheng 2025-07-24 02:25:45 +0000
  • ada7d518da Use tf32 for conv Cheng 2025-07-22 18:16:24 -0700
  • f189face9d Zero-initilizing array Cheng 2025-07-21 17:40:33 -0700
  • 48e796bb91 Do error check for cublas handle Cheng 2025-07-22 00:28:43 +0000
  • 4c0dc7745f Make LRUCache more like a normal container Cheng 2025-07-20 16:20:53 -0700
  • 3d16cb5071 Set cudnn stream before execution Cheng 2025-07-20 05:24:48 -0700
  • 67a5f7b2a8 Test the native cuda graph api Cheng 2025-07-20 03:40:15 -0700
  • 85510dae78 Add cache Cheng 2025-07-20 03:12:57 -0700
  • 0430a6a74a Turn off tf32 Cheng 2025-07-19 19:49:29 -0700
  • 6444b29651 Plan needs to be kept alive Cheng 2025-07-19 19:09:46 -0700
  • c6076fc77b Switch to backend apis Cheng 2025-07-19 23:52:19 +0000
  • bb6a75bc4a cudnn only accepts contiguous inputs Cheng 2025-07-19 08:11:15 +0000
  • fecc67509d Install libcudnn9-dev-cuda-12 in CI Cheng 2025-07-18 23:00:55 +0000
  • 75bcb46069 include cudnn as python dep Awni Hannun 2025-07-18 06:54:41 -0700
  • 180ec0d3a5 Fix C++ conv tests Cheng 2025-07-18 01:29:38 -0700
  • cea3af6622 More unused backend apis Cheng 2025-07-18 00:18:36 -0700
  • ae9dbb1a9b Fix recording cudnn conv Cheng 2025-07-17 23:48:37 -0700
  • 6571df6ad7 Remove backend apis Cheng 2025-07-17 23:46:09 +0000
  • ad44c4bcd9 Initial implementation Cheng 2025-07-17 22:15:15 +0000
  • 04bd515370 Link with cuDNN Cheng 2025-07-17 01:34:12 +0000
  • d1f4d291e8 Fix uv install and add dev release (#2411) Awni Hannun 2025-07-23 16:54:19 -0700
  • 3119a0944f fix Awni Hannun 2025-07-23 16:52:46 -0700
  • e1840853ce full row mask in sdpa consistently gives nan (#2406) Awni Hannun 2025-07-23 16:37:03 -0700
  • b2280a1c41 fix Awni Hannun 2025-07-23 16:36:09 -0700
  • 5a4f375c6c release metal events Awni Hannun 2025-07-23 14:18:24 -0700
  • 4f524e3003 cuda release on cpu-only machine Awni Hannun 2025-07-23 12:37:07 -0700
  • 0d8a7d8248 pin cuda deps Awni Hannun 2025-07-23 10:00:11 -0700
  • 3b9c665cb8 fix docstring Awni Hannun 2025-07-23 07:54:45 -0700
  • e2b35e06a4 fix uv install and add dev release Awni Hannun 2025-07-23 07:48:12 -0700
  • 0f5ce173da [CUDA] --compress-mode requires CUDA 12.8 (#2407) Cheng 2025-07-23 22:11:11 +0900
  • 588854195f Remove unused code in Convolution::vjp (#2408) Cheng 2025-07-23 22:11:00 +0900
  • 28d068bce6 Fix an error in the comment for mx.dequantize (#2409) Fangjun Kuang 2025-07-23 21:10:50 +0800
  • 8269c9d02d Support unaligned M qmm Angelos Katharopoulos 2025-07-23 00:40:27 -0700
  • 2fe46d0240 Fix an error in the comment for mx.dequantize Fangjun Kuang 2025-07-23 15:32:26 +0800
  • 903b40627c Add dynamic shared memory and improve qmm Angelos Katharopoulos 2025-07-22 23:36:53 -0700
  • c7cdd51f50 Improve perf TianyiZhao1437 2025-07-23 10:12:38 +0800
  • 4c43b3a553 Remove unused code in Convolution::vjp Cheng 2025-07-23 10:24:13 +0900
  • bcdca0d372 [CUDA] --compress-mode requires CUDA 12.8 Cheng 2025-07-22 17:43:52 -0700
  • 6dd1a53956 full row mask in sdpa consistently gives nan Awni Hannun 2025-07-22 15:57:46 -0700
  • d107d8d495 add cuda gemv (#2400) Awni Hannun 2025-07-22 08:24:13 -0700
  • 1e496ddb82 [CUDA] Simplify allocator (#2392) Awni Hannun 2025-07-22 08:24:01 -0700
  • beef3f42cc add cuda gemv Awni Hannun 2025-07-20 22:06:37 -0700
  • f4556ac385 comment Awni Hannun 2025-07-22 07:18:54 -0700
  • b1a44ef240 comment Awni Hannun 2025-07-22 07:18:28 -0700
  • 4fd39d662d use cuda buffer in small pool Awni Hannun 2025-07-20 07:14:57 -0700
  • 60e20bedb6 Don't use shared event in worker Awni Hannun 2025-07-19 13:35:57 -0700
  • b62368f292 simplify allocator and fixe race with small pool Awni Hannun 2025-07-19 10:23:59 -0700
  • 74eccbf3fa use size option in binary (#2399) Awni Hannun 2025-07-22 07:00:53 -0700
  • 08638223ca Fix including stubs in wheel (#2398) Awni Hannun 2025-07-22 06:30:17 -0700
  • 7df3a2887d fix mismatch tianyi 2025-07-22 18:00:42 +0800
  • 700f7dcf01 Refactor the matmul a bit Angelos Katharopoulos 2025-07-21 23:38:21 -0700
  • b2f0ebe9ee [Feature]Add no parallel-m qmm kernel to improve decoding performance tianyi 2025-07-22 12:54:04 +0800
  • eb58609614 fix bool_ Awni Hannun 2025-07-21 08:37:56 -0700
  • a6f36cd0fa use size option in binary Awni Hannun 2025-07-21 07:38:50 -0700
  • 6876998955 fix including stubs in wheel Awni Hannun 2025-07-21 07:45:14 -0700
  • 56cc858af9 Add contiguous_copy_cpu util for copying array (#2397) Cheng 2025-07-21 23:30:35 +0900
  • f55c4ed1d6 Remove thrust iterators (#2396) Cheng 2025-07-21 23:30:27 +0900
  • 6c60bd1cbf Fixed mma and working dequant Angelos Katharopoulos 2025-07-21 04:39:27 -0700
  • a64cc02a0c Somewhat working matmul primitives Angelos Katharopoulos 2025-07-21 02:22:25 -0700
  • 346ae5fdb5 Refactor quantized Angelos Katharopoulos 2025-07-16 16:22:25 -0700
  • 5f29a087ae Add contiguous_copy_cpu util for copying array Cheng 2025-07-21 09:20:23 +0900
  • 0b6960f24a Remove thrust iterators Cheng 2025-07-20 16:54:26 -0700
  • 93d70419e7 [CUDA] speedup handling scalars (#2389) Awni Hannun 2025-07-18 21:47:31 -0700
  • 63f663d9c6 fix cuda manylinux version to match others (#2388) Awni Hannun 2025-07-18 21:02:16 -0700
  • f1ea4213cc comment Awni Hannun 2025-07-18 20:07:49 -0700
  • 6b1d91915c fix cuda manylinux version to match others Awni Hannun 2025-07-18 16:18:04 -0700
  • 4dde2d1b73 speedup scalars in cuda Awni Hannun 2025-07-18 13:00:22 -0700
  • 84b4d96efa fix release build + patch bump (#2387) v0.26.5 Awni Hannun 2025-07-18 14:47:37 -0700
  • 748d64a93a fix release build + patch bump Awni Hannun 2025-07-18 13:25:07 -0700
  • aec67f2fa6 patch bump (#2386) Awni Hannun 2025-07-18 12:25:48 -0700
  • deee214a95 Adding support for the Muon Optimizer (#1914) Gökdeniz Gülmez 2025-07-18 21:25:28 +0200
  • 8435c047e1 fix addmm Awni Hannun 2025-07-18 08:41:22 -0700
  • d1367b1c78 patch bump Awni Hannun 2025-07-18 06:47:54 -0700
  • 45adec102c Add contiguous_copy_gpu util for copying array (#2379) Cheng 2025-07-18 22:44:25 +0900
  • 508bd25e29 match muon Awni Hannun 2025-07-18 06:43:11 -0700
  • 31fc530c76 [CUDA] Add more ways finding CCCL headers in JIT (#2382) Cheng 2025-07-18 07:25:34 +0900
  • b4c84a3688 Add contiguous_copy_gpu util for copying array Cheng 2025-07-16 20:43:42 +0900
  • 0a8bb904d7 nits Awni Hannun 2025-07-17 11:58:41 -0700
  • c535d8c1b5 Merge branch 'ml-explore:main' into adding-Muon-optimizer Gökdeniz Gülmez 2025-07-17 20:10:02 +0200
  • 4b3d7634cd format Goekdeniz-Guelmez 2025-07-17 20:03:19 +0200
  • 516d172ba5 remove comments Goekdeniz-Guelmez 2025-07-17 20:02:27 +0200
  • 698daee214 replace with mx.addmm Goekdeniz-Guelmez 2025-07-17 19:57:18 +0200
  • 4c0f7c713b remove coments Goekdeniz-Guelmez 2025-07-17 19:53:56 +0200
  • 3889c805da G.ndim >= 2 to assert G.ndim == 2 Goekdeniz-Guelmez 2025-07-17 19:52:00 +0200
  • 060404d862 G.astype(mx.bfloat16) to G.astype(G.dtype) Goekdeniz-Guelmez 2025-07-17 19:49:26 +0200
  • fbb3f65a1a fix resource leaks in matmul and graph (#2383) Awni Hannun 2025-07-17 06:50:15 -0700
  • 6b1b8ea91b [CUDA] Add work per thread to compile (#2368) Angelos Katharopoulos 2025-07-17 06:47:52 -0700
  • 7f39e9c299 nits Awni Hannun 2025-07-17 06:26:43 -0700
  • baad6e392b Merge branch 'ml-explore:main' into adding-Muon-optimizer Gökdeniz Gülmez 2025-07-17 13:07:54 +0200
  • a2ffc00769 fix resource leaks in matmul and graph Awni Hannun 2025-07-16 22:00:29 -0700
  • 42195892aa [CUDA] Add more ways finding CCCL headers in JIT Cheng 2025-07-17 02:07:17 +0000
  • b24f6f64fd Fix work-per-thread for strided kernels Angelos Katharopoulos 2025-07-14 23:57:23 -0700
  • e74e593948 Typo Angelos Katharopoulos 2025-07-14 00:53:09 -0700
  • bb341d85b5 Fix the template arg name to not clash with inputs Angelos Katharopoulos 2025-07-14 00:03:21 -0700
  • 8b77aa9b8d Add work per thread in compile Angelos Katharopoulos 2025-07-13 23:07:32 -0700
  • b2273733ea Test with CUDA 12.2 (#2375) Awni Hannun 2025-07-16 13:00:37 -0700
  • 784e0716fe Merge branch 'ml-explore:main' into adding-Muon-optimizer Gökdeniz Gülmez 2025-07-16 21:58:17 +0200
  • 9795e0ae36 fix cpu sort Awni Hannun 2025-07-15 22:10:23 -0700