Commit Graph

  • aaef4327b4 try older image Awni Hannun 2025-07-15 13:19:40 -0700
  • b5f5462cd4 Test with CUDA 12.0 Awni Hannun 2025-07-15 07:22:15 -0700
  • f409b229a4 fix ring distributed test (#2380) Awni Hannun 2025-07-16 11:25:24 -0700
  • df6d9e972f nits and adding it to test Goekdeniz-Guelmez 2025-07-16 19:13:40 +0200
  • 30571e2326 Rename the copy util in cpu/copy.h to copy_cpu (#2378) Cheng 2025-07-16 23:34:24 +0900
  • 650c956fe6 Merge branch 'ml-explore:main' into adding-Muon-optimizer Gökdeniz Gülmez 2025-07-16 16:29:10 +0200
  • 9a85170473 fix ring distributed test Awni Hannun 2025-07-16 07:25:36 -0700
  • 82ff1505f8 Rename the copy util in cpu/copy.h to copy_cpu Cheng 2025-07-16 20:23:32 +0900
  • d7734edd9f fix complex reduce + nan propagation in min and max (#2377) Awni Hannun 2025-07-15 18:19:47 -0700
  • a0957f208a fix complex reduce + nan propagation in min and max Awni Hannun 2025-07-15 17:00:07 -0700
  • 2ba69bc8fa lower memory uniform sampling (#2361) Awni Hannun 2025-07-15 14:22:07 -0700
  • 2d7e412298 fix Awni Hannun 2025-07-15 13:12:45 -0700
  • cb349a291c [CUDA] Use cuda::std::complex in place of cuComplex (#2372) Cheng 2025-07-15 16:36:13 +0900
  • f0ac833e17 [CUDA] Use cuda::std::complex in place of cuComplex Cheng 2025-07-15 01:25:08 +0000
  • f0a0b077a0 Install linux with mlx[cuda] and mlx[cpu] (#2356) Awni Hannun 2025-07-14 17:17:33 -0700
  • 7fe6f03a5b use fp32 Awni Hannun 2025-07-14 17:02:27 -0700
  • ddd132ca26 lower memory uniform Awni Hannun 2025-07-11 13:39:37 -0700
  • 49114f28ab fix flaky test (#2371) Awni Hannun 2025-07-14 17:16:18 -0700
  • 559ccb6acc fix flaky test Awni Hannun 2025-07-14 16:27:25 -0700
  • e7d2ebadd2 [CUDA] Affine quantize (#2354) Awni Hannun 2025-07-14 15:45:44 -0700
  • e569803d7c update linux build (#2370) Awni Hannun 2025-07-14 15:13:56 -0700
  • 51854d1a19 format Awni Hannun 2025-07-10 14:34:01 -0700
  • 73bb93318f fix Awni Hannun 2025-07-10 14:33:44 -0700
  • e4a3be4411 format Awni Hannun 2025-07-10 11:05:05 -0700
  • 6896043fdd affine quantize and dequantize kernels Awni Hannun 2025-07-10 11:03:11 -0700
  • fab85a9f72 update linux build Awni Hannun 2025-07-14 14:18:28 -0700
  • d34f887abc Add Primitive::name and remove Primitive::print (#2365) Cheng 2025-07-15 06:06:35 +0900
  • 5201df5030 Fix imag() vjp (#2367) Angelos Katharopoulos 2025-07-14 13:11:16 -0700
  • e146ee829a Add Primitive::name and remove Primitive::print Cheng 2025-07-13 09:46:46 +0900
  • 7a875358a0 Fix the test Angelos Katharopoulos 2025-07-14 00:14:13 -0700
  • c92dd818ea Fix imag() vjp Angelos Katharopoulos 2025-07-13 23:53:09 -0700
  • 07537e6040 decouple python bindings from core libraries Awni Hannun 2025-07-12 14:10:13 -0700
  • c288e548f7 Stabilize Newton-Schulz convergence + tooling Mason James 2025-07-12 20:54:58 -0400
  • ffbeacf974 Implement cubic Newton-Schulz method with fallback Mason James 2025-07-12 20:27:15 -0400
  • 49a304e362 Fix dtype promotion & state dict test logic Mason James 2025-07-12 20:07:38 -0400
  • 4b0bc46832 Fix transpose bug Mason James 2025-07-12 19:54:07 -0400
  • b93564eb5d Add Muon optimizer implementation to MLX Mason James 2025-07-12 18:59:10 -0400
  • 2d3c26c565 [CUDA] Do not put kernels in annoymous namespace (#2362) Cheng 2025-07-13 06:24:45 +0900
  • af05c106c8 update circle Awni Hannun 2025-07-11 08:32:16 -0700
  • a7da07c4f0 Only call get_primitive_string on error Cheng 2025-07-12 07:17:08 +0000
  • 6325f60d52 [CUDA] Bundle CCCL for JIT compilation (#2357) Cheng 2025-07-12 10:45:37 +0900
  • c830b398e0 [CUDA] Do not put kernels in annoymous namespace Cheng 2025-07-12 00:49:22 +0000
  • a9c720e8cd Improve the ring backend initialization ring-init Angelos Katharopoulos 2025-07-11 15:31:28 -0700
  • 42cc9cfbc7 fix copy dispatch (#2360) Awni Hannun 2025-07-11 10:59:35 -0700
  • 38c9085938 update circle Awni Hannun 2025-07-11 08:32:16 -0700
  • 8f93ca9e52 cleanup circle, fix cuda repair Awni Hannun 2025-07-11 08:17:58 -0700
  • dea8324f59 fix copy dispatch Awni Hannun 2025-07-11 06:57:19 -0700
  • c623cc7683 temp for testing Awni Hannun 2025-07-10 19:36:36 -0700
  • 9381163788 install linux with mlx[cuda] and mlx[cpu] Awni Hannun 2025-07-10 17:18:14 -0700
  • 15390f80a3 Remove cexpf Cheng 2025-07-11 04:46:09 +0000
  • c55e0fb083 Ship CCCL for JIT compilation Cheng 2025-07-11 03:41:51 +0000
  • 8347575ba1 [CUDA] Implement Scan kernel (#2347) Cheng 2025-07-11 08:54:12 +0900
  • b6eec20260 Fix edge check in qmm_n QuantizedLoader (#2355) Angelos Katharopoulos 2025-07-10 16:28:50 -0700
  • 1629b3b4e6 Fix edge check in qmm_n QuantizedLoader Angelos Katharopoulos 2025-07-10 15:43:05 -0700
  • e9a2190a04 Use cexpf in Metal Cheng 2025-07-11 07:37:59 +0900
  • 3564913327 Fix failing logaddexp test Cheng 2025-07-10 02:13:24 +0000
  • f797b1b3e5 Enable tests Cheng 2025-07-09 11:15:10 +0000
  • b89d8ef1c0 Strided scan Cheng 2025-07-09 10:42:05 +0000
  • e769fcca60 Contiguous scan Cheng 2025-07-08 23:21:07 +0000
  • 0eb035b4b1 Fix type promotion in Adam with bias correction (#2350) Angelos Katharopoulos 2025-07-10 11:14:42 -0700
  • afb9817599 [CUDA] Put version in ptx cache dir path (#2352) Cheng 2025-07-10 23:24:21 +0900
  • 8fb3e7a26c [CUDA] Set current device before cudaGraphLaunch (#2351) Cheng 2025-07-10 23:24:02 +0900
  • 8c7bc30ce4 Align mlx::core::min op nan propagation with NumPy (#2346) jhavukainen 2025-07-10 06:20:43 -0700
  • bf7236ea42 [CUDA] Put version in ptx cache dir path Cheng 2025-07-10 10:27:45 +0000
  • 26abcff181 [CUDA] Set current device before cudaGraphLaunch Cheng 2025-07-03 08:04:34 +0000
  • 85873cb162 [CUDA] Do vectorized store/load in contiguous elementwise ops (#2342) Cheng 2025-07-10 10:48:43 +0900
  • 067950ce00 Fix type promotion in Adam w bias correction Angelos Katharopoulos 2025-07-09 18:08:36 -0700
  • 61003524ee Align mlx::core::min op nan propagation with NumPy Joona Havukainen 2025-07-09 16:27:10 -0700
  • e3534c2db8 Contig uses uint as index and non-contig uses int Cheng 2025-07-09 23:05:21 +0000
  • e14ee12491 add zero for argsort vjp (#2345) Awni Hannun 2025-07-09 14:37:14 -0700
  • 970e991c81 add zero for argsort vjp Awni Hannun 2025-07-09 11:54:35 -0700
  • 8b9a3f3cea Align mlx::core::max op nan propagation with NumPy (#2339) jhavukainen 2025-07-09 11:26:27 -0700
  • 5c932c7bb0 Use uint as index type Cheng 2025-07-09 01:00:13 +0000
  • 9a742090ae Remove tuple unpacking syntax to comply with earlier python versions. Add cuda skip to nanpropagation tests, fix cuda implementation in a separate PR. Joona Havukainen 2025-07-08 20:52:09 -0700
  • 5c3663d4a7 Fix tests on large arrays Cheng 2025-07-08 01:25:05 +0000
  • e66b685a08 binary => binary_two in binary_two.cu Cheng 2025-07-08 00:56:11 +0000
  • 70ade3015f Use int32_t for IdxT Cheng 2025-07-08 00:43:39 +0000
  • 5459b54bcd Do vectorized store/load in ternary ops Cheng 2025-07-08 00:36:07 +0000
  • 3eb59aab6e Do vectorized store/load in copy ops Cheng 2025-07-08 00:22:12 +0000
  • bbff91f920 Do vectorized store/load in binary_two ops Cheng 2025-07-07 23:53:42 +0000
  • 5962fa66bc Do vectorized store/load in unary ops Cheng 2025-07-07 23:34:26 +0000
  • aca7fac9ef Make the max nanpropagation test more meaningful for integer types Joona Havukainen 2025-07-08 16:42:19 -0700
  • 8b15773206 Add cpu Max nanpropagation. Fix a small fib in cpu max dispatch data types for int8/int16. Joona Havukainen 2025-07-08 16:41:56 -0700
  • fb4e8b896b patch bump (#2343) v0.26.3 Awni Hannun 2025-07-08 14:26:07 -0700
  • 5c17d2134f patch bump Awni Hannun 2025-07-08 13:53:59 -0700
  • 2ca533b279 Fix compilation with CUDA 11 (#2331) Cheng 2025-07-08 12:00:43 +0900
  • 3e885f583a Cleanup using namespace alias Joona Havukainen 2025-07-07 18:25:57 -0700
  • c7af3016eb Only check nans on non-integral types in simd_reduce_impl. Joona Havukainen 2025-07-07 18:24:30 -0700
  • 4a9b29a875 MoE backward improvements (#2335) Angelos Katharopoulos 2025-07-07 17:59:53 -0700
  • 3336a35512 Fix the segments type in the test Angelos Katharopoulos 2025-07-07 17:25:19 -0700
  • bbdc34a0cc Fix compilation with CUDA 11 Cheng 2025-07-04 07:00:22 +0000
  • 1c589298ec Address comments Angelos Katharopoulos 2025-07-07 17:03:28 -0700
  • a4fcc893cd auto build linux release (#2341) Awni Hannun 2025-07-07 09:29:23 -0700
  • 4af09362cc auto build linux release Awni Hannun 2025-07-07 06:55:42 -0700
  • 9d10239af7 [CUDA] Do vectorized store/load in binary ops (#2330) Cheng 2025-07-08 00:44:14 +0900
  • 19facd4b20 Build with all cpu cores by default (#2336) Cheng 2025-07-07 22:06:45 +0900
  • f5299f72cd Fix layernorm race condition (#2340) Angelos Katharopoulos 2025-07-07 06:06:01 -0700
  • d5cd9aa8f4 Fix layernorm race condition Angelos Katharopoulos 2025-07-07 02:45:45 -0700
  • 8ea5729ee4 CI weirdness due to large arrays Angelos Katharopoulos 2025-07-07 00:18:42 -0700
  • b35b81ae94 Build with all cpu cores by default Cheng 2025-07-06 19:51:47 +0900