Commit Graph

  • f46877bc08 more accurate rope fallback (#2792) Awni Hannun 2025-11-19 06:07:21 -08:00
  • 6f35017d1b [CUDA] cuDNN backward attention (#2762) Cheng 2025-11-19 08:13:50 +09:00
  • b167f0df1c build docs on linux (#2787) Awni Hannun 2025-11-18 08:01:03 -08:00
  • a9f0d6b160 Avoid duplicate CI runs when starting a PR from upstream branch (#2788) Cheng 2025-11-18 15:16:25 +09:00
  • 940f4c7818 Fix building with CUDA < 12.8 (#2782) Cheng 2025-11-18 12:55:19 +09:00
  • 35f81728f1 Remove unneeded tests in nightly build (#2786) Cheng 2025-11-18 08:09:58 +09:00
  • 4442ed86c1 Fix nightly build (#2785) Cheng 2025-11-18 08:07:51 +09:00
  • 698559c231 Test every commit in main branch (#2781) Cheng 2025-11-18 08:07:22 +09:00
  • ecc4879b07 Do not run CPU tests in CUDA builds (#2784) Cheng 2025-11-18 07:27:09 +09:00
  • 32b18d8b66 Use std::optional for mask_arr arg (#2763) Cheng 2025-11-17 10:43:33 +09:00
  • 472c43a0c8 Build and test with multiple CUDA versions (#2780) Cheng 2025-11-17 09:19:02 +09:00
  • b7214ff01e Remove pip cache in GitHub Actions (#2776) Cheng 2025-11-17 08:19:59 +09:00
  • 76414c8971 Run CI for pushes (#2777) Cheng 2025-11-17 07:19:01 +09:00
  • 49e4566df3 fix release 2 (#2767) Awni Hannun 2025-11-16 11:39:53 -08:00
  • aad49f932f [CUDA] Tune ops per buffer based on device (#2761) Awni Hannun 2025-11-16 06:29:49 -08:00
  • 86765cce34 Use ccache in GitHub Actions (#2773) Cheng 2025-11-16 07:58:14 +09:00
  • 1bedcbd556 Fix warnings with cmake 4.1 (#2774) Cheng 2025-11-16 07:12:47 +09:00
  • 9ac7dbe877 Fix MPI distributed tests with CUDA backend (#2775) Cheng 2025-11-16 07:12:18 +09:00
  • 1bf605d56d use arch specific targets when possible (#2771) Awni Hannun 2025-11-14 20:04:18 -08:00
  • 3c622ddd1d Separate test-linux from build-linux/cuda in GitHub Actions (#2765) Cheng 2025-11-15 11:14:09 +09:00
  • 27ff069175 Fix exporting with constants (#2769) Awni Hannun 2025-11-14 12:52:08 -08:00
  • 3b2ffcefc3 [CUDA] cuDNN forward attention (#2743) Cheng 2025-11-14 09:23:56 +09:00
  • b65f882df3 fix release (#2759) Awni Hannun 2025-11-13 15:34:01 -08:00
  • b704e9e77a [CUDA] Check CUDA error in synchronize (#2757) Cheng 2025-11-14 07:10:23 +09:00
  • 66519fb348 fix slice (#2758) Awni Hannun 2025-11-13 11:30:02 -08:00
  • 8973550ff3 export custom kernel (#2756) Awni Hannun 2025-11-13 11:29:50 -08:00
  • 3f866be665 minor debugging for publishing (#2739) Mike Drob 2025-11-12 08:33:39 -06:00
  • 23f81ed1c1 Linux on arm (#2751) Awni Hannun 2025-11-11 11:41:14 -08:00
  • 3fe2250c00 Fix irregular_strides benchmark shape type (#2754) wrmsr 2025-11-11 11:40:22 -08:00
  • 047114b988 remove circle (#2753) Awni Hannun 2025-11-11 11:39:47 -08:00
  • 9320eb89a8 Fix dequantize python sig (dtype default) (#2752) wrmsr 2025-11-11 09:55:24 -08:00
  • 75819d70ea patch bump (#2750) Awni Hannun 2025-11-11 08:49:14 -08:00
  • 60d80a3728 fix release builds (#2746) (tag: v0.29.4) Awni Hannun 2025-11-11 07:44:30 -08:00
  • eba6a9d163 Compatibility with pip-installed openmpi (#2741) Pedro Cuenca 2025-11-08 01:58:31 +01:00
  • be9e2aebd6 Shapeless support for zeros/ones_like (#2726) CCYeh 2025-11-07 04:12:20 +01:00
  • df58b4133a [CUDA] Reduce use of managed memory (#2725) Awni Hannun 2025-11-05 16:05:23 -08:00
  • 27778156dc Nccl reduce scatter, all gather (#2727) Anastasiia Filippova 2025-11-05 17:21:11 +01:00
  • 761f901a41 fix property name (#2736) Mike Drob 2025-11-05 06:31:56 -06:00
  • 6ece97f69b Make cpu binary_op easily accessible (#2733) Angelos Katharopoulos 2025-11-05 01:08:41 -08:00
  • d3bc6a9bff don't test when doing release (#2734) Awni Hannun 2025-11-04 15:54:23 -08:00
  • 26ceb507eb only build for macos 14 and up (#2731) Awni Hannun 2025-11-04 09:44:15 -08:00
  • 910b3e3299 skip self-hosted runners on forks (#2730) Mike Drob 2025-11-03 23:22:13 +01:00
  • 50fa315d18 Fix addmm with empty matrices and beta != 1.0 (#2715) Harsh Sutaria 2025-11-03 17:16:15 -05:00
  • 1ff2b713b6 Check isnan in maximum / minimum with CPU backend (#2652) AN Long 2025-11-04 01:51:14 +09:00
  • 50514a6146 Set up publishing to PyPI and Test-PyPI (#2721) Mike Drob 2025-11-03 16:20:11 +01:00
  • 93d76b0f30 Fix compile multi capture (#2678) Awni Hannun 2025-11-03 06:33:43 -08:00
  • 78678de0cd add null check -- the bundleIdentifier is optional (#2709) David Koski 2025-11-03 06:33:21 -08:00
  • ed9c6b1117 update: add linux fedora container CI - CPP build test only (#2722) Melissa Kilby 2025-11-03 06:33:00 -08:00
  • 24828b1b2f CMakeLists.txt update sign-warns Ronan Collobert 2025-10-31 16:55:04 -07:00
  • 9f649b5658 WIP (python) Ronan Collobert 2025-10-31 16:24:51 -07:00
  • 18aa921388 WIP Ronan Collobert 2025-10-31 16:24:35 -07:00
  • 8d13a0bc6b WIP (metal) Ronan Collobert 2025-10-31 16:24:21 -07:00
  • ac75c87fd7 WIP (cpu) Ronan Collobert 2025-10-31 16:24:09 -07:00
  • 7107802e09 WIP (examples) Ronan Collobert 2025-10-31 16:23:51 -07:00
  • c5913131cf WIP (distributed) Ronan Collobert 2025-10-31 13:32:56 -07:00
  • 19ab7911f6 WIP (cuda) Ronan Collobert 2025-10-31 13:32:43 -07:00
  • 4a1b1796b7 WIP (io) Ronan Collobert 2025-10-31 13:20:47 -07:00
  • b48d298205 WIP (distributed) Ronan Collobert 2025-10-31 13:20:09 -07:00
  • 8277e71ea9 WIP (gpu) Ronan Collobert 2025-10-31 13:19:54 -07:00
  • b0d985416a fix arg_reduce Ronan Collobert 2025-10-31 13:13:15 -07:00
  • 39b04ce638 use faster dequant for fp4 qmv (#2720) Awni Hannun 2025-10-31 11:49:59 -07:00
  • 8d10f3ec75 WIP (metal) Ronan Collobert 2025-10-31 11:47:03 -07:00
  • 6343622c67 fix small vector indexing checks Ronan Collobert 2025-10-31 11:46:36 -07:00
  • 979abf462b WIP (metal) Ronan Collobert 2025-10-31 09:43:29 -07:00
  • 981d2fdaf0 WIP (cpu) Ronan Collobert 2025-10-31 09:40:50 -07:00
  • 5a306d3495 WIP (common) Ronan Collobert 2025-10-31 09:40:13 -07:00
  • 5baa361779 WIP (tests) Ronan Collobert 2025-10-31 09:39:38 -07:00
  • d9e6349657 fix docs path (#2719) Mike Drob 2025-10-31 01:12:49 +01:00
  • 1bac0db7e3 WIP Ronan Collobert 2025-10-30 16:25:36 -07:00
  • a1212b4e44 WIP (distributed) Ronan Collobert 2025-10-30 16:25:11 -07:00
  • 45a8b226af WIP (cpu) Ronan Collobert 2025-10-30 16:24:51 -07:00
  • 76ef1e98f3 WIP (common) Ronan Collobert 2025-10-30 16:18:59 -07:00
  • b901a9f311 Fix the order of hosts in the ring (#2718) Angelos Katharopoulos 2025-10-30 15:02:39 -07:00
  • 68c5fa1c95 fix memory count bug (#2717) Awni Hannun 2025-10-30 14:27:15 -07:00
  • 793a31eeb6 Fix missing domain_uuid_key in thunderbolt ring setup (#2682) Christopher Webb 2025-10-30 15:17:20 -05:00
  • 74c1ed25bb Migrate CircleCI to GitHub Actions (#2716) Mike Drob 2025-10-30 18:26:55 +01:00
  • 63d91557e0 fix FFT (PocketFFT requires size_t for axis) Ronan Collobert 2025-10-29 17:05:48 -07:00
  • 310e501e6a WIP (cpu) Ronan Collobert 2025-10-29 16:52:25 -07:00
  • cacc3ab7fd WIP (common) Ronan Collobert 2025-10-29 16:51:42 -07:00
  • 53525cba23 WIP Ronan Collobert 2025-10-29 16:51:05 -07:00
  • 3d67b717a0 the cpu simd case Ronan Collobert 2025-10-29 16:43:18 -07:00
  • 953b2f5be2 WIP Ronan Collobert 2025-10-29 16:11:32 -07:00
  • 26f7155537 SmallVector: keep sizes small (int) Ronan Collobert 2025-10-29 16:06:10 -07:00
  • 66fcb9fe94 array: use int or int64_t instead of size_t Ronan Collobert 2025-10-29 16:02:31 -07:00
  • ec72b44417 Add quantize/dequantize for mxfp8 and nvfp4 (#2688) Awni Hannun 2025-10-28 16:23:12 -07:00
  • 460691a0e8 fix: linux-{fedora}x86_64-build (#2707) Melissa Kilby 2025-10-27 23:36:08 +00:00
  • 969924cc69 Fp8 conversion (#2686) Awni Hannun 2025-10-27 16:35:50 -07:00
  • d1e06117e8 bump python (#2694) Awni Hannun 2025-10-27 11:34:31 -07:00
  • 539d8322d1 add median op (#2705) Awni Hannun 2025-10-27 11:33:42 -07:00
  • c4767d110f fix addmm cpu (#2699) Awni Hannun 2025-10-27 11:33:32 -07:00
  • 895217f25b optionally load metallib from framework (#2702) David Koski 2025-10-27 07:52:03 -07:00
  • 0cfeeb60ca Einsum error msg improvement (#2690) Manuel Villanueva 2025-10-27 08:31:47 -05:00
  • 8f8af61a37 fix warnings showing up with -Wall (#2692) Ronan Collobert 2025-10-24 11:43:35 -07:00
  • 233384161e Improved mx.split() docs (#2689) Manuel Villanueva 2025-10-24 11:48:41 -05:00
  • 5bcf3a6794 format Awni Hannun 2025-10-22 16:08:47 -07:00
  • 7707196297 Merge commit from fork wickedcoder 2025-10-23 01:31:25 +03:00
  • 7e3471c987 Merge commit from fork wickedcoder 2025-10-23 01:31:03 +03:00
  • 7cfd0da856 rebase gh-pages CircleCI Docs 2025-10-17 19:16:27 +00:00
  • e492c1dcd9 rebase CircleCI Docs 2025-09-26 22:21:25 +00:00
  • d96a372a7d rebase CircleCI Docs 2025-09-12 00:17:05 +00:00