Commit Graph

  • 0c69f10d55 Refactor gemv into a function Jagrit Digani 2025-06-10 16:53:23 -0700
  • 4a01d2d5d3 fix complex power and print Awni Hannun 2025-06-13 07:22:03 -0700
  • b7a9754872 feat: Add convergence checking and algorithm improvements Arkar Min Aung 2025-06-14 00:16:23 +1000
  • 3d8c7583f2 feat: Implement basic one-sided Jacobi SVD algorithm in Metal Arkar Min Aung 2025-06-13 23:34:36 +1000
  • a71a9e0ddd feat: Add Metal SVD infrastructure and parameter structures Arkar Min Aung 2025-06-13 23:28:52 +1000
  • c8b4787e4e CUDA backend: indexing ops (#2277) Cheng 2025-06-13 13:44:19 +0900
  • c8f79d38ec CUDA backend: indexing ops Cheng 2025-05-21 02:15:09 +0000
  • 2188199ff8 [CUDA] ternary with select op (#2283) Awni Hannun 2025-06-12 20:24:43 -0700
  • f07eb684a6 fix Awni Hannun 2025-06-12 20:24:23 -0700
  • 850ad01914 comment + fix Awni Hannun 2025-06-12 16:34:44 -0700
  • 4d95cb24b4 cuda ternary with select op Awni Hannun 2025-06-12 16:25:20 -0700
  • aa07429bad Fix cuda build (#2284) Awni Hannun 2025-06-12 17:48:05 -0700
  • 15474f305a fix cuda build Awni Hannun 2025-06-12 17:19:04 -0700
  • 918761a25a [CUDA] RMSNorm and VJP (#2280) Awni Hannun 2025-06-12 17:09:49 -0700
  • a4fc671d3e CUDA backend: compile (#2276) Cheng 2025-06-13 09:08:39 +0900
  • ef9495fb8f Rename kernels/ to device/ Cheng 2025-06-12 23:39:06 +0000
  • f5f65ef48c Make sliceUpdate general (#2282) Awni Hannun 2025-06-12 16:48:54 -0700
  • 50dcaa6a7c nit Awni Hannun 2025-06-12 16:28:02 -0700
  • 906c6f4fd0 fix Awni Hannun 2025-06-12 14:40:12 -0700
  • 9825c33b90 Make sliceUpdate general Awni Hannun 2025-06-12 14:20:33 -0700
  • bd2ea38397 rms norm start Awni Hannun 2025-06-11 22:16:02 -0700
  • c2dd81a8aa Fix warnings from latest CUDA toolkit (#2275) Cheng 2025-06-12 22:03:01 +0900
  • b2dd60c1dd CUDA backend: compile Cheng 2025-05-30 07:47:32 +0000
  • 27e9540ebe Fix warnings from latest CUDA toolkit Cheng 2025-06-11 23:50:03 +0000
  • d7e680ffe4 CUDA backend: layernorm (#2271) Cheng 2025-06-12 07:48:32 +0900
  • ebdd22a8d4 CUDA backend: layernorm Cheng 2025-05-06 11:26:42 +0000
  • c371baf53a CUDA backend: softmax (#2272) Cheng 2025-06-12 05:55:22 +0900
  • ccf78f566c CUDA backend: argreduce (#2270) Cheng 2025-06-12 05:26:17 +0900
  • f8824f0ce1 CUDA backend: softmax Cheng 2025-05-07 06:02:17 +0000
  • 31f70fe93c CUDA backend: argreduce Cheng 2025-04-18 00:37:30 +0000
  • c9fa68664a CUDA backend: reduce (#2269) Cheng 2025-06-12 03:22:25 +0900
  • c02b44a637 CUDA backend: reduce Cheng 2025-04-17 09:13:00 +0000
  • c35f4d089a start cuda circle config (#2256) Awni Hannun 2025-06-10 21:19:47 -0700
  • 8590c0941e Add load_safe to the general conv loaders (#2258) Angelos Katharopoulos 2025-06-10 20:58:16 -0700
  • 095163b8d1 Fix building cpp benchmarks on Linux (#2268) Cheng 2025-06-11 09:10:24 +0900
  • 094db40fdf Fix building cpp benchmarks on Linux Cheng 2025-06-10 22:20:50 +0000
  • 26e93e9905 start cuda circle config Awni Hannun 2025-06-08 17:26:00 -0700
  • c830b5a9f9 fix metal kernel linking issue on cuda cuda_available Awni Hannun 2025-06-08 17:33:49 -0700
  • 283a136c64 rebase Awni Hannun 2025-06-10 10:54:53 -0700
  • 99c33d011d rebase + nit (#2260) Cheng 2025-06-11 02:51:51 +0900
  • 191ae31130 rebase + nit Awni Hannun 2025-06-10 09:43:40 -0700
  • 62fecf3e13 fix conv export (#2265) Awni Hannun 2025-06-10 09:34:01 -0700
  • 7c4eb5d03e CUDA backend: random (#2261) Cheng 2025-06-11 00:59:56 +0900
  • bae9a6b404 CUDA backend: sort (#2262) Cheng 2025-06-11 00:59:47 +0900
  • c08db275c5 fix conv export Awni Hannun 2025-06-10 08:58:10 -0700
  • 5ce655a646 Merge branch 'main' into cuda-sort Awni Hannun 2025-06-10 08:17:53 -0700
  • 004c1d8ef2 Report number of missing parameters (#2264) Christopher Fleetwood 2025-06-10 14:37:50 +0100
  • 7ebb2e0193 CUDA backend: binary ops (#2259) Cheng 2025-06-10 22:37:40 +0900
  • 785bc85966 chore: format FL33TW00D 2025-06-10 13:25:33 +0100
  • 43708f3d97 chore: inform FL33TW00D 2025-06-10 13:22:52 +0100
  • 9ce77798b1 fix export to work with gather/scatter axis (#2263) Awni Hannun 2025-06-09 20:37:27 -0700
  • e53a7b6a73 fix export to work with gather/scatter axis Awni Hannun 2025-06-09 20:14:24 -0700
  • 35401c22db CUDA backend: sort Cheng 2025-04-22 07:33:13 +0000
  • 1761b4dace CUDA backend: random Cheng 2025-04-19 03:31:09 +0000
  • 21c4a92ec1 CUDA backend: binary ops Cheng 2025-04-13 23:51:11 +0000
  • f8bad60609 CUDA backend: unary ops (#2158) Cheng 2025-06-09 22:45:08 +0900
  • fb6e04867e Remove iostream Angelos Katharopoulos 2025-06-09 04:13:15 -0700
  • 6cb945dfc1 Add a small benchmark Angelos Katharopoulos 2025-06-09 03:52:36 -0700
  • c9af09d118 Add load_safe to conv general loader Angelos Katharopoulos 2025-06-03 10:13:16 -0700
  • 5866b3857b Refactor the lu test (#2250) Emmanuel Ferdman 2025-06-07 16:12:08 +0300
  • 9650567a13 Refactor the lu test Emmanuel Ferdman 2025-06-07 04:56:35 -0700
  • 1ca616844b Fix unintuitive metal kernel caching (#2242) Awni Hannun 2025-06-06 20:08:15 -0700
  • 4c43ff0591 CUDA backend: unary ops Cheng 2025-04-13 10:53:15 +0000
  • 2e8cf0b450 Change layernorms to two pass algorithm (#2246) Angelos Katharopoulos 2025-06-06 13:34:56 -0700
  • 24f89173d1 CUDA backend: matmul (#2241) Cheng 2025-06-07 04:24:04 +0900
  • 6741d15735 alternative solution Awni Hannun 2025-06-05 20:31:05 -0700
  • 97bd67c032 Check for valid launch parameters Angelos Katharopoulos 2025-06-06 11:41:56 -0700
  • ac1117b224 Fix unintuitive metal kernel caching Awni Hannun 2025-06-03 08:01:53 -0700
  • c6a20b427a Improve metal elementwise kernels (#2247) Awni Hannun 2025-06-06 11:37:40 -0700
  • a4a4b46b8d fix jit Awni Hannun 2025-06-06 11:08:22 -0700
  • 570dd8287a Fix formatting Angelos Katharopoulos 2025-06-06 01:09:31 -0700
  • 7734bc5c4f Change layernorms to two pass algorithm Angelos Katharopoulos 2025-06-03 15:35:20 -0700
  • ba8748b12e compile and copy Awni Hannun 2025-06-06 07:53:52 -0700
  • d0ebd18d7d improve metal elementwise kernels Awni Hannun 2025-06-06 07:01:03 -0700
  • a5ac9244c4 fix linux linking error (#2248) Awni Hannun 2025-06-06 10:41:51 -0700
  • 6812a0d0cd fix linux linking error Awni Hannun 2025-06-06 09:26:06 -0700
  • c763fe1be0 default strict mode for module update and update_modules (#2239) Awni Hannun 2025-06-05 15:27:02 -0700
  • 52dc8c8cd5 Add profiler annotations in common primitives for CUDA backend (#2244) Cheng 2025-06-05 11:55:12 +0900
  • e9a4d281d6 Add profiler annotations in common primitives for CUDA backend Cheng 2025-05-27 07:18:53 +0000
  • 0b38729f41 rebase gh-pages CircleCI Docs 2025-06-04 01:03:47 +0000
  • 4d69ffa12a rebase CircleCI Docs 2025-06-02 23:29:32 +0000
  • 5b083deb3d rebase CircleCI Docs 2025-05-09 21:42:00 +0000
  • f5dfe54504 rebase CircleCI Docs 2025-04-24 23:16:38 +0000
  • 8b7a2f2f80 rebase CircleCI Docs 2025-04-17 22:29:33 +0000
  • 2b0b1bae00 rebase CircleCI Docs 2025-04-03 20:25:24 +0000
  • 461fc93fa6 rebase CircleCI Docs 2025-03-24 20:24:41 +0000
  • a08f30df65 rebase CircleCI Docs 2025-03-20 22:37:22 +0000
  • b48fb5d7dc rebase CircleCI Docs 2025-03-05 21:30:09 +0000
  • e1c8a49c45 rebase CircleCI Docs 2025-02-18 22:45:08 +0000
  • 2992e564ea rebase CircleCI Docs 2025-02-14 21:44:39 +0000
  • c7080f89ca rebase CircleCI Docs 2025-02-06 20:16:29 +0000
  • 1431bea8cb rebase CircleCI Docs 2025-01-09 21:56:20 +0000
  • 171ca75e2a rebase CircleCI Docs 2024-12-06 21:22:39 +0000
  • 347febaf97 rebase CircleCI Docs 2024-11-22 20:24:16 +0000
  • a950c2e684 rebase CircleCI Docs 2024-11-05 20:44:07 +0000
  • e5e2ffe503 rebase CircleCI Docs 2024-11-05 19:54:16 +0000
  • a5d741ec3b rebase CircleCI Docs 2024-10-31 23:17:05 +0000
  • c3756327b1 rebase CircleCI Docs 2024-10-31 03:00:19 +0000
  • c996fc9d45 rebase CircleCI Docs 2024-10-25 20:23:45 +0000
  • 7779cac836 rebase CircleCI Docs 2024-10-18 19:13:44 +0000