Commit Graph

1317 Commits

Author SHA1 Message Date
Awni Hannun
20469ce2d1 docs update 2025-07-25 22:53:09 +00:00
Awni Hannun
93cf6e9f5c docs update 2025-07-25 22:53:09 +00:00
Awni Hannun
61c03b86a4 docs update 2025-07-25 22:53:09 +00:00
Awni Hannun
ba26fe5be5 docs update 2025-07-25 22:53:09 +00:00
Awni Hannun
8b5550f9f8 docs update 2025-07-25 22:53:09 +00:00
Awni Hannun
e1f3a7b14d use proper version 2025-07-25 22:53:08 +00:00
Awni Hannun
eede93197d docs update 2025-07-25 22:53:08 +00:00
Awni Hannun
fd78c54288 docs update 2025-07-25 22:53:08 +00:00
Awni Hannun
fdf4088123 docs update 2025-07-25 22:53:08 +00:00
Awni Hannun
9c44222630 docs update 2025-07-25 22:53:08 +00:00
Awni Hannun
f65f98fc82 remove unneeded files in docs 2025-07-25 22:53:08 +00:00
Awni Hannun
21cae9cb8f update docs 2025-07-25 22:53:08 +00:00
Awni Hannun
30ea2df988 docs update 2025-07-25 22:53:08 +00:00
Awni Hannun
d9d0777c2e docs up 2025-07-25 22:53:08 +00:00
Awni Hannun
d67cd9230c docs up 2025-07-25 22:53:07 +00:00
Awni Hannun
d03b91923e docs update 2025-07-25 22:53:07 +00:00
Awni Hannun
8bea0a4eb8 docs 2025-07-25 22:53:07 +00:00
Awni Hannun
0250e203f6 docs 2025-07-25 22:53:07 +00:00
Awni Hannun
f75712551d update docs 2025-07-25 22:53:07 +00:00
Awni Hannun
af2c3689fe docs 2025-07-25 22:53:07 +00:00
Awni Hannun
5ac2eec7b3 docs 2025-07-25 22:53:07 +00:00
Awni Hannun
f89de9c25d docs 2025-07-25 22:53:07 +00:00
Awni Hannun
9ad2650c9d docs 2025-07-25 22:53:07 +00:00
Awni Hannun
ea288788f8 docs 2025-07-25 22:53:07 +00:00
Awni Hannun
efe5c824af docs 2025-07-25 22:53:07 +00:00
Awni Hannun
e6ffce1a9b docs 2025-07-25 22:53:07 +00:00
Awni Hannun
a847d1dbd0 docs 2025-07-25 22:53:07 +00:00
Awni Hannun
dca1d17eb9 docs 2025-07-25 22:53:07 +00:00
Awni Hannun
4ad53414dd
fix cuda pypi package (#2423)
* fix cuda pypi package

* patch bump
2025-07-25 15:20:29 -07:00
Awni Hannun
d1165b215e
version (#2420) 2025-07-25 13:29:28 -07:00
Awni Hannun
dcb8319f3d
update install docs and requirements (#2419) 2025-07-25 12:13:19 -07:00
Awni Hannun
5597fa089c
Fix qvm splitk (#2415) 2025-07-25 11:50:24 -07:00
Awni Hannun
9acec364c2
[CUDA] Always use batched matmul (#2404)
* cuda batched mm

* addmm as well

* comment
2025-07-24 20:46:02 -07:00
Skonor
7d9d6ef456
docs: fix adam and adamw eps placement (#2416)
Co-authored-by: Mikhail Gorbunov <m_gorbunov@apple.com>
2025-07-24 16:40:45 -07:00
Cheng
6f5874a2f2
[CUDA] Initial implementation of Convolution with cuDNN (#2385)
* Link with cuDNN

* Initial implementation

* Remove backend apis

* Fix recording cudnn conv

* More unused backend apis

* Fix C++ conv tests

* include cudnn as python dep

* Install libcudnn9-dev-cuda-12 in CI

* cudnn only accepts contiguous inputs

* Switch to backend apis

* Plan needs to be kept alive

* Turn off tf32

* Add cache

* Test the native cuda graph api

* Set cudnn stream before execution

* Make LRUCache more like a normal container

* Do error check for cublas handle

* Zero-initializing array

* Use tf32 for conv

* Skip TestConv.test_torch_conv_2D test

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2025-07-25 08:12:10 +09:00
Awni Hannun
70dc336785
Test on cuda 12.2 and 12.9 (#2413) 2025-07-24 06:06:15 -07:00
Awni Hannun
4e504039f5
[Metal] Release metal events (#2412)
* release metal events

* fix

* fix
2025-07-23 19:53:42 -07:00
Awni Hannun
d1f4d291e8
Fix uv install and add dev release (#2411)
* fix uv install and add dev release

* fix docstring

* pin cuda deps

* cuda release on cpu-only machine
2025-07-23 16:54:19 -07:00
Awni Hannun
e1840853ce
full row mask in sdpa consistently gives nan (#2406) 2025-07-23 16:37:03 -07:00
Cheng
0f5ce173da
[CUDA] --compress-mode requires CUDA 12.8 (#2407) 2025-07-23 06:11:11 -07:00
Cheng
588854195f
Remove unused code in Convolution::vjp (#2408) 2025-07-23 06:11:00 -07:00
Fangjun Kuang
28d068bce6
Fix an error in the comment for mx.dequantize (#2409) 2025-07-23 06:10:50 -07:00
Awni Hannun
d107d8d495
add cuda gemv (#2400) 2025-07-22 08:24:13 -07:00
Awni Hannun
1e496ddb82
[CUDA] Simplify allocator (#2392)
* simplify allocator and fix race with small pool

* Don't use shared event in worker

* use cuda buffer in small pool

* comment

* comment
2025-07-22 08:24:01 -07:00
Awni Hannun
74eccbf3fa
use size option in binary (#2399) 2025-07-22 07:00:53 -07:00
Awni Hannun
08638223ca
Fix including stubs in wheel (#2398)
* fix including stubs in wheel

* fix bool_
2025-07-22 06:30:17 -07:00
Cheng
56cc858af9
Add contiguous_copy_cpu util for copying array (#2397) 2025-07-21 07:30:35 -07:00
Cheng
f55c4ed1d6
Remove thrust iterators (#2396) 2025-07-21 07:30:27 -07:00
Awni Hannun
93d70419e7
[CUDA] speedup handling scalars (#2389)
* speedup scalars in cuda

* comment
2025-07-18 21:47:31 -07:00
Awni Hannun
63f663d9c6
fix cuda manylinux version to match others (#2388) 2025-07-18 21:02:16 -07:00