mlx/mlx at 1ae2f817bfe15f6301e49256c60ae267223fcaff - mlx - Gitea for Geophysics

zhangyiss/mlx

mirror of https://github.com/ml-explore/mlx.git synced 2025-12-16 01:49:05 +08:00

Cheng 6f5874a2f2 [CUDA] Initial implementation of Convolution with cuDNN (#2385)

* Link with cuDNN

* Initial implementation

* Remove backend apis

* Fix recording cudnn conv

* More unused backend apis

* Fix C++ conv tests

* include cudnn as python dep

* Install libcudnn9-dev-cuda-12 in CI

* cudnn only accepts contiguous inputs

* Switch to backend apis

* Plan needs to be kept alive

* Turn off tf32

* Add cache

* Test the native cuda graph api

* Set cudnn stream before execution

* Make LRUCache more like a normal container

* Do error check for cublas handle

* Zero-initializing array

* Use tf32 for conv

* Skip TestConv.test_torch_conv_2D test

---------

Co-authored-by: Awni Hannun <awni@apple.com>

2025-07-25 08:12:10 +09:00

jagrit's commit files

2023-11-29 10:52:08 -08:00

[CUDA] Initial implementation of Convolution with cuDNN (#2385)

2025-07-25 08:12:10 +09:00

Add Primitive::name and remove Primitive::print (#2365)

2025-07-14 14:06:35 -07:00

Remove static initializers (#2059)

2025-04-24 06:14:49 -07:00

fix pinv (#2110)

2025-04-23 13:08:28 -07:00

allocator.cpp

Add stats and limit to common allocator and enable tests (#1988)

2025-03-21 12:28:36 -07:00

allocator.h

Add stats and limit to common allocator and enable tests (#1988)

2025-03-21 12:28:36 -07:00

array.cpp

reduce binary size (#1952)

2025-03-11 06:30:44 -07:00

array.h

Add complex eigh (#2191)

2025-05-18 00:18:43 -07:00

CMakeLists.txt

start cuda circle config (#2256)

2025-06-10 21:19:47 -07:00

compile_impl.h

Simplify removes no-ops from the tape (#1759)

2025-01-09 11:23:19 -08:00

compile.cpp

Add Primitive::name and remove Primitive::print (#2365)

2025-07-14 14:06:35 -07:00

compile.h

fix function pointer (#1865)

2025-02-13 18:46:11 -08:00

device.cpp

Generalize gpu backend (#2138)

2025-04-30 09:08:17 -07:00

device.h

Generalize gpu backend (#2138)

2025-04-30 09:08:17 -07:00

dtype_utils.cpp

MLX_SWITCH macros to templates (#2320)

2025-07-01 01:33:44 -07:00

dtype_utils.h

MLX_SWITCH macros to templates (#2320)

2025-07-01 01:33:44 -07:00

dtype.cpp

fix double type promotion (#1901)

2025-02-25 06:00:53 -08:00

dtype.h

Fp64 on the CPU (#1843)

2025-02-07 15:52:22 -08:00

einsum.cpp

Einsum ellipsis (#1788)

2025-01-25 01:28:03 -08:00

einsum.h

Einsum (#1269)

2024-07-25 09:36:44 -07:00

event.h

Remove Event::Signal() (#2052)

2025-04-08 06:20:27 -07:00

export_impl.h

Export / import functions to / from a file (#1642)

2024-12-24 11:19:13 -08:00

export.cpp

Add Primitive::name and remove Primitive::print (#2365)

2025-07-14 14:06:35 -07:00

export.h

Use unordered map for kwargs in export/import (#2087)

2025-04-21 07:17:22 -07:00

fast_primitives.h

Add Primitive::name and remove Primitive::print (#2365)

2025-07-14 14:06:35 -07:00

fast.cpp

full row mask in sdpa consistently gives nan (#2406)

2025-07-23 16:37:03 -07:00

fast.h

Add new sdpa function overload (#2035)

2025-04-03 11:58:28 -07:00

fence.h

redesign for faster cpu/gpu synch (#1869)

2025-03-06 19:23:38 -08:00

fft.cpp

add fftshift and ifftshift fft helpers (#2135)

2025-04-29 22:13:45 -07:00

fft.h

add fftshift and ifftshift fft helpers (#2135)

2025-04-29 22:13:45 -07:00

graph_utils.cpp

Add Primitive::name and remove Primitive::print (#2365)

2025-07-14 14:06:35 -07:00

graph_utils.h

Optionally specify names for arrays when exporting (#1749)

2025-01-06 13:07:46 -08:00

io.h

Added missing unordered_map includes (#1635)

2024-12-02 07:03:03 -08:00

linalg.cpp

Add Primitive::name and remove Primitive::print (#2365)

2025-07-14 14:06:35 -07:00

linalg.h

non-symmetric eig and eigh (#2188)

2025-05-15 13:01:44 -07:00

memory.h

move memory APIs into top level mlx.core (#1982)

2025-03-21 07:25:12 -07:00

mlx.h

start cuda circle config (#2256)

2025-06-10 21:19:47 -07:00

ops.cpp

MoE backward improvements (#2335)

2025-07-07 17:59:53 -07:00

ops.h

MoE backward improvements (#2335)

2025-07-07 17:59:53 -07:00

primitives.cpp

Remove unused code in Convolution::vjp (#2408)

2025-07-23 06:11:00 -07:00

primitives.h

Add Primitive::name and remove Primitive::print (#2365)

2025-07-14 14:06:35 -07:00

random.cpp

lower memory uniform sampling (#2361)

2025-07-15 14:22:07 -07:00

random.h

Add random normal distribution for complex numbers (#2182)

2025-05-13 22:43:45 -07:00

scheduler.cpp

Generalize gpu backend (#2138)

2025-04-30 09:08:17 -07:00

scheduler.h

Generalize gpu backend (#2138)

2025-04-30 09:08:17 -07:00

stream.h

Export / import functions to / from a file (#1642)

2024-12-24 11:19:13 -08:00

threadpool.h

Ring distributed backend (#1784)

2025-01-27 22:15:01 -08:00

transforms_impl.h

Remove static initializers (#2059)

2025-04-24 06:14:49 -07:00

transforms.cpp

[Metal] Release metal events (#2412)

2025-07-23 19:53:42 -07:00

transforms.h

Export / import functions to / from a file (#1642)

2024-12-24 11:19:13 -08:00

utils.cpp

MLX_SWITCH macros to templates (#2320)

2025-07-01 01:33:44 -07:00

utils.h

Fix cuda arg reduce (#2291)

2025-06-14 17:54:00 -07:00

version.cpp

Do not define MLX_VERSION globally (#1966)

2025-03-18 07:12:40 -07:00

version.h

fix release build + patch bump (#2387)

2025-07-18 14:47:37 -07:00