Add dispatchThreads to custom kernel doc (#1551)

* add dispatchThreads info

* update

* add link
Alex Barron
2024-11-01 13:07:48 -07:00
committed by GitHub
parent eac961ddb1
commit 9e516b71ea
2 changed files with 10 additions and 0 deletions

@@ -1,3 +1,5 @@
.. _custom_metal_kernels:
Custom Metal Kernels
====================
@@ -76,6 +78,10 @@ Putting this all together, the generated function signature for ``myexp`` is as
template [[host_name("custom_kernel_myexp_float")]] [[kernel]] decltype(custom_kernel_myexp_float<float>) custom_kernel_myexp_float<float>;
Note: ``grid`` and ``threadgroup`` are parameters to the Metal `dispatchThreads <https://developer.apple.com/documentation/metal/mtlcomputecommandencoder/2866532-dispatchthreads>`_ function.
This means we will launch ``mx.prod(grid)`` threads, subdivided into threadgroups of size ``threadgroup``.
For optimal performance, each threadgroup dimension should be less than or equal to the corresponding grid dimension.
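
The launch arithmetic described in this note can be sketched in plain Python. This is only an illustration of the counting, not MLX code: the helper names ``launch_summary`` and ``clamp_threadgroup`` are hypothetical, and the sketch assumes a 3-tuple ``grid`` and ``threadgroup``, with a partial threadgroup at the edge when a grid dimension is not an even multiple of the threadgroup dimension.

```python
import math


def launch_summary(grid, threadgroup):
    """Return (total_threads, threadgroups_per_dim) for a grid of threads.

    Hypothetical helper for illustration; it is not part of MLX or Metal.
    """
    # Total threads launched is the product of the grid dimensions.
    total_threads = math.prod(grid)
    # Ceiling division: a grid dimension that is not an even multiple of the
    # threadgroup dimension still needs one extra (partial) threadgroup.
    groups_per_dim = tuple(-(-g // t) for g, t in zip(grid, threadgroup))
    return total_threads, groups_per_dim


def clamp_threadgroup(grid, threadgroup):
    """Shrink each threadgroup dimension so it does not exceed the grid's."""
    return tuple(min(g, t) for g, t in zip(grid, threadgroup))


if __name__ == "__main__":
    print(launch_summary((1000, 1, 1), (256, 1, 1)))   # (1000, (4, 1, 1))
    print(clamp_threadgroup((100, 1, 1), (256, 1, 1)))  # (100, 1, 1)
```

Clamping in this way is one possible means of following the guideline that each threadgroup dimension stay at or below the corresponding grid dimension.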
Passing ``verbose=True`` to ``mx.fast.metal_kernel.__call__`` will print the generated code for debugging purposes.
Using Shape/Strides