auto build linux release (#2341)

Awni Hannun 2025-07-07 09:29:23 -07:00 committed by GitHub
parent 9d10239af7
commit a4fcc893cd
GPG Key ID: B5690EEEBB952194
2 changed files with 37 additions and 9 deletions

@@ -492,6 +492,16 @@ workflows:
             branches:
               ignore: /.*/
           upload-docs: true
+      - build_linux_release:
+          filters:
+            tags:
+              only: /^v.*/
+            branches:
+              ignore: /.*/
+          matrix:
+            parameters:
+              python_version: ["3.9", "3.10", "3.11", "3.12", "3.13"]
+              extra_env: ["PYPI_RELEASE=1"]
   prb:
     when:

@@ -175,11 +175,12 @@ void init_fast(nb::module_& parent_module) {
         * `Grouped Query Attention <https://arxiv.org/abs/2305.13245>`_
         * `Multi-Query Attention <https://arxiv.org/abs/1911.02150>`_
-        Note: The softmax operation is performed in ``float32`` regardless of
-        the input precision.
-        Note: For Grouped Query Attention and Multi-Query Attention, the ``k``
-        and ``v`` inputs should not be pre-tiled to match ``q``.
+        .. note::
+          * The softmax operation is performed in ``float32`` regardless of
+            the input precision.
+          * For Grouped Query Attention and Multi-Query Attention, the ``k``
+            and ``v`` inputs should not be pre-tiled to match ``q``.
         In the following the dimensions are given by:
@@ -195,13 +196,30 @@ void init_fast(nb::module_& parent_module) {
           k (array): Keys with shape ``[B, N_kv, T_kv, D]``.
           v (array): Values with shape ``[B, N_kv, T_kv, D]``.
           scale (float): Scale for queries (typically ``1.0 / sqrt(q.shape(-1)``)
-          mask (Union[None, str, array], optional): A causal, boolean or additive
-            mask to apply to the query-key scores. The mask can have at most 4
-            dimensions and must be broadcast-compatible with the shape
-            ``[B, N, T_q, T_kv]``. If an additive mask is given its type must
-            promote to the promoted type of ``q``, ``k``, and ``v``.
+          mask (Union[None, str, array], optional): The mask to apply to the
+            query-key scores. The mask can be an array or a string indicating
+            the mask type. The only supported string type is ``"causal"``. If
+            the mask is an array it can be a boolean or additive mask. The mask
+            can have at most 4 dimensions and must be broadcast-compatible with
+            the shape ``[B, N, T_q, T_kv]``. If an additive mask is given its
+            type must promote to the promoted type of ``q``, ``k``, and ``v``.
         Returns:
           array: The output array.
+        Example:
+          .. code-block:: python
+            B = 2
+            N_q = N_kv = 32
+            T_q = T_kv = 1000
+            D = 128
+            q = mx.random.normal(shape=(B, N_q, T_q, D))
+            k = mx.random.normal(shape=(B, N_kv, T_kv, D))
+            v = mx.random.normal(shape=(B, N_kv, T_kv, D))
+            scale = D ** -0.5
+            out = mx.fast.scaled_dot_product_attention(q, k, v, scale=scale, mask="causal")
       )pbdoc");
   m.def(
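
To make the documented semantics concrete, here is a dependency-free reference sketch of causal attention with untiled keys and values. It is not the MLX implementation. `sdpa_reference` and `rand` are hypothetical names, and the sketch assumes `T_q == T_kv`, a single batch element, and the common grouping convention in which query head `h` reads KV head `h // (N_q // N_kv)`. It computes in Python floats, so the ``float32`` softmax note is not modeled.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def sdpa_reference(q, k, v, scale, causal=False):
    """Reference attention for one batch element (hypothetical helper).

    q is [N_q][T][D]; k and v are [N_kv][T][D] nested lists, with N_q a
    multiple of N_kv. k and v are not tiled: each query head indexes the
    KV head it shares.
    """
    n_q, n_kv = len(q), len(k)
    assert n_q % n_kv == 0, "N_q must be a multiple of N_kv"
    group = n_q // n_kv  # query heads per KV head (assumed convention)
    out = []
    for h in range(n_q):
        kh, vh = k[h // group], v[h // group]
        head_out = []
        for i, q_row in enumerate(q[h]):
            scores = []
            for j, k_row in enumerate(kh):
                if causal and j > i:
                    # Causal mask: position i cannot see later positions.
                    scores.append(-math.inf)
                else:
                    dot = sum(a * b for a, b in zip(q_row, k_row))
                    scores.append(scale * dot)
            weights = softmax(scores)
            head_out.append([
                sum(w * v_row[d] for w, v_row in zip(weights, vh))
                for d in range(len(vh[0]))
            ])
        out.append(head_out)
    return out


def rand(n, t, d):
    """Random [n][t][d] nested list of standard normal samples."""
    return [[[random.gauss(0, 1) for _ in range(d)] for _ in range(t)]
            for _ in range(n)]


random.seed(0)
N_q, N_kv, T, D = 4, 2, 3, 2
q, k, v = rand(N_q, T, D), rand(N_kv, T, D), rand(N_kv, T, D)
out = sdpa_reference(q, k, v, scale=D ** -0.5, causal=True)
```

With the causal mask, the first query position can attend only to the first key, so its output equals the first value row of the KV head it shares. That gives a quick sanity check for both the mask and the head grouping.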