post nanobind docs fixes and some updates (#889)

* post nanobind docs fixes and some updates
* one more doc nit
* fix for stubs and latex
@@ -118,12 +118,13 @@ void init_fast(nb::module_& parent_module) {
         A fast implementation of multi-head attention: ``O = softmax(Q @ K.T, dim=-1) @ V``.
 
         Supports:
-        * [Multi-Head Attention](https://arxiv.org/abs/1706.03762)
-        * [Grouped Query Attention](https://arxiv.org/abs/2305.13245)
-        * [Multi-Query Attention](https://arxiv.org/abs/1911.02150).
+
+        * `Multi-Head Attention <https://arxiv.org/abs/1706.03762>`_
+        * `Grouped Query Attention <https://arxiv.org/abs/2305.13245>`_
+        * `Multi-Query Attention <https://arxiv.org/abs/1911.02150>`_
 
         Note: The softmax operation is performed in ``float32`` regardless of
-        input precision.
+        the input precision.
 
         Note: For Grouped Query Attention and Multi-Query Attention, the ``k``
         and ``v`` inputs should not be pre-tiled to match ``q``.
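For context, the docstring being edited above documents ``mlx.core.fast.scaled_dot_product_attention``. Below is a minimal usage sketch of the behavior the two notes describe; the shapes and head counts are illustrative assumptions, not part of the commit. It shows Grouped Query Attention with ``k`` and ``v`` left un-tiled (fewer KV heads than query heads), and uses the ``scale`` keyword to apply the usual ``1/sqrt(head_dim)`` factor before the softmax.

import mlx.core as mx

# Illustrative sizes (not from the commit): batch, sequence length, head dim
B, L, D = 1, 16, 64
n_q_heads, n_kv_heads = 8, 2  # grouped-query: 8 query heads share 2 KV heads

q = mx.random.normal((B, n_q_heads, L, D))
k = mx.random.normal((B, n_kv_heads, L, D))  # left un-tiled, per the note above
v = mx.random.normal((B, n_kv_heads, L, D))

# O = softmax(Q @ K.T * scale, dim=-1) @ V; per the docstring, the softmax
# runs in float32 regardless of the input precision.
o = mx.fast.scaled_dot_product_attention(q, k, v, scale=D ** -0.5)
print(o.shape)  # (1, 8, 16, 64) -- output shape matches q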