post nanobind docs fixes and some updates (#889)

* post nanobind docs fixes and some updates

* one more doc nit

* fix for stubs and latex
Awni Hannun
2024-03-24 15:03:27 -07:00
committed by GitHub
parent be98f4ab6b
commit 1e16331d9c
16 changed files with 185 additions and 118 deletions

@@ -118,12 +118,13 @@ void init_fast(nb::module_& parent_module) {
 A fast implementation of multi-head attention: ``O = softmax(Q @ K.T, dim=-1) @ V``.

 Supports:

-* [Multi-Head Attention](https://arxiv.org/abs/1706.03762)
-* [Grouped Query Attention](https://arxiv.org/abs/2305.13245)
-* [Multi-Query Attention](https://arxiv.org/abs/1911.02150).
+* `Multi-Head Attention <https://arxiv.org/abs/1706.03762>`_
+* `Grouped Query Attention <https://arxiv.org/abs/2305.13245>`_
+* `Multi-Query Attention <https://arxiv.org/abs/1911.02150>`_
+
 Note: The softmax operation is performed in ``float32`` regardless of
-input precision.
+the input precision.

 Note: For Grouped Query Attention and Multi-Query Attention, the ``k``
 and ``v`` inputs should not be pre-tiled to match ``q``.
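
For reference, here is a minimal sketch of what the updated docstring describes: checking the fused kernel against the unfused ``O = softmax(Q @ K.T, dim=-1) @ V`` computation in the Grouped Query Attention case. The shapes and tolerance below are illustrative assumptions, not values from this commit, and the consecutive-group head mapping is the standard GQA convention, not spelled out in the diff:

import mlx.core as mx

B, n_q_heads, n_kv_heads, T, D = 1, 8, 2, 16, 64
scale = D**-0.5

q = mx.random.normal((B, n_q_heads, T, D))
# For GQA/MQA, k and v keep their own (smaller) head count;
# per the note above, they are passed to the fused op un-tiled.
k = mx.random.normal((B, n_kv_heads, T, D))
v = mx.random.normal((B, n_kv_heads, T, D))

out_fast = mx.fast.scaled_dot_product_attention(q, k, v, scale=scale)

# Unfused reference: tile k/v here only so the plain matmuls line up,
# assuming consecutive query heads share a kv head.
k_ref = mx.repeat(k, n_q_heads // n_kv_heads, axis=1)
v_ref = mx.repeat(v, n_q_heads // n_kv_heads, axis=1)
scores = (q * scale) @ k_ref.transpose(0, 1, 3, 2)
out_ref = mx.softmax(scores, axis=-1) @ v_ref

print(mx.allclose(out_ref, out_fast, atol=1e-4))  # expected: True

Since the inputs here are already ``float32``, the fused kernel's float32 softmax matches the reference within the tolerance.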