post nanobind docs fixes and some updates (#889)

* post nanobind docs fixes and some updates
* one more doc nit
* fix for stubs and latex
@@ -118,12 +118,13 @@ void init_fast(nb::module_& parent_module) {
         A fast implementation of multi-head attention: ``O = softmax(Q @ K.T, dim=-1) @ V``.
 
         Supports:
-        * [Multi-Head Attention](https://arxiv.org/abs/1706.03762)
-        * [Grouped Query Attention](https://arxiv.org/abs/2305.13245)
-        * [Multi-Query Attention](https://arxiv.org/abs/1911.02150).
+
+        * `Multi-Head Attention <https://arxiv.org/abs/1706.03762>`_
+        * `Grouped Query Attention <https://arxiv.org/abs/2305.13245>`_
+        * `Multi-Query Attention <https://arxiv.org/abs/1911.02150>`_
 
         Note: The softmax operation is performed in ``float32`` regardless of
-        input precision.
+        the input precision.
 
         Note: For Grouped Query Attention and Multi-Query Attention, the ``k``
         and ``v`` inputs should not be pre-tiled to match ``q``.
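For context, the docstring being edited above documents ``mlx.core.fast.scaled_dot_product_attention``. Below is a minimal usage sketch of the behavior the two notes describe; the shapes and head counts are illustrative assumptions, not part of the commit. It shows Grouped Query Attention with ``k`` and ``v`` left un-tiled (fewer KV heads than query heads), and uses the ``scale`` keyword to apply the usual ``1/sqrt(head_dim)`` factor before the softmax.

import mlx.core as mx

# Illustrative sizes (not from the commit): batch, sequence length, head dim
B, L, D = 1, 16, 64
n_q_heads, n_kv_heads = 8, 2  # grouped-query: 8 query heads share 2 KV heads

q = mx.random.normal((B, n_q_heads, L, D))
k = mx.random.normal((B, n_kv_heads, L, D))  # left un-tiled, per the note above
v = mx.random.normal((B, n_kv_heads, L, D))

# O = softmax(Q @ K.T * scale, dim=-1) @ V; per the docstring, the softmax
# runs in float32 regardless of the input precision.
o = mx.fast.scaled_dot_product_attention(q, k, v, scale=D ** -0.5)
print(o.shape)  # (1, 8, 16, 64) -- output shape matches q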