mlx/docs/build/doctrees/python/nn/_autosummary/mlx.nn.MultiHeadAttention.doctree


2024-01-17 17:15:29 -08:00
<EFBFBD><05>^<00>sphinx.addnodes<65><73>document<6E><74><EFBFBD>)<29><>}<7D>(<28> rawsource<63><65><00><>children<65>]<5D><>docutils.nodes<65><73>section<6F><6E><EFBFBD>)<29><>}<7D>(hhh]<5D>(h <09>title<6C><65><EFBFBD>)<29><>}<7D>(h<05>mlx.nn.MultiHeadAttention<6F>h]<5D>h <09>Text<78><74><EFBFBD><EFBFBD>mlx.nn.MultiHeadAttention<6F><6E><EFBFBD><EFBFBD><EFBFBD>}<7D>(<28>parent<6E>h<11> _document<6E>h<03>source<63>N<EFBFBD>line<6E>Nuba<62>
attributes<EFBFBD>}<7D>(<28>ids<64>]<5D><>classes<65>]<5D><>names<65>]<5D><>dupnames<65>]<5D><>backrefs<66>]<5D>u<EFBFBD>tagname<6D>hhh hhh<1D>Y/Users/awnihannun/repos/mlx/docs/src/python/nn/_autosummary/mlx.nn.MultiHeadAttention.rst<73>hKubh<00>index<65><78><EFBFBD>)<29><>}<7D>(hhh]<5D>h}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><>entries<65>]<5D>(<28>single<6C><65>$MultiHeadAttention (class in mlx.nn)<29><>mlx.nn.MultiHeadAttention<6F>hNt<4E>auh+h-hh hhhNhNubh<00>desc<73><63><EFBFBD>)<29><>}<7D>(hhh]<5D>(h<00>desc_signature<72><65><EFBFBD>)<29><>}<7D>(hX*MultiHeadAttention(dims: int, num_heads: int, query_input_dims: ~typing.Optional[int] = None, key_input_dims: ~typing.Optional[int] = None, value_input_dims: ~typing.Optional[int] = None, value_dims: ~typing.Optional[int] = None, value_output_dims: ~typing.Optional[int] = None, bias: bool = False)<29>h]<5D>(h<00>desc_annotation<6F><6E><EFBFBD>)<29><>}<7D>(h<05>2[<#text: 'class'>, <desc_sig_space: <#text: ' '>>]<5D>h]<5D>(h<16>class<73><73><EFBFBD><EFBFBD><EFBFBD>}<7D>(hhKhhhNhNubh<00>desc_sig_space<63><65><EFBFBD>)<29><>}<7D>(h<05> <20>h]<5D>h<16> <20><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hhUhhhNhNubah}<7D>(h!]<5D>h#]<5D><>w<>ah%]<5D>h']<5D>h)]<5D>uh+hShhKubeh}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> xml:space<63><65>preserve<76>uh+hIhhEhhh<1D>y/Users/awnihannun/repos/mlx/python/mlx/nn/layers/transformer.py:docstring of mlx.nn.layers.transformer.MultiHeadAttention<6F>hNubh<00> desc_addname<6D><65><EFBFBD>)<29><>}<7D>(h<05>mlx.nn.<2E>h]<5D>h<16>mlx.nn.<2E><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hhohhhNhNubah}<7D>(h!]<5D>h#]<5D>(<28> sig-prename<6D><65> descclassname<6D>eh%]<5D>h']<5D>h)]<5D>hjhkuh+hmhhEhhhhlhNubh<00> desc_name<6D><65><EFBFBD>)<29><>}<7D>(h<05>MultiHeadAttention<6F>h]<5D>h<16>MultiHeadAttention<6F><6E><EFBFBD><EFBFBD><EFBFBD>}<7D>(hh<>hhhNhNubah}<7D>(h!]<5D>h#]<5D>(<28>sig-name<6D><65>descname<6D>eh%]<5D>h']<5D>h)]<5D>hjhkuh+hhhEhhhhlhNubh<00>desc_parameterlist<73><74><EFBFBD>)<29><>}<7D>(hXdims: int, num_heads: 
int, query_input_dims: ~typing.Optional[int] = None, key_input_dims: ~typing.Optional[int] = None, value_input_dims: ~typing.Optional[int] = None, value_dims: ~typing.Optional[int] = None, value_output_dims: ~typing.Optional[int] = None, bias: bool = False<73>h]<5D>(h<00>desc_parameter<65><72><EFBFBD>)<29><>}<7D>(h<05> dims: intint<6E>h]<5D>(h<00> desc_sig_name<6D><65><EFBFBD>)<29><>}<7D>(h<05>dims<6D>h]<5D>h<16>dims<6D><73><EFBFBD><EFBFBD><EFBFBD>}<7D>(hh<>hhhNhNubah}<7D>(h!]<5D>h#]<5D><>n<>ah%]<5D>h']<5D>h)]<5D>uh+h<>hh<>ubh<00>desc_sig_punctuation<6F><6E><EFBFBD>)<29><>}<7D>(h<05>:<3A>h]<5D>h<16>:<3A><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hh<>hhhNhNubah}<7D>(h!]<5D>h#]<5D><>p<>ah%]<5D>h']<5D>h)]<5D>uh+h<>hh<>ubhT)<29><>}<7D>(h<05> <20>h]<5D>h<16> <20><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hh<>hhhNhNubah}<7D>(h!]<5D>h#]<5D>h`ah%]<5D>h']<5D>h)]<5D>uh+hShh<>ubh<62>)<29><>}<7D>(h<05>intint<6E>h]<5D>h<00> pending_xref<65><66><EFBFBD>)<29><>}<7D>(hhh]<5D>(h<00>pending_xref_condition<6F><6E><EFBFBD>)<29><>}<7D>(h<05>int<6E>h]<5D>h<16>int<6E><74><EFBFBD><EFBFBD><EFBFBD>}<7D>(hh<>hhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> condition<6F><6E>resolved<65>uh+h<>hh<>ubh<62>)<29><>}<7D>(h<05>int<6E>h]<5D>h<16>int<6E><74><EFBFBD><EFBFBD><EFBFBD>}<7D>(hh<>hhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> condition<6F><6E>*<2A>uh+h<>hh<>ubeh}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> refdomain<69><6E>py<70><79>reftype<70><65>class<73><73> reftarget<65><74>int<6E><74> refspecific<69><63><EFBFBD> py:module<6C><65>mlx.nn<6E><6E>py:class<73>Nuh+h<>hh<>ubah}<7D>(h!]<5D>h#]<5D>h<EFBFBD>ah%]<5D>h']<5D>h)]<5D>uh+h<>hh<>ubeh}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>hjhkuh+h<>hh<>ubh<62>)<29><>}<7D>(h<05>num_heads: intint<6E>h]<5D>(h<>)<29><>}<7D>(h<05> num_heads<64>h]<5D>h<16> 
num_heads<64><73><EFBFBD><EFBFBD><EFBFBD>}<7D>(hjhhhNhNubah}<7D>(h!]<5D>h#]<5D>h<EFBFBD>ah%]<5D>h']<5D>h)]<5D>uh+h<>hjubh<62>)<29><>}<7D>(h<05>:<3A>h]<5D>h<16>:<3A><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj&hhhNhNubah}<7D>(h!]<5D>h#]<5D>h<EFBFBD>ah%]<5D>h']<5D>h)]<5D>uh+h<>hjubhT)<29><>}<7D>(h<05> <20>h]<5D>h<16> <20><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj4hhhNhNubah}<7D>(h!]<5D>h#]<5D>h`ah%]<5D>h']<5D>h)]<5D>uh+hShjubh<62>)<29><>}<7D>(h<05>intint<6E>h]<5D>h<EFBFBD>)<29><>}<7D>(hhh]<5D>(h<>)<29><>}<7D>(h<05>int<6E>h]<5D>h<16>int<6E><74><EFBFBD><EFBFBD><EFBFBD>}<7D>(hjIhhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> condition<6F>h<EFBFBD>uh+h<>hjFubh<62>)<29><>}<7D>(h<05>int<6E>h]<5D>h<16>int<6E><74><EFBFBD><EFBFBD><EFBFBD>}<7D>(hjXhhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> condition<6F>h<EFBFBD>uh+h<>hjFubeh}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> refdomain<69>h<EFBFBD><68>reftype<70>j<00> reftarget<65><74>int<6E><74> refspecific<69><63><EFBFBD> py:module<6C>j<00>py:class<73>Nuh+h<>hjBubah}<7D>(h!]<5D>h#]<5D>h<EFBFBD>ah%]<5D>h']<5D>h)]<5D>uh+h<>hjubeh}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>hjhkuh+h<>hh<>ubh<62>)<29><>}<7D>(h<05>1query_input_dims: OptionalOptional[intint] = None<6E>h]<5D>(h<>)<29><>}<7D>(h<05>query_input_dims<6D>h]<5D>h<16>query_input_dims<6D><73><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<>hhhNhNubah}<7D>(h!]<5D>h#]<5D>h<EFBFBD>ah%]<5D>h']<5D>h)]<5D>uh+h<>hj<>ubh<62>)<29><>}<7D>(h<05>:<3A>h]<5D>h<16>:<3A><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<>hhhNhNubah}<7D>(h!]<5D>h#]<5D>h<EFBFBD>ah%]<5D>h']<5D>h)]<5D>uh+h<>hj<>ubhT)<29><>}<7D>(h<05> <20>h]<5D>h<16> <20><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<>hhhNhNubah}<7D>(h!]<5D>h#]<5D>h`ah%]<5D>h']<5D>h)]<5D>uh+hShj<>ubh<62>)<29><>}<7D>(h<05>OptionalOptional[intint]<5D>h]<5D>(h<>)<29><>}<7D>(hhh]<5D>(h<>)<29><>}<7D>(h<05>Optional<61>h]<5D>h<16>Optional<61><6C><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<>hhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> 
condition<6F>h<EFBFBD>uh+h<>hj<>ubh<62>)<29><>}<7D>(h<05>Optional<61>h]<5D>h<16>Optional<61><6C><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<>hhhNhNu
value_dims<EFBFBD>h]<5D>h<16>
value_dims<EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD>}<7D>(hjXhhhNhNubah}<7D>(h!]<5D>h#]<5D>h<EFBFBD>ah%]<5D>h']<5D>h)]<5D>uh+h<>hjTubh<62>)<29><>}<7D>(h<05>:<3A>h]<5D>h<16>:<3A><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hjfhhhNhNubah}<7D>(h!]<5D>h#]<5D>h<EFBFBD>ah%]<5D>h']<5D>h)]<5D>uh+h<>hjTubhT)<29><>}<7D>(h<05> <20>h]<5D>h<16> <20><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hjthhhNhNubah}<7D>(h!]<5D>h#]<5D>h`ah%]<5D>h']<5D>h)]<5D>uh+hShjTubh<62>)<29><>}<7D>(h<05>OptionalOptional[intint]<5D>h]<5D>(h<>)<29><>}<7D>(hhh]<5D>(h<>)<29><>}<7D>(h<05>Optional<61>h]<5D>h<16>Optional<61><6C><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<>hhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> condition<6F>h<EFBFBD>uh+h<>hj<>ubh<62>)<29><>}<7D>(h<05>Optional<61>h]<5D>h<16>Optional<61><6C><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<>hhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> condition<6F>h<EFBFBD>uh+h<>hj<>ubeh}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> refdomain<69>h<EFBFBD><68>reftype<70>j<EFBFBD><00> reftarget<65><74>typing.Optional<61><6C> refspecific<69><63><EFBFBD> py:module<6C>j<00>py:class<73>Nuh+h<>hj<>ubh<62>)<29><>}<7D>(h<05>[<5B>h]<5D>h<16>[<5B><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<>hhhNhNubah}<7D>(h!]<5D>h#]<5D>h<EFBFBD>ah%]<5D>h']<5D>h)]<5D>uh+h<>hj<>ubh<62>)<29><>}<7D>(hhh]<5D>(h<>)<29><>}<7D>(h<05>int<6E>h]<5D>h<16>int<6E><74><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<>hhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> condition<6F>h<EFBFBD>uh+h<>hj<>ubh<62>)<29><>}<7D>(h<05>int<6E>h]<5D>h<16>int<6E><74><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<>hhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> condition<6F>h<EFBFBD>uh+h<>hj<>ubeh}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> refdomain<69>h<EFBFBD><68>reftype<70>j<00> reftarget<65><74>int<6E><74> refspecific<69><63><EFBFBD> 
py:module<6C>j<00>py:class<73>Nuh+h<>hj<>ubh<62>)<29><>}<7D>(h<05>]<5D>h]<5D>h<16>]<5D><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<>hhhNhNubah}<7D>(h!]<5D>h#]<5D>h<EFBFBD>ah%]<5D>h']<5D>h)]<5D>uh+h<>hj<>ubeh}<7D>(h!]<5D>h#]<5D>h<EFBFBD>ah%]<5D>h']<5D>h)]<5D>uh+h<>hjTubhT)<29><>}<7D>(h<05> <20>h]<5D>h<16> <20><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hjhhhNhNubah}<7D>(h!]<5D>h#]<5D>h`ah%]<5D>h']<5D>h)]<5D>uh+hShjTubj@)<29><>}<7D>(h<05>=<3D>h]<5D>h<16>=<3D><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hjhhhNhNubah}<7D>(h!]<5D>h#]<5D>jLah%]<5D>h']<5D>h)]<5D>uh+j?hjTubhT)<29><>}<7D>(h<05> <20>h]<5D>h<16> <20><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj hhhNhNubah}<7D>(h!]<5D>h#]<5D>h`ah%]<5D>h']<5D>h)]<5D>uh+hShjTubj_)<29><>}<7D>(h<05>None<6E>h]<5D>h<16>None<6E><65><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj.hhhNhNubah}<7D>(h!]<5D>h#]<5D>jkah%]<5D>h']<5D>h)]<5D><>support_smartquotes<65><73>uh+j^hjTubeh}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>hjhkuh+h<>hh<>ubh<62>)<29><>}<7D>(h<05>2value_output_dims: OptionalOptional[intint] = None<6E>h]<5D>(h<>)<29><>}<7D>(h<05>value_output_dims<6D>h]<5D>h<16>value_output_dims<6D><73><EFBFBD><EFBFBD><EFBFBD>}<7D>(hjGhhhNhNubah}<7D>(h!]<5D>h#]<5D>h<EFBFBD>ah%]<5D>h']<5D>h)]<5D>uh+h<>hjCubh<62>)<29><>}<7D>(h<05>:<3A>h]<5D>h<16>:<3A><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hjUhhhNhNubah}<7D>(h!]<5D>h#]<5D>h<EFBFBD>ah%]<5D>h']<5D>h)]<5D>uh+h<>hjCubhT)<29><>}<7D>(h<05> <20>h]<5D>h<16> <20><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hjchhhNhNubah}<7D>(h!]<5D>h#]<5D>h`ah%]<5D>h']<5D>h)]<5D>uh+hShjCubh<62>)<29><>}<7D>(h<05>OptionalOptional[intint]<5D>h]<5D>(h<>)<29><>}<7D>(hhh]<5D>(h<>)<29><>}<7D>(h<05>Optional<61>h]<5D>h<16>Optional<61><6C><EFBFBD><EFBFBD><EFBFBD>}<7D>(hjxhhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> condition<6F>h<EFBFBD>uh+h<>hjuubh<62>)<29><>}<7D>(h<05>Optional<61>h]<5D>h<16>Optional<61><6C><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<>hhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> condition<6F>h<EFBFBD>uh+h<>hjuubeh}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> 
refdomain<69>h<EFBFBD><68>reftype<70>j<EFBFBD><00> reftarget<65><74>typing.Optional<61><6C> refspecific<69><63><EFBFBD> py:module<6C>j<00>py:class<73>Nuh+h<>hjqubh<62>)<29><>}<7D>(h<05>[<5B>h]<5D>h<16>[<5B><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<>hhhNhNubah}<7D>(h!]<5D>h#]<5D>h<EFBFBD>ah%]<5D>h']<5D>h)]<5D>uh+h<>hjqubh<62>)<29><>}<7D>(hhh]<5D>(h<>)<29><>}<7D>(h<05>int<6E>h]<5D>h<16>int<6E><74><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<>hhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> condition<6F>h<EFBFBD>uh+h<>hj<>ubh<62>)<29><>}<7D>(h<05>int<6E>h]<5D>h<16>int<6E><74><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<>hhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> condition<6F>h<EFBFBD>uh+h<>hj<>ubeh}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> refdomain<69>h<EFBFBD><68>reftype<70>j<00> reftarget<65><74>int<6E><74> refspecific<69><63><EFBFBD> py:module<6C>j<00>py:class<73>Nuh+h<>hjqubh<62>)<29><>}<7D>(h<05>]<5D>h]<5D>h<16>]<5D><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<>hhhNhNubah}<7D>(h!]<5D>h#]<5D>h<EFBFBD>ah%]<5D>h']<5D>h)]<5D>uh+h<>hjqubeh}<7D>(h!]<5D>h#]<5D>h<EFBFBD>ah%]<5D>h']<5D>h)]<5D>uh+h<>hjCubhT)<29><>}<7D>(h<05> <20>h]<5D>h<16> <20><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<>hhhNhNubah}<7D>(h!]<5D>h#]<5D>h`ah%]<5D>h']<5D>h)]<5D>uh+hShjCubj@)<29><>}<7D>(h<05>=<3D>h]<5D>h<16>=<3D><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hjhhhNhNubah}<7D>(h!]<5D>h#]<5D>jLah%]<5D>h']<5D>h)]<5D>uh+j?hjCubhT)<29><>}<7D>(h<05> <20>h]<5D>h<16> <20><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hjhhhNhNubah}<7D>(h!]<5D>h#]<5D>h`ah%]<5D>h']<5D>h)]<5D>uh+hShjCubj_)<29><>}<7D>(h<05>None<6E>h]<5D>h<16>None<6E><65><EFBFBD><EFBFBD><EFBFBD>}<7D>(hjhhhNhNubah}<7D>(h!]<5D>h#]<5D>jkah%]<5D>h']<5D>h)]<5D><>support_smartquotes<65><73>uh+j^hjCubeh}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>hjhkuh+h<>hh<>ubh<62>)<29><>}<7D>(h<05>bias: boolbool = 
False<73>h]<5D>(h<>)<29><>}<7D>(h<05>bias<61>h]<5D>h<16>bias<61><73><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj6hhhNhNubah}<7D>(h!]<5D>h#]<5D>h<EFBFBD>ah%]<5D>h']<5D>h)]<5D>uh+h<>hj2ubh<62>)<29><>}<7D>(h<05>:<3A>h]<5D>h<16>:<3A><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hjDhhhNhNubah}<7D>(h!]<5D>h#]<5D>h<EFBFBD>ah%]<5D>h']<5D>h)]<5D>uh+h<>hj2ubhT)<29><>}<7D>(h<05> <20>h]<5D>h<16> <20><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hjRhhhNhNubah}<7D>(h!]<5D>h#]<5D>h`ah%]<5D>h']<5D>h)]<5D>uh+hShj2ubh<62>)<29><>}<7D>(h<05>boolbool<6F>h]<5D>h<EFBFBD>)<29><>}<7D>(hhh]<5D>(h<>)<29><>}<7D>(h<05>bool<6F>h]<5D>h<16>bool<6F><6C><EFBFBD><EFBFBD><EFBFBD>}<7D>(hjghhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> condition<6F>h<EFBFBD>uh+h<>
sig-object<63>eh%]<5D>h']<5D>h)]<5D><>module<6C><65>mlx.nn<6E>jh<06>fullname<6D>h<EFBFBD>uh+hCh<1D>y/Users/awnihannun/repos/mlx/python/mlx/nn/layers/transformer.py:docstring of mlx.nn.layers.transformer.MultiHeadAttention<6F>hKhh@hhubh<00> desc_content<6E><74><EFBFBD>)<29><>}<7D>(hhh]<5D>(h <09> paragraph<70><68><EFBFBD>)<29><>}<7D>(h<05>@Implements the scaled dot product attention with multiple heads.<2E>h]<5D>h<16>@Implements the scaled dot product attention with multiple heads.<2E><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<>hhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>h<1D>y/Users/awnihannun/repos/mlx/python/mlx/nn/layers/transformer.py:docstring of mlx.nn.layers.transformer.MultiHeadAttention<6F>hKhj<>hhubj<62>)<29><>}<7D>(h<05><>Given inputs for queries, keys and values the ``MultiHeadAttention``
produces new values by aggregating information from the input values
according to the similarities of the input queries and keys.<2E>h]<5D>(h<16>.Given inputs for queries, keys and values the <20><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<>hhhNhNubh <09>literal<61><6C><EFBFBD>)<29><>}<7D>(h<05>``MultiHeadAttention``<60>h]<5D>h<16>MultiHeadAttention<6F><6E><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj hhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+jhj<>ubh<16><>
produces new values by aggregating information from the input values
according to the similarities of the input queries and keys.<2E><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<>hhhNhNubeh}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hj<>hKhj<>hhubj<62>)<29><>}<7D>(h<05>RAll inputs as well as the output are linearly projected without biases by
default.<2E>h]<5D>h<16>RAll inputs as well as the output are linearly projected without biases by
default.<2E><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj!hhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hj<>hKhj<>hhubj<62>)<29><>}<7D>(hX``MultiHeadAttention`` also takes an optional additive attention mask that
should be broadcastable with ``(batch, num_heads, # queries, # keys)``. The
mask should have ``-inf`` or very large negative numbers at the positions
that should *not* be attended to.<2E>h]<5D>(j)<29><>}<7D>(h<05>``MultiHeadAttention``<60>h]<5D>h<16>MultiHeadAttention<6F><6E><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj3hhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+jhj/ubh<16>R also takes an optional additive attention mask that
should be broadcastable with <20><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj/hhhNhNubj)<29><>}<7D>(h<05>)``(batch, num_heads, # queries, # keys)``<60>h]<5D>h<16>%(batch, num_heads, # queries, # keys)<29><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hjEhhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+jhj/ubh<16>. The
mask should have <20><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj/hhhNhNubj)<29><>}<7D>(h<05>``-inf``<60>h]<5D>h<16>-inf<6E><66><EFBFBD><EFBFBD><EFBFBD>}<7D>(hjWhhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+jhj/ubh<16>= or very large negative numbers at the positions
that should <20><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj/hhhNhNubh <09>emphasis<69><73><EFBFBD>)<29><>}<7D>(h<05>*not*<2A>h]<5D>h<16>not<6F><74><EFBFBD><EFBFBD><EFBFBD>}<7D>(hjkhhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+jihj/ubh<16> be attended to.<2E><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj/hhhNhNubeh}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hj<>hK
hj<>hhubh <09>
field_list<EFBFBD><EFBFBD><EFBFBD>)<29><>}<7D>(hhh]<5D>h <09>field<6C><64><EFBFBD>)<29><>}<7D>(hhh]<5D>(h <09>
field_name<EFBFBD><EFBFBD><EFBFBD>)<29><>}<7D>(h<05>
Parameters<EFBFBD>h]<5D>h<16>
Parameters<EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<>hhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hj<>hhlhKubh <09>
field_body<EFBFBD><EFBFBD><EFBFBD>)<29><>}<7D>(hhh]<5D>h <09> bullet_list<73><74><EFBFBD>)<29><>}<7D>(hhh]<5D>(h <09> list_item<65><6D><EFBFBD>)<29><>}<7D>(hhh]<5D>j<EFBFBD>)<29><>}<7D>(h<05>tdims (intint) -- The model dimensions. This is also the default
value for the queries, keys, values, and the output.<2E>h]<5D>(h<00>literal_strong<6E><67><EFBFBD>)<29><>}<7D>(h<05>dims<6D>h]<5D>h<16>dims<6D><73><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<>hhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hj<>ubh<16> (<28><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<>hhhNhNubh<62>)<29><>}<7D>(hhh]<5D>(h<>)<29><>}<7D>(h<05>int<6E>h]<5D>h<00>literal_emphasis<69><73><EFBFBD>)<29><>}<7D>(h<05>int<6E>h]<5D>h<16>int<6E><74><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<>hhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hj<>ubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> condition<6F>h<EFBFBD>uh+h<>hj<>ubh<62>)<29><>}<7D>(h<05>int<6E>h]<5D>j<EFBFBD>)<29><>}<7D>(h<05>int<6E>h]<5D>h<16>int<6E><74><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<>hhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hj<>ubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> condition<6F>h<EFBFBD>uh+h<>hj<>ubeh}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> refdomain<69><6E>py<70><79> refexplicit<69><74><EFBFBD>reftype<70>j<00> reftarget<65>j<EFBFBD><00> refspecific<69><63><EFBFBD> py:module<6C>j<EFBFBD><00>py:class<73>h<EFBFBD>uh+h<>hj<>ubh<16>)<29><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<>hhhNhNubh<16> <20><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<>hhhNhNubh<16>cThe model dimensions. This is also the default
value for the queries, keys, values, and the output.<2E><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<>hhhNhNubeh}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hj<>ubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hj<>ubj<62>)<29><>}<7D>(hhh]<5D>j<EFBFBD>)<29><>}<7D>(h<05>;num_heads (intint) -- The number of attention heads to use.<2E>h]<5D>(j<>)<29><>}<7D>(h<05> num_heads<64>h]<5D>h<16> num_heads<64><73><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj(hhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hj$ubh<16> (<28><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj$hhhNhNubh<62>)<29><>}<7D>(hhh]<5D>(h<>)<29><>}<7D>(h<05>int<6E>h]<5D>j<EFBFBD>)<29><>}<7D>(h<05>int<6E>h]<5D>h<16>int<6E><74><EFBFBD><EFBFBD><EFBFBD>}<7D>(hjAhhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hj=ubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> condition<6F>h<EFBFBD>uh+h<>hj:ubh<62>)<29><>}<7D>(h<05>int<6E>h]<5D>j<EFBFBD>)<29><>}<7D>(h<05>int<6E>h]<5D>h<16>int<6E><74><EFBFBD><EFBFBD><EFBFBD>}<7D>(hjZhhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hjVubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> condition<6F>h<EFBFBD>uh+h<>hj:ubeh}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> refdomain<69>j<00> refexplicit<69><74><EFBFBD>reftype<70>j<00> reftarget<65>j\j<00>jj<>jh<>uh+h<>hj$ubh<16>)<29><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj$hhhNhNubh<16> <20><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj$hhhNhNubh<16>%The number of attention heads to use.<2E><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj$hhhNhNubeh}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hj!ubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hj<>ubj<62>)<29><>}<7D>(hhh]<5D>j<EFBFBD>)<29><>}<7D>(h<05>bquery_input_dims (intint, optionaloptional) -- The input dimensions of the queries.
Default: dims.<2E>h]<5D>(j<>)<29><>}<7D>(h<05>query_input_dims<6D>h]<5D>h<16>query_input_dims<6D><73><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<>hhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hj<>ubh<16> (<28><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<>hhhNhNubh<62>)<29><>}<7D>(hhh]<5D>(h<>)<29><>}<7D>(h<05>int<6E>h]<5D>j<EFBFBD>)<29><>}<7D>(h<05>int<6E>h]<5D>h<16>int<6E><74><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<>hhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hj<>ubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> condition<6F>h<EFBFBD>uh+h<>hj<>ubh<62>)<29><>}<7D>(h<05>int<6E>h]<5D>j<EFBFBD>)<29><>}<7D>(h<05>int<6E>h]<5D>h<16>int<6E><74><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<>hhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hj<>ubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> condition<6F>h<EFBFBD>uh+h<>hj<>ubeh}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> refdomain<69>j<00> refexplicit<69><74><EFBFBD>reftype<70>j<00> reftarget<65>j<EFBFBD>j<00>jj<>jh<>uh+h<>hj<>ubj<62>)<29><>}<7D>(h<05>, <20>h]<5D>h<16>, <20><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<>hhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hj<>ubh<62>)<29><>}<7D>(hhh]<5D>(h<>)<29><>}<7D>(h<05>optional<61>h]<5D>j<EFBFBD>)<29><>}<7D>(h<05>optional<61>h]<5D>h<16>optional<61><6C><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<>hhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hj<>ubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> condition<6F>h<EFBFBD>uh+h<>hj<>ubh<62>)<29><>}<7D>(h<05>optional<61>h]<5D>j<EFBFBD>)<29><>}<7D>(h<05>optional<61>h]<5D>h<16>optional<61><6C><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj hhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hj ubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> condition<6F>h<EFBFBD>uh+h<>hj<>ubeh}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> refdomain<69>j<00> refexplicit<69><74><EFBFBD>reftype<70>j<00> reftarget<65>j j<00>jj<>jh<>uh+h<>hj<>ubh<16>)<29><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<>hhhNhNubh<16> <20><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<>hhhNhNubh<16>.The input dimensions of the 
queries.
Default: <20><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<>hhhNhNubj)<29><>}<7D>(h<05>``dims``<60>h]<5D>h<16>dims<6D><73><EFBFBD><EFBFBD><EFBFBD>}<7D>(hjB hhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+jhj<>hhhNhNubh<16>.<2E><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<>hhhNhNubeh}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hj<>ubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hj<>ubj<62>)<29><>}<7D>(hhh]<5D>j<EFBFBD>)<29><>}<7D>(h<05>]key_input_dims (intint, optionaloptional) -- The input dimensions of the keys.
Default: dims.<2E>h]<5D>(j<>)<29><>}<7D>(h<05>key_input_dims<6D>h]<5D>h<16>key_input_dims<6D><73><EFBFBD><EFBFBD><EFBFBD>}<7D>(hjg hhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hjc ubh<16> (<28><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hjc hhhNhNubh<62>)<29><>}<7D>(hhh]<5D>(h<>)<29><>}<7D>(h<05>int<6E>h]<5D>j<EFBFBD>)<29><>}<7D>(h<05>int<6E>h]<5D>h<16>int<6E><74><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<> hhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hj| ubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> condition<6F>h<EFBFBD>uh+h<>hjy ubh<62>)<29><>}<7D>(h<05>int<6E>h]<5D>j<EFBFBD>)<29><>}<7D>(h<05>int<6E>h]<5D>h<16>int<6E><74><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<> hhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hj<> ubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> condition<6F>h<EFBFBD>uh+h<>hjy ubeh}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> refdomain<69>j<00> refexplicit<69><74><EFBFBD>reftype<70>j<00> reftarget<65>j<EFBFBD> j<00>jj<>jh<>uh+h<>hjc ubj<62>)<29><>}<7D>(h<05>, <20>h]<5D>h<16>, <20><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<> hhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hjc ubh<62>)<29><>}<7D>(hhh]<5D>(h<>)<29><>}<7D>(h<05>optional<61>h]<5D>j<EFBFBD>)<29><>}<7D>(h<05>optional<61>h]<5D>h<16>optional<61><6C><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<> hhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hj<> ubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> condition<6F>h<EFBFBD>uh+h<>hj<> ubh<62>)<29><>}<7D>(h<05>optional<61>h]<5D>j<EFBFBD>)<29><>}<7D>(h<05>optional<61>h]<5D>h<16>optional<61><6C><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<> hhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hj<> ubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> condition<6F>h<EFBFBD>uh+h<>hj<> ubeh}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> refdomain<69>j<00> refexplicit<69><74><EFBFBD>reftype<70>j<00> reftarget<65>j<EFBFBD> j<00>jj<>jh<>uh+h<>hjc ubh<16>)<29><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hjc hhhNhNubh<16> <20><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hjc hhhNhNubh<16>+The input 
dimensions of the keys.
Default: <20><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hjc hhhNhNubj)<29><>}<7D>(h<05>``dims``<60>h]<5D>h<16>dims<6D><73><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj
hhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+jhjc hhhNhNubh<16>.<2E><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hjc hhhNhNubeh}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hj` ubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hj<>ubj<62>)<29><>}<7D>(hhh]<5D>j<EFBFBD>)<29><>}<7D>(h<05>kvalue_input_dims (intint, optionaloptional) -- The input dimensions of the values.
Default: key_input_dims.<2E>h]<5D>(j<>)<29><>}<7D>(h<05>value_input_dims<6D>h]<5D>h<16>value_input_dims<6D><73><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj6
hhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hj2
ubh<16> (<28><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj2
hhhNhNubh<62>)<29><>}<7D>(hhh]<5D>(h<>)<29><>}<7D>(h<05>int<6E>h]<5D>j<EFBFBD>)<29><>}<7D>(h<05>int<6E>h]<5D>h<16>int<6E><74><EFBFBD><EFBFBD><EFBFBD>}<7D>(hjO
hhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hjK
ubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> condition<6F>h<EFBFBD>uh+h<>hjH
ubh<62>)<29><>}<7D>(h<05>int<6E>h]<5D>j<EFBFBD>)<29><>}<7D>(h<05>int<6E>h]<5D>h<16>int<6E><74><EFBFBD><EFBFBD><EFBFBD>}<7D>(hjh
hhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hjd
ubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> condition<6F>h<EFBFBD>uh+h<>hjH
ubeh}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> refdomain<69>j<00> refexplicit<69><74><EFBFBD>reftype<70>j<00> reftarget<65>jj
j<00>jj<>jh<>uh+h<>hj2
ubj<62>)<29><>}<7D>(h<05>, <20>h]<5D>h<16>, <20><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<>
hhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hj2
ubh<62>)<29><>}<7D>(hhh]<5D>(h<>)<29><>}<7D>(h<05>optional<61>h]<5D>j<EFBFBD>)<29><>}<7D>(h<05>optional<61>h]<5D>h<16>optional<61><6C><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<>
hhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hj<>
ubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> condition<6F>h<EFBFBD>uh+h<>hj<>
ubh<62>)<29><>}<7D>(h<05>optional<61>h]<5D>j<EFBFBD>)<29><>}<7D>(h<05>optional<61>h]<5D>h<16>optional<61><6C><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<>
hhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hj<>
ubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> condition<6F>h<EFBFBD>uh+h<>hj<>
ubeh}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> refdomain<69>j<00> refexplicit<69><74><EFBFBD>reftype<70>j<00> reftarget<65>j<EFBFBD>
j<00>jj<>jh<>uh+h<>hj2
ubh<16>)<29><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj2
hhhNhNubh<16> <20><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj2
hhhNhNubh<16>-The input dimensions of the values.
Default: <20><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj2
hhhNhNubj)<29><>}<7D>(h<05>``key_input_dims``<60>h]<5D>h<16>key_input_dims<6D><73><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<>
hhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+jhj2
hhhNhNubh<16>.<2E><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj2
hhhNhNubeh}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hj/
ubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hj<>ubj<62>)<29><>}<7D>(hhh]<5D>j<EFBFBD>)<29><>}<7D>(h<05>jvalue_dims (intint, optionaloptional) -- The dimensions of the values after the
projection. Default: dims.<2E>h]<5D>(j<>)<29><>}<7D>(h<05>
value_dims<EFBFBD>h]<5D>h<16>
value_dims<EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj hhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hj ubh<16> (<28><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj hhhNhNubh<62>)<29><>}<7D>(hhh]<5D>(h<>)<29><>}<7D>(h<05>int<6E>h]<5D>j<EFBFBD>)<29><>}<7D>(h<05>int<6E>h]<5D>h<16>int<6E><74><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj hhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hj ubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> condition<6F>h<EFBFBD>uh+h<>hj ubh<62>)<29><>}<7D>(h<05>int<6E>h]<5D>j<EFBFBD>)<29><>}<7D>(h<05>int<6E>h]<5D>h<16>int<6E><74><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj7 hhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hj3 ubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> condition<6F>h<EFBFBD>uh+h<>hj ubeh}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> refdomain<69>j<00> refexplicit<69><74><EFBFBD>reftype<70>j<00> reftarget<65>j9 j<00>jj<>jh<>uh+h<>hj ubj<62>)<29><>}<7D>(h<05>, <20>h]<5D>h<16>, <20><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hjV hhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hj ubh<62>)<29><>}<7D>(hhh]<5D>(h<>)<29><>}<7D>(h<05>optional<61>h]<5D>j<EFBFBD>)<29><>}<7D>(h<05>optional<61>h]<5D>h<16>optional<61><6C><EFBFBD><EFBFBD><EFBFBD>}<7D>(hjk hhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hjg ubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> condition<6F>h<EFBFBD>uh+h<>hjd ubh<62>)<29><>}<7D>(h<05>optional<61>h]<5D>j<EFBFBD>)<29><>}<7D>(h<05>optional<61>h]<5D>h<16>optional<61><6C><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<> hhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hj<> ubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> condition<6F>h<EFBFBD>uh+h<>hjd ubeh}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> refdomain<69>j<00> refexplicit<69><74><EFBFBD>reftype<70>j<00> reftarget<65>j<EFBFBD> j<00>jj<>jh<>uh+h<>hj ubh<16>)<29><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj hhhNhNubh<16> <20><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj hhhNhNubh<16><The dimensions of the values after the
projection. Default: <20><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj hhhNhNubj)<29><>}<7D>(h<05>``dims``<60>h]<5D>h<16>dims<6D><73><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<> hhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+jhj hhhNhNubh<16>.<2E><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj hhhNhNubeh}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hj<>
ubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hj<>ubj<62>)<29><>}<7D>(hhh]<5D>j<EFBFBD>)<29><>}<7D>(h<05>rvalue_output_dims (intint, optionaloptional) -- The dimensions the new values will
be projected to. Default: dims.<2E>h]<5D>(j<>)<29><>}<7D>(h<05>value_output_dims<6D>h]<5D>h<16>value_output_dims<6D><73><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<> hhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hj<> ubh<16> (<28><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<> hhhNhNubh<62>)<29><>}<7D>(hhh]<5D>(h<>)<29><>}<7D>(h<05>int<6E>h]<5D>j<EFBFBD>)<29><>}<7D>(h<05>int<6E>h]<5D>h<16>int<6E><74><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<> hhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hj<> ubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> condition<6F>h<EFBFBD>uh+h<>hj<> ubh<62>)<29><>}<7D>(h<05>int<6E>h]<5D>j<EFBFBD>)<29><>}<7D>(h<05>int<6E>h]<5D>h<16>int<6E><74><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj hhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hj ubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> condition<6F>h<EFBFBD>uh+h<>hj<> ubeh}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> refdomain<69>j<00> refexplicit<69><74><EFBFBD>reftype<70>j<00> reftarget<65>j j<00>jj<>jh<>uh+h<>hj<> ubj<62>)<29><>}<7D>(h<05>, <20>h]<5D>h<16>, <20><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj% hhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hj<> ubh<62>)<29><>}<7D>(hhh]<5D>(h<>)<29><>}<7D>(h<05>optional<61>h]<5D>j<EFBFBD>)<29><>}<7D>(h<05>optional<61>h]<5D>h<16>optional<61><6C><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj: hhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hj6 ubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> condition<6F>h<EFBFBD>uh+h<>hj3 ubh<62>)<29><>}<7D>(h<05>optional<61>h]<5D>j<EFBFBD>)<29><>}<7D>(h<05>optional<61>h]<5D>h<16>optional<61><6C><EFBFBD><EFBFBD><EFBFBD>}<7D>(hjS hhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hjO ubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> condition<6F>h<EFBFBD>uh+h<>hj3 ubeh}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> refdomain<69>j<00> refexplicit<69><74><EFBFBD>reftype<70>j<00> reftarget<65>jU j<00>jj<>jh<>uh+h<>hj<> ubh<16>)<29><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<> hhhNhNubh<16> <20><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<> hhhNhNubh<16>=The 
dimensions the new values will
be projected to. Default: <20><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<> hhhNhNubj)<29><>}<7D>(h<05>``dims``<60>h]<5D>h<16>dims<6D><73><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj~ hhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+jhj<> hhhNhNubh<16>.<2E><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<> hhhNhNubeh}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hj<> ubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hj<>ubj<62>)<29><>}<7D>(hhh]<5D>j<EFBFBD>)<29><>}<7D>(h<05>ebias (boolbool, optionaloptional) -- Whether or not to use a bias in the projections.
Default: False.<2E>h]<5D>(j<>)<29><>}<7D>(h<05>bias<61>h]<5D>h<16>bias<61><73><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<> hhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hj<> ubh<16> (<28><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<> hhhNhNubh<62>)<29><>}<7D>(hhh]<5D>(h<>)<29><>}<7D>(h<05>bool<6F>h]<5D>j<EFBFBD>)<29><>}<7D>(h<05>bool<6F>h]<5D>h<16>bool<6F><6C><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<> hhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hj<> ubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> condition<6F>h<EFBFBD>uh+h<>hj<> ubh<62>)<29><>}<7D>(h<05>bool<6F>h]<5D>j<EFBFBD>)<29><>}<7D>(h<05>bool<6F>h]<5D>h<16>bool<6F><6C><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<> hhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hj<> ubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> condition<6F>h<EFBFBD>uh+h<>hj<> ubeh}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> refdomain<69>j<00> refexplicit<69><74><EFBFBD>reftype<70>j<00> reftarget<65>j<EFBFBD> j<00>jj<>jh<>uh+h<>hj<> ubj<62>)<29><>}<7D>(h<05>, <20>h]<5D>h<16>, <20><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<> hhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hj<> ubh<62>)<29><>}<7D>(hhh]<5D>(h<>)<29><>}<7D>(h<05>optional<61>h]<5D>j<EFBFBD>)<29><>}<7D>(h<05>optional<61>h]<5D>h<16>optional<61><6C><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj hhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hj ubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> condition<6F>h<EFBFBD>uh+h<>hj ubh<62>)<29><>}<7D>(h<05>optional<61>h]<5D>j<EFBFBD>)<29><>}<7D>(h<05>optional<61>h]<5D>h<16>optional<61><6C><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj" hhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hj ubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> condition<6F>h<EFBFBD>uh+h<>hj ubeh}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><> refdomain<69>j<00> refexplicit<69><74><EFBFBD>reftype<70>j<00> reftarget<65>j$ j<00>jj<>jh<>uh+h<>hj<> ubh<16>)<29><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<> hhhNhNubh<16> <20><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<> hhhNhNubh<16>:Whether or not to use a bias in 
the projections.
Default: <20><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<> hhhNhNubj)<29><>}<7D>(h<05> ``False``<60>h]<5D>h<16>False<73><65><EFBFBD><EFBFBD><EFBFBD>}<7D>(hjM hhhNhNubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+jhj<> hhhNhNubh<16>.<2E><><EFBFBD><EFBFBD><EFBFBD>}<7D>(hj<> hhhNhNubeh}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hj<> ubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hj<>ubeh}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hj<>ubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hj<>ubeh}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hj<>ubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hj<>hhhNhNubeh}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D>uh+j<>hh@hhhhlhNubeh}<7D>(h!]<5D>h#]<5D>(j<00>class<73>eh%]<5D>h']<5D>h)]<5D><>domain<69>j<00>objtype<70>j<EFBFBD> <00>desctype<70>j<EFBFBD> <00>noindex<65><78>uh+h>hhhh hNhNubeh}<7D>(h!]<5D><>mlx-nn-multiheadattention<6F>ah#]<5D>h%]<5D><>mlx.nn.multiheadattention<6F>ah']<5D>h)]<5D>uh+h
hhhhhh,hKubah}<7D>(h!]<5D>h#]<5D>h%]<5D>h']<5D>h)]<5D><>source<63>h,uh+h<01>current_source<63>N<EFBFBD> current_line<6E>N<EFBFBD>settings<67><73>docutils.frontend<6E><64>Values<65><73><EFBFBD>)<29><>}<7D>(hN<> generator<6F>N<EFBFBD> datestamp<6D>N<EFBFBD> source_link<6E>N<EFBFBD>
source_url<EFBFBD>N<EFBFBD> toc_backlinks<6B><73>entry<72><79>footnote_backlinks<6B>K<01> sectnum_xform<72>K<01>strip_comments<74>N<EFBFBD>strip_elements_with_classes<65>N<EFBFBD> strip_classes<65>N<EFBFBD> report_level<65>K<02>
halt_level<EFBFBD>K<05>exit_status_level<65>K<05>debug<75>N<EFBFBD>warning_stream<61>N<EFBFBD> traceback<63><6B><EFBFBD>input_encoding<6E><67> utf-8-sig<69><67>input_encoding_error_handler<65><72>strict<63><74>output_encoding<6E><67>utf-8<><38>output_encoding_error_handler<65>j<EFBFBD> <00>error_encoding<6E><67>utf-8<><38>error_encoding_error_handler<65><72>backslashreplace<63><65> language_code<64><65>en<65><6E>record_dependencies<65>N<EFBFBD>config<69>N<EFBFBD> id_prefix<69>h<06>auto_id_prefix<69><78>id<69><64> dump_settings<67>N<EFBFBD>dump_internals<6C>N<EFBFBD>dump_transforms<6D>N<EFBFBD>dump_pseudo_xml<6D>N<EFBFBD>expose_internals<6C>N<EFBFBD>strict_visitor<6F>N<EFBFBD>_disable_config<69>N<EFBFBD>_source<63>h,<2C> _destination<6F>N<EFBFBD> _config_files<65>]<5D><>file_insertion_enabled<65><64><EFBFBD> raw_enabled<65>K<01>line_length_limit<69>M'<27>pep_references<65>N<EFBFBD> pep_base_url<72><6C>https://peps.python.org/<2F><>pep_file_url_template<74><65>pep-%04d<34><64>rfc_references<65>N<EFBFBD> rfc_base_url<72><6C>&https://datatracker.ietf.org/doc/html/<2F><> tab_width<74>K<08>trim_footnote_reference_space<63><65><EFBFBD>syntax_highlight<68><74>long<6E><67> smart_quotes<65><73><EFBFBD>smartquotes_locales<65>]<5D><>character_level_inline_markup<75><70><EFBFBD>doctitle_xform<72><6D><EFBFBD> docinfo_xform<72>K<01>sectsubtitle_xform<72><6D><EFBFBD> image_loading<6E><67>link<6E><6B>embed_stylesheet<65><74><EFBFBD>cloak_email_addresses<65><73><EFBFBD>section_self_link<6E><6B><EFBFBD>env<6E>Nub<75>reporter<65>N<EFBFBD>indirect_targets<74>]<5D><>substitution_defs<66>}<7D><>substitution_names<65>}<7D><>refnames<65>}<7D><>refids<64>}<7D><>nameids<64>}<7D>j<EFBFBD> j<> s<> nametypes<65>}<7D>j<EFBFBD> <00>sh!}<7D>(j<> h h<hEu<45> footnote_refs<66>}<7D><> citation_refs<66>}<7D><> autofootnotes<65>]<5D><>autofootnote_refs<66>]<5D><>symbol_footnotes<65>]<5D><>symbol_footnote_refs<66>]<5D><> footnotes<65>]<5D><> 
citations<6E>]<5D><>autofootnote_start<72>K<01>symbol_footnote_start<72>K<00>
id_counter<EFBFBD><EFBFBD> collections<6E><73>Counter<65><72><EFBFBD>}<7D><><EFBFBD>R<EFBFBD><52>parse_messages<65>]<5D><>transform_messages<65>]<5D><> transformer<65>N<EFBFBD> include_log<6F>]<5D><>
decoration<EFBFBD>Nhhub.