Files
mlx/docs/build/doctrees/python/nn/_autosummary/mlx.nn.Transformer.doctree

mlx.nn.Transformer
==================

class mlx.nn.Transformer(dims: int = 512, num_heads: int = 8, num_encoder_layers: int = 6, num_decoder_layers: int = 6, mlp_dims: Optional[int] = None, dropout: float = 0.0, activation: Callable[[Any], Any] = <function relu>, custom_encoder: Optional[Any] = None, custom_decoder: Optional[Any] = None, norm_first: bool = False)

Implements a standard Transformer model.

The implementation is based on `Attention Is All You Need
<https://arxiv.org/abs/1706.03762>`_.

The Transformer model contains an encoder and a decoder. The encoder
processes the input sequence and the decoder generates the output sequence.
The interaction between encoder and decoder happens through the attention
mechanism.

Parameters:

* dims (int, optional) -- The number of expected features in the
  encoder/decoder inputs. Default: ``512``.
* num_heads (int, optional) -- The number of attention heads.
  Default: ``8``.
* num_encoder_layers (int, optional) -- The number of encoder layers in
  the Transformer encoder. Default: ``6``.
* num_decoder_layers (int, optional) -- The number of decoder layers in
  the Transformer decoder. Default: ``6``.
* mlp_dims (int, optional) -- The hidden dimension of the MLP block in
  each Transformer layer. Defaults to ``4*dims`` if not provided.
  Default: ``None``.
* dropout (float, optional) -- The dropout value for the Transformer
  encoder and decoder. Dropout is used after each attention layer and
  the activation in the MLP layer. Default: ``0.0``.
* activation (function, optional) -- The activation function for the MLP
  hidden layer. Default: :func:`mlx.nn.relu`.
* custom_encoder (Module, optional) -- A custom encoder to replace the
  standard Transformer encoder. Default: ``None``.
* custom_decoder (Module, optional) -- A custom decoder to replace the
  standard Transformer decoder. Default: ``None``.
* norm_first (bool, optional) -- If ``True``, encoder and decoder layers
  will perform layer normalization before attention and MLP operations,
  otherwise after. Default: ``False``.
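
A minimal construction sketch follows. The keyword arguments come straight
from the documented signature; the import aliases, the example sequence
lengths, and the omission of the forward call (whose mask arguments are not
part of the recovered docstring) are assumptions, not documented API::

    import mlx.core as mx
    import mlx.nn as nn

    # Build the model with the documented defaults. mlp_dims=None means the
    # MLP hidden size falls back to 4 * dims (2048 here), per the docs above.
    model = nn.Transformer(
        dims=512,
        num_heads=8,
        num_encoder_layers=6,
        num_decoder_layers=6,
        mlp_dims=None,
        dropout=0.0,
        norm_first=False,
    )

    # Hypothetical inputs of shape (sequence length, dims); the forward-call
    # convention is not covered by the recovered docstring, so it is omitted.
    src = mx.random.normal((10, 512))
    tgt = mx.random.normal((7, 512))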