mlx/docs/build/doctrees/python/_autosummary/mlx.optimizers.AdamW.doctree

2024-01-17 17:15:29 -08:00
mlx.optimizers.AdamW
====================

class mlx.optimizers.AdamW(learning_rate: float, betas: List[float] = [0.9, 0.999], eps: float = 1e-08, weight_decay: float = 0.01)

Implementation of the AdamW optimizer [1].

In contrast with [1], we do not use bias correction in the first and
second moments for AdamW. We update the weights with a weight_decay
(:math:`\lambda`) value:

.. math::

    m_{t+1} &= \beta_1 m_t + (1 - \beta_1) g_t \\
    v_{t+1} &= \beta_2 v_t + (1 - \beta_2) g_t^2 \\
    w_{t+1} &= w_t - \alpha (\frac{m_{t+1}}{\sqrt{v_{t+1} + \epsilon}} + \lambda w_t)

[1]: Loshchilov, I. and Hutter, F., 2019. Decoupled weight decay
regularization. ICLR 2019.

Parameters
    * **learning_rate** (*float*) -- The learning rate :math:`\alpha`.
    * **betas** (*List[float], optional*) -- The coefficients
      :math:`(\beta_1, \beta_2)` used for computing running averages of the
      gradient and its square. Default: ``[0.9, 0.999]``
    * **eps** (*float, optional*) -- The term :math:`\epsilon` added to the
      denominator to improve numerical stability. Default: ``1e-8``
    * **weight_decay** (*float, optional*) -- The weight decay :math:`\lambda`.
      Default: ``0.01``.

Methods

``__init__(learning_rate[, betas, eps, ...])``

``apply_single(gradient, parameter, state)``
    Performs the AdamW parameter update by modifying the parameters passed
    into Adam.
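
The AdamW update rule documented on this page (moment updates without bias correction, followed by a step that combines the scaled first moment with decoupled weight decay) can be sketched in plain Python. This is only an illustration of the three equations in the docstring, not the MLX implementation: the function name ``adamw_step``, the list-of-floats representation, and the zero-initialized moments in the usage lines are assumptions made for the example.

```python
import math


def adamw_step(w, g, m, v, learning_rate,
               betas=(0.9, 0.999), eps=1e-8, weight_decay=0.01):
    """One AdamW step on lists of floats (hypothetical helper, not MLX API).

    Follows the documented equations literally, with no bias correction:
      m' = b1 * m + (1 - b1) * g
      v' = b2 * v + (1 - b2) * g**2
      w' = w - lr * (m' / sqrt(v' + eps) + weight_decay * w)
    """
    b1, b2 = betas
    # Running average of the gradient (first moment).
    m_new = [b1 * mi + (1 - b1) * gi for mi, gi in zip(m, g)]
    # Running average of the squared gradient (second moment).
    v_new = [b2 * vi + (1 - b2) * gi * gi for vi, gi in zip(v, g)]
    # Adam direction plus decoupled weight decay, both scaled by the learning rate.
    w_new = [
        wi - learning_rate * (mi / math.sqrt(vi + eps) + weight_decay * wi)
        for wi, mi, vi in zip(w, m_new, v_new)
    ]
    return w_new, m_new, v_new


# Usage: a single step from zero-initialized moments (an assumption of this sketch).
w, m, v = [1.0, -2.0], [0.0, 0.0], [0.0, 0.0]
g = [0.5, 0.0]
w, m, v = adamw_step(w, g, m, v, learning_rate=0.1)
```

In this sketch a coordinate with a zero gradient and zero moments is still shrunk by the factor ``1 - learning_rate * weight_decay``, which is the sense in which the weight decay is decoupled from the gradient-based step.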