.. _optimizers:

Optimizers
==========

The optimizers in MLX can be used both with :mod:`mlx.nn` and with pure
:mod:`mlx.core` functions. A typical example involves calling
:meth:`Optimizer.update` to update a model's parameters based on the loss
gradients and subsequently calling :func:`mlx.core.eval` to evaluate both the
model's parameters and the **optimizer state**.

.. code-block:: python

   # Create a model
   model = MLP(num_layers, train_images.shape[-1], hidden_dim, num_classes)
   mx.eval(model.parameters())

   # Create the gradient function and the optimizer
   loss_and_grad_fn = nn.value_and_grad(model, loss_fn)
   optimizer = optim.SGD(learning_rate=learning_rate)

   for e in range(num_epochs):
       for X, y in batch_iterate(batch_size, train_images, train_labels):
           loss, grads = loss_and_grad_fn(model, X, y)

           # Update the model with the gradients. So far no computation has happened.
           optimizer.update(model, grads)

           # Compute the new parameters but also the optimizer state.
           mx.eval(model.parameters(), optimizer.state)
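The optimizers are not tied to :mod:`mlx.nn` models. Below is a minimal
sketch of the pure :mod:`mlx.core` case, assuming
:meth:`Optimizer.apply_gradients`, which applies a gradient tree to a
matching parameter tree and returns the updated parameters; the quadratic
loss and the parameter names are hypothetical.

.. code-block:: python

   import mlx.core as mx
   import mlx.optimizers as optim

   # A raw parameter tree, with no mlx.nn.Module involved.
   params = {"w": mx.ones((4,))}

   # A hypothetical loss: the squared norm of the parameters.
   def loss_fn(params):
       return (params["w"] ** 2).sum()

   grad_fn = mx.grad(loss_fn)
   optimizer = optim.SGD(learning_rate=0.1)

   for _ in range(10):
       grads = grad_fn(params)
       # apply_gradients returns the updated parameter tree.
       params = optimizer.apply_gradients(grads, params)
       mx.eval(params, optimizer.state)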
.. list-table::
   :widths: 40 60

   * - :class:`OptimizerState`
     - The optimizer state implements a recursively defined
       :class:`collections.defaultdict`, namely a missing key in an
       optimizer state is an :class:`OptimizerState`.
   * - :class:`Optimizer`\ ()
     - The base class for all optimizers.
   * - :class:`SGD`\ (learning_rate[, momentum, weight_decay, ...])
     - Stochastic gradient descent optimizer.
   * - :class:`RMSprop`\ (learning_rate[, alpha, eps])
     - Implementation of the RMSprop optimizer [1].
   * - :class:`Adagrad`\ (learning_rate[, eps])
     - Implementation of the Adagrad optimizer [1].
   * - :class:`AdaDelta`\ (learning_rate[, rho, eps])
     - Implementation of the AdaDelta optimizer with learning rate [1].
   * - :class:`Adam`\ (learning_rate[, betas, eps])
     - Implementation of the Adam optimizer [1].
   * - :class:`AdamW`\ (learning_rate[, betas, eps, weight_decay])
     - Implementation of the AdamW optimizer [1].
   * - :class:`Adamax`\ (learning_rate[, betas, eps])
     - Implementation of the Adamax optimizer.
   * - :class:`Lion`\ (learning_rate[, betas, weight_decay])
     - Implementation of the Lion optimizer [1].
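Since the state is a recursively defined :class:`collections.defaultdict`,
missing keys can be read before any update has run. A short sketch of that
behavior (the ``momentum`` and ``weight_decay`` arguments follow the
:class:`SGD` signature in the table above; the ``"step"`` key is purely
illustrative):

.. code-block:: python

   import mlx.optimizers as optim

   # Constructor arguments as listed in the table above.
   optimizer = optim.SGD(learning_rate=1e-2, momentum=0.9, weight_decay=1e-4)

   # Reading a key that was never set does not raise a KeyError; per the
   # description above it yields another (empty) OptimizerState.
   print(optimizer.state["step"])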