.. mlx/docs/build/doctrees/python/nn.doctree, built 2024-01-17 from
   docs/src/python/nn.rst
.. _nn:

Neural Networks
===============

Writing arbitrarily complex neural networks in MLX can be done using only
:class:`mlx.core.array` and :meth:`mlx.core.value_and_grad`. However, this
requires the user to rewrite the same simple neural network operations over
and over, and to handle all the parameter state and initialization manually
and explicitly.

The module :mod:`mlx.nn` solves this problem by providing an intuitive way of
composing neural network layers, initializing their parameters, freezing them
for finetuning, and more.

Quick Start with Neural Networks
--------------------------------

.. code-block:: python

   import mlx.core as mx
   import mlx.nn as nn

   class MLP(nn.Module):
       def __init__(self, in_dims: int, out_dims: int):
           super().__init__()

           self.layers = [
               nn.Linear(in_dims, 128),
               nn.Linear(128, 128),
               nn.Linear(128, out_dims),
           ]

       def __call__(self, x):
           for i, l in enumerate(self.layers):
               x = mx.maximum(x, 0) if i > 0 else x
               x = l(x)
           return x

   # The model is created with all its parameters, but nothing is initialized
   # yet because MLX is lazily evaluated
   mlp = MLP(2, 10)

   # We can access its parameters by calling mlp.parameters()
   params = mlp.parameters()
   print(params["layers"][0]["weight"].shape)

   # Printing a parameter will cause it to be evaluated and thus initialized
   print(params["layers"][0])

   # We can also force evaluate all parameters to initialize the model
   mx.eval(mlp.parameters())

   # A simple loss function.
   # NOTE: It doesn't matter how it uses the mlp model. It currently captures
   #       it from the local scope. It could be a positional argument or a
   #       keyword argument.
   def l2_loss(x, y):
       y_hat = mlp(x)
       return (y_hat - y).square().mean()

   # Calling `nn.value_and_grad` instead of `mx.value_and_grad` returns the
   # gradient with respect to `mlp.trainable_parameters()`
   loss_and_grad = nn.value_and_grad(mlp, l2_loss)
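The nested dictionary returned by ``mlp.parameters()`` mirrors the structure of the module's attributes. As a plain-Python sketch (not using MLX itself), with shape tuples standing in for the actual arrays and assuming ``nn.Linear``'s ``(output_dims, input_dims)`` weight layout:

```python
# Plain-Python stand-in for the nested structure mlp.parameters() returns:
# a dict whose values are lists/dicts with arrays at the leaves. Here the
# leaves are shape tuples instead of mlx.core.array values.
params = {
    "layers": [
        {"weight": (128, 2), "bias": (128,)},    # nn.Linear(2, 128)
        {"weight": (128, 128), "bias": (128,)},  # nn.Linear(128, 128)
        {"weight": (10, 128), "bias": (10,)},    # nn.Linear(128, 10)
    ]
}

# Indexing works exactly like in the example above:
print(params["layers"][0]["weight"])  # (128, 2)
```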
.. _module_class:

The Module Class
----------------
The workhorse of any neural network library is the :class:`Module` class. In
MLX the :class:`Module` class is a container of :class:`mlx.core.array` or
:class:`Module` instances. Its main function is to provide a way to
recursively **access** and **update** its parameters and those of its
submodules.
Parameters
^^^^^^^^^^
A parameter of a module is any public member of type :class:`mlx.core.array`
(its name should not start with ``_``). It can be arbitrarily nested in other
:class:`Module` instances or lists and dictionaries.

:meth:`Module.parameters` can be used to extract a nested dictionary with all
the parameters of a module and its submodules.

A :class:`Module` can also keep track of "frozen" parameters. See the
:meth:`Module.freeze` method for more details. When using
:meth:`mlx.nn.value_and_grad` the gradients returned will be with respect to
these trainable parameters.
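The extraction rule above can be sketched in plain Python. This is a simplified stand-in, not MLX's actual implementation; the leaves here are ordinary lists rather than ``mlx.core.array`` values:

```python
# Simplified sketch of recursively collecting parameters from a module's
# state: recurse into dicts and lists, skip names starting with "_", and
# keep the leaves (stand-ins for mlx.core.array values).
def collect_params(obj):
    if isinstance(obj, dict):
        return {
            k: collect_params(v)
            for k, v in obj.items()
            if not k.startswith("_")
        }
    if isinstance(obj, list):
        return [collect_params(v) for v in obj]
    return obj  # leaf: treated as a parameter array

state = {"weight": [1.0, 2.0], "_cache": [9.9], "sub": {"bias": [0.0]}}
print(collect_params(state))  # {'weight': [1.0, 2.0], 'sub': {'bias': [0.0]}}
```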
Updating the Parameters
^^^^^^^^^^^^^^^^^^^^^^^

MLX modules allow accessing and updating individual parameters. However, most
of the time we need to update large subsets of a module's parameters. This
action is performed by :meth:`Module.update`.
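Such a bulk update amounts to merging a (possibly partial) nested dictionary of new values into the existing parameter tree. A hypothetical plain-Python ``deep_update`` helper, illustrating the semantics rather than MLX's actual ``Module.update`` implementation:

```python
# Hypothetical sketch of merging a (possibly partial) nested dict of new
# parameter values into an existing parameter tree. Leaves in new_params
# replace the corresponding leaves in params; untouched leaves survive.
def deep_update(params, new_params):
    for k, v in new_params.items():
        if isinstance(v, dict) and isinstance(params.get(k), dict):
            deep_update(params[k], v)
        else:
            params[k] = v
    return params

params = {"layers": {"0": {"weight": [1.0], "bias": [0.0]}}}
new = {"layers": {"0": {"weight": [2.0]}}}  # partial update: only the weight
deep_update(params, new)
print(params["layers"]["0"])  # {'weight': [2.0], 'bias': [0.0]}
```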
Inspecting Modules
^^^^^^^^^^^^^^^^^^

The simplest way to see the model architecture is to print it. Following along
with the above example, you can print the ``MLP`` with:

.. code-block:: python

   print(mlp)

This will display:

.. code-block:: shell

   MLP(
     (layers.0): Linear(input_dims=2, output_dims=128, bias=True)
     (layers.1): Linear(input_dims=128, output_dims=128, bias=True)
     (layers.2): Linear(input_dims=128, output_dims=10, bias=True)
   )

To get more detailed information on the arrays in a :class:`Module` you can use
:func:`mlx.utils.tree_map` on the parameters. For example, to see the shapes of
all the parameters in a :class:`Module` do:

.. code-block:: python

   from mlx.utils import tree_map

   shapes = tree_map(lambda p: p.shape, mlp.parameters())

As another example, you can count the number of parameters in a
:class:`Module` with:

.. code-block:: python

   from mlx.utils import tree_flatten

   num_params = sum(v.size for _, v in tree_flatten(mlp.parameters()))
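To make the flattening concrete, here is a plain-Python stand-in that mirrors the ``(dotted_path, leaf)`` output format of ``mlx.utils.tree_flatten`` (not its actual implementation), applied to a shape tree like the one ``tree_map`` produces above:

```python
import math

# Plain-Python stand-in for mlx.utils.tree_flatten: flatten a nested
# dict/list into (dotted_path, leaf) pairs. Leaves here are shape tuples.
def flatten(tree, prefix=""):
    if isinstance(tree, dict):
        items = tree.items()
    elif isinstance(tree, list):
        items = ((str(i), v) for i, v in enumerate(tree))
    else:
        return [(prefix, tree)]
    out = []
    for k, v in items:
        out.extend(flatten(v, f"{prefix}.{k}" if prefix else k))
    return out

shapes = {"layers": [{"weight": (128, 2), "bias": (128,)}]}
flat = flatten(shapes)
print(flat)  # [('layers.0.weight', (128, 2)), ('layers.0.bias', (128,))]

# Counting parameters from the shapes: 128 * 2 + 128 = 384
num_params = sum(math.prod(shape) for _, shape in flat)
print(num_params)  # 384
```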
Value and Grad
--------------

Using a :class:`Module` does not preclude using MLX's higher order function
transformations (:meth:`mlx.core.value_and_grad`, :meth:`mlx.core.grad`,
etc.). However, these function transformations assume pure functions, meaning
the parameters should be passed as an argument to the function being
transformed.

There is an easy pattern to achieve that with MLX modules:

.. code-block:: python

   model = ...

   def f(params, other_inputs):
       model.update(params)  # <---- Necessary to make the model use the passed parameters
       return model(other_inputs)

   f(model.trainable_parameters(), mx.zeros((10,)))

However, :meth:`mlx.nn.value_and_grad` provides precisely this pattern and
only computes the gradients with respect to the trainable parameters of the
model.
In detail:

- it wraps the passed function with a function that calls :meth:`Module.update`
  to make sure the model is using the provided parameters.
- it calls :meth:`mlx.core.value_and_grad` to transform the function into one
  that also computes the gradients with respect to the passed parameters.
- it wraps the returned function with a function that passes the trainable
  parameters as the first argument to the function returned by
  :meth:`mlx.core.value_and_grad`.
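The three wrapping steps above can be sketched in plain Python. ``ToyModel`` and ``value_and_grad_stub`` below are illustrative stand-ins, not MLX APIs: no real autodiff happens, and the stub simply returns the parameters in place of gradients so the argument plumbing is the only thing on display:

```python
# Stand-in for mx.value_and_grad: returns (value, "gradients"), where the
# "gradients" are faked by echoing the params back (no autodiff here).
def value_and_grad_stub(fn):
    def vag(params, *args):
        return fn(params, *args), params
    return vag

class ToyModel:
    def __init__(self):
        self.params = {"w": 3.0}
    def trainable_parameters(self):
        return dict(self.params)
    def update(self, params):
        self.params.update(params)
    def __call__(self, x):
        return self.params["w"] * x

def nn_value_and_grad(model, fn):
    def inner(params, *args):
        model.update(params)          # step 1: push params into the model
        return fn(*args)
    vag = value_and_grad_stub(inner)  # step 2: differentiate wrt params
    def wrapped(*args):               # step 3: feed trainable parameters first
        return vag(model.trainable_parameters(), *args)
    return wrapped

model = ToyModel()
loss_and_grad = nn_value_and_grad(model, lambda x: model(x) ** 2)
value, grads = loss_and_grad(2.0)
print(value)  # (3.0 * 2.0) ** 2 = 36.0
```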
.. autosummary::
   :toctree: _autosummary

   value_and_grad

.. toctree::

   nn/module
   nn/layers
   nn/functions
   nn/losses
source_url<EFBFBD>N<EFBFBD> toc_backlinks<6B>jz<00>footnote_backlinks<6B>K<01> sectnum_xform<72>K<01>strip_comments<74>N<EFBFBD>strip_elements_with_classes<65>N<EFBFBD> strip_classes<65>N<EFBFBD> report_level<65>K<02>
halt_level<EFBFBD>K<05>exit_status_level<65>K<05>debug<75>N<EFBFBD>warning_stream<61>N<EFBFBD> traceback<63><6B><EFBFBD>input_encoding<6E><67> utf-8-sig<69><67>input_encoding_error_handler<65><72>strict<63><74>output_encoding<6E><67>utf-8<><38>output_encoding_error_handler<65>j<EFBFBD><00>error_encoding<6E><67>utf-8<><38>error_encoding_error_handler<65><72>backslashreplace<63><65> language_code<64><65>en<65><6E>record_dependencies<65>N<EFBFBD>config<69>N<EFBFBD> id_prefix<69>h<06>auto_id_prefix<69><78>id<69><64> dump_settings<67>N<EFBFBD>dump_internals<6C>N<EFBFBD>dump_transforms<6D>N<EFBFBD>dump_pseudo_xml<6D>N<EFBFBD>expose_internals<6C>N<EFBFBD>strict_visitor<6F>N<EFBFBD>_disable_config<69>N<EFBFBD>_source<63>h#<23> _destination<6F>N<EFBFBD> _config_files<65>]<5D><>file_insertion_enabled<65><64><EFBFBD> raw_enabled<65>K<01>line_length_limit<69>M'<27>pep_references<65>N<EFBFBD> pep_base_url<72><6C>https://peps.python.org/<2F><>pep_file_url_template<74><65>pep-%04d<34><64>rfc_references<65>N<EFBFBD> rfc_base_url<72><6C>&https://datatracker.ietf.org/doc/html/<2F><> tab_width<74>K<08>trim_footnote_reference_space<63><65><EFBFBD>syntax_highlight<68><74>long<6E><67> smart_quotes<65><73><EFBFBD>smartquotes_locales<65>]<5D><>character_level_inline_markup<75><70><EFBFBD>doctitle_xform<72><6D><EFBFBD> docinfo_xform<72>K<01>sectsubtitle_xform<72><6D><EFBFBD> image_loading<6E><67>link<6E><6B>embed_stylesheet<65><74><EFBFBD>cloak_email_addresses<65><73><EFBFBD>section_self_link<6E><6B><EFBFBD>env<6E>Nub<75>reporter<65>N<EFBFBD>indirect_targets<74>]<5D><>substitution_defs<66>}<7D><>substitution_names<65>}<7D><>refnames<65>}<7D><>refids<64>}<7D>(h]<5D>h aj]<5D>h<EFBFBD>au<61>nameids<64>}<7D>(jbhjaj^jjj<>jj<>j<>jjjPjMj<>j<>jYjVu<> nametypes<65>}<7D>(jb<00>ja<00>j<00>j<EFBFBD><00>j<EFBFBD><00>j<00>jP<00>j<EFBFBD><00>jY<00>uh}<7D>(hh&j^h&jh<>jj
j<>j
jj<>jMjj<>jSjVj<>u<> footnote_refs<66>}<7D><> citation_refs<66>}<7D><> autofootnotes<65>]<5D><>autofootnote_refs<66>]<5D><>symbol_footnotes<65>]<5D><>symbol_footnote_refs<66>]<5D><> footnotes<65>]<5D><> citations<6E>]<5D><>autofootnote_start<72>K<01>symbol_footnote_start<72>K<00>
id_counter<EFBFBD><EFBFBD> collections<6E><73>Counter<65><72><EFBFBD>}<7D><><EFBFBD>R<EFBFBD><52>parse_messages<65>]<5D><>transform_messages<65>]<5D>(h <09>system_message<67><65><EFBFBD>)<29><>}<7D>(hhh]<5D>h<)<29><>}<7D>(hhh]<5D>h0<68>(Hyperlink target "nn" is not referenced.<2E><><EFBFBD><EFBFBD><EFBFBD>}<7D>h j<>sbah}<7D>(h]<5D>h]<5D>h]<5D>h]<5D>h]<5D>uhh;h j<>ubah}<7D>(h]<5D>h]<5D>h]<5D>h]<5D>h]<5D><>level<65>K<01>type<70><65>INFO<46><4F>source<63>h#<23>line<6E>Kuhj<>ubj<62>)<29><>}<7D>(hhh]<5D>h<)<29><>}<7D>(hhh]<5D>h0<68>2Hyperlink target "module-class" is not referenced.<2E><><EFBFBD><EFBFBD><EFBFBD>}<7D>h jsbah}<7D>(h]<5D>h]<5D>h]<5D>h]<5D>h]<5D>uhh;h j ubah}<7D>(h]<5D>h]<5D>h]<5D>h]<5D>h]<5D><>level<65>K<01>type<70>j<00>source<63>h#<23>line<6E>KCuhj<>ube<62> transformer<65>N<EFBFBD> include_log<6F>]<5D><>
decoration<EFBFBD>Nh!hub.