.. _lazy eval:

Lazy Evaluation
===============

Why Lazy Evaluation
-------------------

When you perform operations in MLX, no computation actually happens. Instead, a
compute graph is recorded. The actual computation only happens if an
:func:`eval` is performed.

MLX uses lazy evaluation because it has some nice features, some of which we
describe below.

Transforming Compute Graphs
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Lazy evaluation lets us record a compute graph without actually doing any
computations. This is useful for function transformations like :func:`grad` and
:func:`vmap`, and for graph optimizations like :func:`simplify`.

Currently, MLX does not compile and rerun compute graphs. They are all
generated dynamically. However, lazy evaluation makes it much easier to
integrate compilation for future performance enhancements.

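The recording idea can be made concrete with a toy sketch in plain Python. This
is not MLX's implementation; the names ``Node``, ``leaf``, ``evaluate``, and
``grad`` are invented for illustration. The point is that because operations
only record graph nodes, a transformation like ``grad`` can walk the graph
before any arithmetic has happened:

```python
# Toy sketch in plain Python, not MLX internals. Operations only record
# graph nodes, so a transformation like grad() can inspect and transform
# the graph before any arithmetic runs.

class Node:
    def __init__(self, op, parents, value=None):
        self.op = op            # "leaf", "add", or "mul"
        self.parents = parents  # nodes this one was recorded from
        self.value = value      # set for leaves; filled in by evaluate()

def leaf(x):
    return Node("leaf", (), x)

def add(a, b):
    return Node("add", (a, b))

def mul(a, b):
    return Node("mul", (a, b))

def evaluate(node):
    """Actually run the recorded graph (the analogue of mx.eval)."""
    if node.value is None:
        va, vb = (evaluate(p) for p in node.parents)
        node.value = va + vb if node.op == "add" else va * vb
    return node.value

def grad(node, wrt):
    """Transform the *recorded* graph into a derivative (reverse mode)."""
    order, seen = [], set()

    def visit(n):  # topological order of the recorded graph
        if n not in seen:
            seen.add(n)
            for p in n.parents:
                visit(p)
            order.append(n)

    visit(node)
    grads = {node: 1.0}
    for n in reversed(order):
        g = grads.get(n, 0.0)
        if n.op == "add":
            for p in n.parents:
                grads[p] = grads.get(p, 0.0) + g
        elif n.op == "mul":
            a, b = n.parents
            grads[a] = grads.get(a, 0.0) + g * evaluate(b)
            grads[b] = grads.get(b, 0.0) + g * evaluate(a)
    return grads.get(wrt, 0.0)

x = leaf(3.0)
y = add(mul(x, x), x)  # y = x*x + x: recorded, nothing computed yet
print(grad(y, x))      # d(x*x + x)/dx at x=3 is 2*x + 1 = 7.0
print(evaluate(y))     # 12.0
```

MLX's real transformations operate on its own graph representation in C++;
this sketch only shows why "record first, compute later" makes them possible.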
Only Compute What You Use
^^^^^^^^^^^^^^^^^^^^^^^^^

In MLX you do not need to worry as much about computing outputs that are never
used. For example:

.. code-block:: python

   def fun(x):
       a = fun1(x)
       b = expensive_fun(a)
       return a, b

   y, _ = fun(x)

Here, we never actually compute the output of ``expensive_fun``. Use this
pattern with care though, as the graph of ``expensive_fun`` is still built, and
that has some cost associated with it.

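The same effect can be mimicked in a few lines of plain Python (a toy sketch,
not MLX; the ``lazy`` helper and the names ``fun1`` and ``expensive_fun`` are
invented here): an unused lazy output costs only a cheap recorded node, and its
actual work never runs.

```python
# Toy illustration in plain Python, not MLX: recording a node is cheap,
# and the underlying work runs only if the result is forced.

calls = []  # records which computations actually execute

def lazy(fn, name):
    result = []

    def force():
        if not result:  # compute at most once; forcing again is a no-op
            calls.append(name)
            result.append(fn())
        return result[0]

    return force

def fun(x):
    a = lazy(lambda: x + 1, "fun1")
    b = lazy(lambda: sum(i * i for i in range(10**6)), "expensive_fun")
    return a, b

y, _ = fun(3)
print(y())    # forcing only the cheap output -> 4
print(calls)  # ['fun1']: expensive_fun was recorded but never executed
```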
Similarly, lazy evaluation can be beneficial for saving memory while keeping
code simple. Say you have a very large model ``Model`` derived from
:obj:`mlx.nn.Module`. You can instantiate this model with ``model = Model()``.
Typically, this will initialize all of the weights as ``float32``, but the
initialization does not actually compute anything until you perform an
:func:`eval`. If you update the model with ``float16`` weights, your maximum
consumed memory will be half that required if eager computation was used
instead.

This pattern is simple to do in MLX thanks to lazy computation:

.. code-block:: python

   model = Model()  # no memory used yet
   model.load_weights("weights_fp16.safetensors")

When to Evaluate
----------------

A common question is when to use :func:`eval`. The trade-off is between
letting graphs get too large and not batching enough useful work.

For example:

.. code-block:: python

   for _ in range(100):
       a = a + b
       mx.eval(a)
       b = b * 2
       mx.eval(b)

This is a bad idea because there is some fixed overhead with each graph
evaluation. On the other hand, there is some slight overhead which grows with
the compute graph size, so extremely large graphs (while computationally
correct) can be costly.

Luckily, a wide range of compute graph sizes work pretty well with MLX:
anything from a few tens of operations to many thousands of operations per
evaluation should be okay.

Most numerical computations have an iterative outer loop (e.g. the iteration in
stochastic gradient descent). A natural and usually efficient place to use
:func:`eval` is at each iteration of this outer loop.

Here is a concrete example:

.. code-block:: python

   for batch in dataset:

       # Nothing has been evaluated yet
       loss, grad = value_and_grad_fn(model, batch)

       # Still nothing has been evaluated
       optimizer.update(model, grad)

       # Evaluate the loss and the new parameters which will
       # run the full gradient computation and optimizer update
       mx.eval(loss, model.parameters())

An important behavior to be aware of is when the graph will be implicitly
evaluated. Anytime you ``print`` an array, convert it to an
:obj:`numpy.ndarray`, or otherwise access its memory via :obj:`memoryview`,
the graph will be evaluated. Saving arrays via :func:`save` (or any other MLX
saving function) will also evaluate the array.

Calling :func:`array.item` on a scalar array will also evaluate it. In the
example above, printing the loss (``print(loss)``) or adding the loss scalar to
a list (``losses.append(loss.item())``) would cause a graph evaluation. If
these lines are before ``mx.eval(loss, model.parameters())`` then this
will be a partial evaluation, computing only the forward pass.

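Partial evaluation can be pictured with another plain-Python toy (not MLX; the
``step`` helper and the step names are invented here): forcing one output runs
only the work that output depends on, and the rest of the graph stays pending.

```python
# Toy illustration in plain Python, not MLX: forcing one output evaluates
# only the part of the graph it depends on -- a partial evaluation.

done = []  # order in which real work happens

def step(name, fn, deps=()):
    cache = []

    def force():
        if not cache:
            for d in deps:  # compute dependencies first
                d()
            done.append(name)
            cache.append(fn())
        return cache[0]

    return force

loss = step("forward", lambda: 0.25)
update = step("optimizer_update", lambda: "new params", deps=(loss,))

print(loss())  # like loss.item(): only the forward pass runs
print(done)    # ['forward']
update()       # like mx.eval(...): now the remaining work runs
print(done)    # ['forward', 'optimizer_update']
```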
Also, calling :func:`eval` on an array or set of arrays multiple times is
perfectly fine. This is effectively a no-op.

.. warning::

   Using scalar arrays for control-flow will cause an evaluation.

Here is an example:

.. code-block:: python

   def fun(x):
       h, y = first_layer(x)
       if y > 0:  # An evaluation is done here!
           z = second_layer_a(h)
       else:
           z = second_layer_b(h)
       return z

Using arrays for control flow should be done with care. The above example works
and can even be used with gradient transformations. However, this can be very
inefficient if evaluations are done too frequently.

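Why the branch forces an evaluation can be mimicked in plain Python (a toy
``Lazy`` wrapper invented for this sketch, not MLX's array type): the moment
Python needs a concrete truth value for the comparison, the pending work must
run.

```python
# Toy illustration in plain Python, not MLX: using a lazy value in Python
# control flow forces it to be computed at the branch point.

evaluations = []

class Lazy:
    def __init__(self, fn, name):
        self.fn, self.name = fn, name
        self._value = None

    def eval(self):
        if self._value is None:
            evaluations.append(self.name)  # real work happens here
            self._value = self.fn()
        return self._value

    def __gt__(self, other):
        # The comparison itself forces evaluation, like `if y > 0:` in MLX.
        return self.eval() > other

y = Lazy(lambda: 2 * 3 - 10, "y")
assert evaluations == []  # nothing computed while recording
if y > 0:                 # evaluation is forced right here
    branch = "a"
else:
    branch = "b"
print(branch, evaluations)
```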