mirror of https://github.com/ml-explore/mlx.git
docs update
committed by CircleCI Docs
parent cc06c8bc0e
commit 03a66f24b0
docs/build/html/_sources/index.rst
@@ -40,6 +40,7 @@ are the CPU and GPU.
   usage/unified_memory
   usage/indexing
   usage/saving_and_loading
   usage/function_transforms
   usage/numpy
   usage/using_streams
docs/build/html/_sources/install.rst
@@ -1,8 +1,8 @@
Build and Install
=================

Install from PyPI
-----------------
Python Installation
-------------------

MLX is available on PyPI. All you have to do to use MLX with your own Apple
silicon computer is
@@ -21,6 +21,14 @@ To install from PyPI you must meet the following requirements:
MLX is only available on devices running macOS >= 13.3
It is highly recommended to use macOS 14 (Sonoma)


MLX is also available on conda-forge. To install MLX with conda, run:

.. code-block:: shell

   conda install conda-forge::mlx


Troubleshooting
^^^^^^^^^^^^^^^
@@ -213,3 +221,14 @@ Verify the terminal is now running natively with the following command:
   $ uname -p
   arm

Also check that cmake is using the correct architecture:

.. code-block:: shell

   $ cmake --system-information | grep CMAKE_HOST_SYSTEM_PROCESSOR
   CMAKE_HOST_SYSTEM_PROCESSOR "arm64"
If you see ``"x86_64"``, try re-installing ``cmake``. If you see ``"arm64"``
but the build errors out with "Building for x86_64 on macOS is not supported.",
wipe your build cache with ``rm -rf build/`` and try again.
docs/build/html/_sources/python/_autosummary/mlx.core.isinf.rst
@@ -0,0 +1,6 @@
mlx.core.isinf
==============

.. currentmodule:: mlx.core

.. autofunction:: isinf
docs/build/html/_sources/python/_autosummary/mlx.core.isnan.rst
@@ -0,0 +1,6 @@
mlx.core.isnan
==============

.. currentmodule:: mlx.core

.. autofunction:: isnan
docs/build/html/_sources/python/_autosummary/mlx.core.isneginf.rst
@@ -0,0 +1,6 @@
mlx.core.isneginf
=================

.. currentmodule:: mlx.core

.. autofunction:: isneginf
docs/build/html/_sources/python/_autosummary/mlx.core.isposinf.rst
@@ -0,0 +1,6 @@
mlx.core.isposinf
=================

.. currentmodule:: mlx.core

.. autofunction:: isposinf
@@ -15,9 +15,9 @@ simple functions.
   gelu
   gelu_approx
   gelu_fast_approx
   relu
   mish
   prelu
   relu
   selu
   silu
   step
   selu
   mish
docs/build/html/_sources/python/nn/layers.rst
@@ -9,29 +9,29 @@ Layers
   :toctree: _autosummary
   :template: nn-module-template.rst

   Sequential
   ReLU
   PReLU
   GELU
   SiLU
   Step
   SELU
   Mish
   Embedding
   Linear
   QuantizedLinear
   ALiBi
   BatchNorm
   Conv1d
   Conv2d
   BatchNorm
   LayerNorm
   RMSNorm
   GroupNorm
   InstanceNorm
   Dropout
   Dropout2d
   Dropout3d
   Transformer
   Embedding
   GELU
   GroupNorm
   InstanceNorm
   LayerNorm
   Linear
   Mish
   MultiHeadAttention
   ALiBi
   PReLU
   QuantizedLinear
   RMSNorm
   ReLU
   RoPE
   SELU
   Sequential
   SiLU
   SinusoidalPositionalEncoding
   Step
   Transformer
docs/build/html/_sources/python/nn/losses.rst
@@ -10,14 +10,14 @@ Loss Functions
   :template: nn-module-template.rst

   binary_cross_entropy
   cosine_similarity_loss
   cross_entropy
   hinge_loss
   huber_loss
   kl_div_loss
   l1_loss
   log_cosh_loss
   mse_loss
   nll_loss
   smooth_l1_loss
   triplet_loss
   hinge_loss
   huber_loss
   log_cosh_loss
   cosine_similarity_loss
   triplet_loss
docs/build/html/_sources/python/ops.rst
@@ -51,6 +51,10 @@ Operations
   greater_equal
   identity
   inner
   isnan
   isposinf
   isneginf
   isinf
   less
   less_equal
   linspace
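
As a quick illustration of the four predicates added above, a minimal Python
sketch (``mx`` is ``mlx.core``; the expected truth values are shown as
comments):

.. code-block:: python

   import mlx.core as mx

   x = mx.array([1.0, float("nan"), float("inf"), float("-inf")])
   print(mx.isnan(x))     # [False, True, False, False]
   print(mx.isinf(x))     # [False, False, True, True]
   print(mx.isposinf(x))  # [False, False, True, False]
   print(mx.isneginf(x))  # [False, False, False, True]
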
docs/build/html/_sources/python/random.rst
@@ -33,13 +33,13 @@ we use a splittable version of Threefry, which is a counter-based PRNG.
.. autosummary::
   :toctree: _autosummary

   seed
   key
   split
   bernoulli
   categorical
   gumbel
   key
   normal
   randint
   uniform
   seed
   split
   truncated_normal
   uniform
docs/build/html/_sources/usage/function_transforms.rst
@@ -0,0 +1,188 @@
.. _function_transforms:

Function Transforms
===================

.. currentmodule:: mlx.core

MLX uses composable function transformations for automatic differentiation and
vectorization. The key idea behind composable function transformations is that
every transformation returns a function which can be further transformed.

Here is a simple example:

.. code-block:: shell

   >>> dfdx = mx.grad(mx.sin)
   >>> dfdx(mx.array(mx.pi))
   array(-1, dtype=float32)
   >>> mx.cos(mx.array(mx.pi))
   array(-1, dtype=float32)

The output of :func:`grad` on :func:`sin` is simply another function. In this
case it is the gradient of the sine function, which is exactly the cosine
function. To get the second derivative you can do:

.. code-block:: shell

   >>> d2fdx2 = mx.grad(mx.grad(mx.sin))
   >>> d2fdx2(mx.array(mx.pi / 2))
   array(-1, dtype=float32)
   >>> mx.sin(mx.array(mx.pi / 2))
   array(1, dtype=float32)

Using :func:`grad` on the output of :func:`grad` is always fine; each
application gives the next higher-order derivative.
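
Stacking one more :func:`grad` gives the third derivative; a brief sketch
using only the functions shown above:

.. code-block:: python

   import mlx.core as mx

   # The third derivative of sin is -cos, so at x = 0 this is -1.
   d3fdx3 = mx.grad(mx.grad(mx.grad(mx.sin)))
   print(d3fdx3(mx.array(0.0)))  # array(-1, dtype=float32)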

Any of the MLX function transformations can be composed in any order to any
depth. To see the complete list of function transformations, check out the
:ref:`API documentation <transforms>`. See the following sections for more
information on :ref:`automatic differentiation <auto diff>` and
:ref:`automatic vectorization <vmap>`.
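
For example, wrapping a :func:`grad`-transformed function in :func:`vmap`
evaluates the derivative elementwise over a batch (a small illustrative
sketch; the variable names are arbitrary):

.. code-block:: python

   import mlx.core as mx

   # vmap over grad: compute d/dx sin(x) for every element of a 1-D array.
   batched_dfdx = mx.vmap(mx.grad(mx.sin))

   x = mx.array([0.0, mx.pi / 2, mx.pi])
   # Matches mx.cos(x) elementwise: approximately [1, 0, -1].
   print(batched_dfdx(x))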

Automatic Differentiation
-------------------------

.. _auto diff:

Automatic differentiation in MLX works on functions rather than on implicit
graphs.

.. note::

   If you are coming to MLX from PyTorch, you no longer need functions like
   ``backward``, ``zero_grad``, and ``detach``, or properties like
   ``requires_grad``.

The most basic example is taking the gradient of a scalar-valued function, as
we saw above. You can use the :func:`grad` and :func:`value_and_grad` functions
to compute gradients of more complex functions. By default these functions
compute the gradient with respect to the first argument:

.. code-block:: python

   def loss_fn(w, x, y):
       return mx.mean(mx.square(w * x - y))

   w = mx.array(1.0)
   x = mx.array([0.5, -0.5])
   y = mx.array([1.5, -1.5])

   # Computes the gradient of loss_fn with respect to w:
   grad_fn = mx.grad(loss_fn)
   dloss_dw = grad_fn(w, x, y)
   # Prints array(-1, dtype=float32)
   print(dloss_dw)

   # To get the gradient with respect to x we can do:
   grad_fn = mx.grad(loss_fn, argnums=1)
   dloss_dx = grad_fn(w, x, y)
   # Prints array([-1, 1], dtype=float32)
   print(dloss_dx)
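
``argnums`` can also be a list of indices to differentiate with respect to
several positional arguments at once; a brief sketch continuing the example
(the tuple-style unpacking of the result is an assumption here):

.. code-block:: python

   # Gradients with respect to both w (index 0) and x (index 1):
   grad_fn = mx.grad(loss_fn, argnums=[0, 1])
   dloss_dw, dloss_dx = grad_fn(w, x, y)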

One way to get the loss and gradient is to call ``loss_fn`` followed by
``grad_fn``, but this can result in a lot of redundant work. Instead, you
should use :func:`value_and_grad`. Continuing the above example:

.. code-block:: python

   # Computes both the loss and the gradient of loss_fn with respect to w:
   loss_and_grad_fn = mx.value_and_grad(loss_fn)
   loss, dloss_dw = loss_and_grad_fn(w, x, y)

   # Prints array(1, dtype=float32)
   print(loss)

   # Prints array(-1, dtype=float32)
   print(dloss_dw)

You can also take the gradient with respect to arbitrarily nested Python
containers of arrays (specifically any of :obj:`list`, :obj:`tuple`, or
:obj:`dict`).

Suppose we wanted a weight and a bias parameter in the above example. A nice
way to do that is the following:

.. code-block:: python

   def loss_fn(params, x, y):
       w, b = params["weight"], params["bias"]
       h = w * x + b
       return mx.mean(mx.square(h - y))

   params = {"weight": mx.array(1.0), "bias": mx.array(0.0)}
   x = mx.array([0.5, -0.5])
   y = mx.array([1.5, -1.5])

   # Computes the gradient of loss_fn with respect to both the
   # weight and bias:
   grad_fn = mx.grad(loss_fn)
   grads = grad_fn(params, x, y)

   # Prints
   # {'weight': array(-1, dtype=float32), 'bias': array(0, dtype=float32)}
   print(grads)

Notice that the tree structure of the parameters is preserved in the gradients.

In some cases you may want to stop gradients from propagating through a
part of the function. You can use :func:`stop_gradient` for that.
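
For example, a term wrapped in :func:`stop_gradient` is treated as a constant
during differentiation (a minimal sketch):

.. code-block:: python

   import mlx.core as mx

   def fn(x):
       # The x**3 term is hidden from differentiation, so only the
       # x**2 term contributes to the gradient.
       return x**2 + mx.stop_gradient(x**3)

   # d/dx x**2 = 2x, so at x = 2 this prints array(4, dtype=float32)
   print(mx.grad(fn)(mx.array(2.0)))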

Automatic Vectorization
-----------------------

.. _vmap:

Use :func:`vmap` to automatically vectorize complex functions. Here we'll go
through a basic and contrived example for the sake of clarity, but :func:`vmap`
can be quite powerful for more complex functions which are difficult to
optimize by hand.

.. warning::

   Some operations are not yet supported with :func:`vmap`. If you encounter
   an error like ``ValueError: Primitive's vmap not implemented.``, file an
   `issue <https://github.com/ml-explore/mlx/issues>`_ and include your
   function. We will prioritize adding it.

A naive way to add the elements from two sets of vectors is with a loop:

.. code-block:: python

   xs = mx.random.uniform(shape=(4096, 100))
   ys = mx.random.uniform(shape=(100, 4096))

   def naive_add(xs, ys):
       # Add the i-th column of xs to the i-th row of ys, one pair at a time.
       return [xs[:, i] + ys[i] for i in range(xs.shape[1])]

Instead you can use :func:`vmap` to automatically vectorize the addition:

.. code-block:: python

   # Vectorize over the second dimension of x and the
   # first dimension of y
   vmap_add = mx.vmap(lambda x, y: x + y, in_axes=(1, 0))

The ``in_axes`` parameter can be used to specify which dimensions of the
corresponding input to vectorize over. Similarly, use ``out_axes`` to specify
where the vectorized axes should be in the outputs.
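
For example, placing the mapped axis second keeps the output laid out like
``xs`` (a small sketch reusing the arrays above; the name ``vmap_add_cols``
is arbitrary):

.. code-block:: python

   # With out_axes=1 the mapped axis becomes the second output axis,
   # so the result has shape (4096, 100) instead of (100, 4096).
   vmap_add_cols = mx.vmap(lambda x, y: x + y, in_axes=(1, 0), out_axes=1)
   print(vmap_add_cols(xs, ys).shape)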

Let's time these two different versions:

.. code-block:: python

   import timeit

   print(timeit.timeit(lambda: mx.eval(naive_add(xs, ys)), number=100))
   print(timeit.timeit(lambda: mx.eval(vmap_add(xs, ys)), number=100))

On an M1 Max the naive version takes ``0.390`` seconds in total, whereas the
vectorized version takes only ``0.025`` seconds, more than ten times faster.

Of course, this operation is quite contrived. A better approach is to simply do
``xs + ys.T``, but for more complex functions :func:`vmap` can be quite handy.