Commit d67cd9230c ("docs up"), parent d03b91923e, committed by CircleCI Docs.

docs/build/html/_sources/usage/numpy.rst (vendored, new file, 103 lines)

.. _numpy:

Conversion to NumPy and Other Frameworks
========================================

MLX arrays implement the `Python Buffer Protocol <https://docs.python.org/3/c-api/buffer.html>`_.
Let's convert an array to NumPy and back.

.. code-block:: python

  import mlx.core as mx
  import numpy as np

  a = mx.arange(3)
  b = np.array(a) # copy of a
  c = mx.array(b) # copy of b

.. note::

  Since NumPy does not support ``bfloat16`` arrays, you will need to convert to ``float16`` or ``float32`` first:
  ``np.array(a.astype(mx.float32))``.
  Otherwise, you will receive an error like: ``Item size 2 for PEP 3118 buffer format string does not match the dtype V item size 0.``
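
For example, a ``bfloat16`` array can be brought into NumPy by casting it
first. A minimal sketch (the array ``x`` here is only for illustration):

.. code-block:: python

  import mlx.core as mx
  import numpy as np

  x = mx.arange(3).astype(mx.bfloat16)
  # Cast to a NumPy-supported dtype before handing over the buffer.
  x_np = np.array(x.astype(mx.float32))
  print(x_np.dtype)  # float32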

By default, NumPy copies data to a new array. This can be prevented by creating an array view:

.. code-block:: python

  a = mx.arange(3)
  a_view = np.array(a, copy=False)
  print(a_view.flags.owndata) # False
  a_view[0] = 1
  print(a[0].item()) # 1

A NumPy array view is a normal NumPy array, except that it does not own its memory.
This means writing to the view is reflected in the original array.

While this is quite powerful for avoiding copies, be aware that external changes to the memory of arrays are not reflected in gradients.

Let's demonstrate this with an example:

.. code-block:: python

  def f(x):
      x_view = np.array(x, copy=False)
      x_view[:] *= x_view # modify memory without telling mx
      return x.sum()

  x = mx.array([3.0])
  y, df = mx.value_and_grad(f)(x)
  print("f(x) = x² =", y.item()) # 9.0
  print("f'(x) = 2x !=", df.item()) # 1.0

The function ``f`` indirectly modifies the array ``x`` through a memory view.
However, this modification is not reflected in the gradient: the last line prints ``1.0``,
the gradient of the sum operation alone.
The squaring of ``x`` occurs outside of MLX, so no gradient for it is incorporated.
A similar issue arises during array conversion and copying.
For instance, a function defined as ``mx.array(np.array(x)**2).sum()`` would also result in an incorrect gradient,
even though no in-place operations on MLX memory are executed.
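
A minimal sketch of that conversion case (the name ``g`` is made up here).
Because the squaring happens in NumPy on a copy, MLX only sees a new,
unrelated array, so the reported gradient does not include the ``2x`` factor:

.. code-block:: python

  def g(x):
      # The square is computed in NumPy on a copy, outside the MLX graph.
      return mx.array(np.array(x) ** 2).sum()

  x = mx.array([3.0])
  y, dg = mx.value_and_grad(g)(x)
  print(y.item())   # 9.0
  print(dg.item())  # gradient without the squaring step, not 2 * x = 6.0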

PyTorch
-------

PyTorch supports the buffer protocol, but it requires an explicit :obj:`memoryview`.

.. code-block:: python

  import mlx.core as mx
  import torch

  a = mx.arange(3)
  b = torch.tensor(memoryview(a))
  c = mx.array(b.numpy())

Conversion from PyTorch tensors back to MLX arrays must be done via intermediate NumPy arrays with ``numpy()``.

JAX
---

JAX fully supports the buffer protocol.

.. code-block:: python

  import mlx.core as mx
  import jax.numpy as jnp

  a = mx.arange(3)
  b = jnp.array(a)
  c = mx.array(b)

TensorFlow
----------

TensorFlow supports the buffer protocol, but it requires an explicit :obj:`memoryview`.

.. code-block:: python

  import mlx.core as mx
  import tensorflow as tf

  a = mx.arange(3)
  b = tf.constant(memoryview(a))
  c = mx.array(b)

docs/build/html/_sources/usage/quick_start.rst (vendored, new file, 64 lines)

Quick Start Guide
=================

Basics
------

.. currentmodule:: mlx.core

Import ``mlx.core`` and make an :class:`array`:

.. code-block:: python

  >>> import mlx.core as mx
  >>> a = mx.array([1, 2, 3, 4])
  >>> a.shape
  [4]
  >>> a.dtype
  int32
  >>> b = mx.array([1.0, 2.0, 3.0, 4.0])
  >>> b.dtype
  float32

Operations in MLX are lazy. The outputs of MLX operations are not computed
until they are needed. To force an array to be evaluated use
:func:`eval`. Arrays will automatically be evaluated in a few cases. For
example, inspecting a scalar with :meth:`array.item`, printing an array,
or converting an array from :class:`array` to :class:`numpy.ndarray` all
automatically evaluate the array.

.. code-block:: python

  >>> c = a + b # c not yet evaluated
  >>> mx.eval(c) # evaluates c
  >>> c = a + b
  >>> print(c) # Also evaluates c
  array([2, 4, 6, 8], dtype=float32)
  >>> c = a + b
  >>> import numpy as np
  >>> np.array(c) # Also evaluates c
  array([2., 4., 6., 8.], dtype=float32)

Function and Graph Transformations
----------------------------------

MLX has standard function transformations like :func:`grad` and :func:`vmap`.
Transformations can be composed arbitrarily. For example,
``grad(vmap(grad(fn)))`` (or any other composition) is allowed.

.. code-block:: python

  >>> x = mx.array(0.0)
  >>> mx.sin(x)
  array(0, dtype=float32)
  >>> mx.grad(mx.sin)(x)
  array(1, dtype=float32)
  >>> mx.grad(mx.grad(mx.sin))(x)
  array(-0, dtype=float32)

Other gradient transformations include :func:`vjp` for vector-Jacobian products
and :func:`jvp` for Jacobian-vector products.
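
A minimal sketch of how these can be called, assuming the list-based calling
convention of ``mx.vjp`` and ``mx.jvp`` (primals and (co)tangents are passed
as lists of arrays):

.. code-block:: python

  x = mx.array(1.0)

  # Pull a cotangent back through sin: the result is cos(x) * cotangent.
  outputs, vjps = mx.vjp(mx.sin, [x], [mx.array(1.0)])

  # Push a tangent forward through sin: the result is cos(x) * tangent.
  outputs, jvps = mx.jvp(mx.sin, [x], [mx.array(1.0)])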

Use :func:`value_and_grad` to efficiently compute both a function's output and
gradient with respect to the function's input.
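
For instance, a minimal sketch with a made-up squared-error loss (``w``,
``x``, and ``y`` are illustrative names):

.. code-block:: python

  def loss_fn(w, x, y):
      return ((w * x - y) ** 2).sum()

  w = mx.array(1.0)
  x = mx.array([1.0, 2.0, 3.0])
  y = mx.array([2.0, 4.0, 6.0])

  # One pass computes both the loss and its gradient with respect to the
  # first argument (the default).
  loss, grad = mx.value_and_grad(loss_fn)(w, x, y)
  print(loss.item(), grad.item())  # 14.0 -28.0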

docs/build/html/_sources/usage/unified_memory.rst (vendored, new file, 78 lines)

.. _unified_memory:

Unified Memory
==============

.. currentmodule:: mlx.core

Apple silicon has a unified memory architecture. The CPU and GPU have direct
access to the same memory pool. MLX is designed to take advantage of that.

Concretely, when you make an array in MLX you don't have to specify its location:

.. code-block:: python

  a = mx.random.normal((100,))
  b = mx.random.normal((100,))

Both ``a`` and ``b`` live in unified memory.

In MLX, rather than moving arrays to devices, you specify the device when you
run the operation. Any device can perform any operation on ``a`` and ``b``
without needing to move them from one memory location to another. For example:

.. code-block:: python

  mx.add(a, b, stream=mx.cpu)
  mx.add(a, b, stream=mx.gpu)

In the above, both the CPU and the GPU will perform the same add
operation. The operations can (and likely will) be run in parallel since
there are no dependencies between them. See :ref:`using_streams` for more
information on the semantics of streams in MLX.

In the above ``add`` example, there are no dependencies between operations, so
there is no possibility for race conditions. If there are dependencies, the
MLX scheduler will automatically manage them. For example:

.. code-block:: python

  c = mx.add(a, b, stream=mx.cpu)
  d = mx.add(a, c, stream=mx.gpu)

In the above case, the second ``add`` runs on the GPU but it depends on the
output of the first ``add`` which is running on the CPU. MLX will
automatically insert a dependency between the two streams so that the second
``add`` only starts executing after the first is complete and ``c`` is
available.

A Simple Example
~~~~~~~~~~~~~~~~

Here is a more interesting (albeit slightly contrived) example of how unified
memory can be helpful. Suppose we have the following computation:

.. code-block:: python

  def fun(a, b, d1, d2):
      x = mx.matmul(a, b, stream=d1)
      for _ in range(500):
          b = mx.exp(b, stream=d2)
      return x, b

which we want to run with the following arguments:

.. code-block:: python

  a = mx.random.uniform(shape=(4096, 512))
  b = mx.random.uniform(shape=(512, 4))

The first ``matmul`` operation is a good fit for the GPU since it's more
compute dense. The second sequence of operations is a better fit for the CPU,
since the operations are very small and would probably be overhead bound on the GPU.

If we time the computation fully on the GPU, we get 2.8 milliseconds. But if we
run the computation with ``d1=mx.gpu`` and ``d2=mx.cpu``, then the time is only
about 1.4 milliseconds, roughly twice as fast. These times were measured on an
M1 Max.
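
A minimal timing sketch of that experiment, assuming ``fun``, ``a``, and ``b``
are defined as above (exact numbers will vary by machine):

.. code-block:: python

  import time

  def time_fun(d1, d2, iters=100):
      # Warm up once so compilation and allocation are not timed.
      mx.eval(*fun(a, b, d1, d2))
      tic = time.perf_counter()
      for _ in range(iters):
          x, b_out = fun(a, b, d1, d2)
          mx.eval(x, b_out)  # force the lazy computation to run
      return 1e3 * (time.perf_counter() - tic) / iters  # milliseconds

  print(time_fun(mx.gpu, mx.gpu))  # everything on the GPU
  print(time_fun(mx.gpu, mx.cpu))  # matmul on the GPU, the exp loop on the CPU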

docs/build/html/_sources/usage/using_streams.rst (vendored, new file, 18 lines)

.. _using_streams:

Using Streams
=============

.. currentmodule:: mlx.core

Specifying the :obj:`Stream`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

All operations (including random number generation) take an optional
keyword argument ``stream``. The ``stream`` kwarg specifies which
:obj:`Stream` the operation should run on. If the stream is unspecified then
the operation is run on the default stream of the default device:
``mx.default_stream(mx.default_device())``. The ``stream`` kwarg can also
be a :obj:`Device` (e.g. ``stream=my_device``) in which case the operation is
run on the default stream of the provided device:
``mx.default_stream(my_device)``.
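
A minimal sketch of the different ways the ``stream`` kwarg can be specified
(``mx.new_stream``, assumed here, creates an additional stream on a device;
the first two calls only use the defaults described above):

.. code-block:: python

  import mlx.core as mx

  a = mx.random.uniform(shape=(32, 32))
  b = mx.random.uniform(shape=(32, 32))

  c = mx.add(a, b)                 # default stream of the default device
  d = mx.add(a, b, stream=mx.cpu)  # default stream of the CPU device
  s = mx.new_stream(mx.gpu)        # a new stream on the GPU
  e = mx.add(a, b, stream=s)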