.. _numpy:
Conversion to NumPy and Other Frameworks
========================================
MLX arrays implement the `Python Buffer Protocol <https://docs.python.org/3/c-api/buffer.html>`_.
Let's convert an array to NumPy and back.
.. code-block:: python
import mlx.core as mx
import numpy as np
a = mx.arange(3)
b = np.array(a) # copy of a
c = mx.array(b) # copy of b
.. note::
Since NumPy does not support ``bfloat16`` arrays, you will need to convert to ``float16`` or ``float32`` first:
``np.array(a.astype(mx.float32))``.
Otherwise, you will receive an error like: ``Item size 2 for PEP 3118 buffer format string does not match the dtype V item size 0.``
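For instance, a minimal sketch of the workaround from the note (the ``bfloat16`` array here is created only for illustration):
.. code-block:: python
import mlx.core as mx
import numpy as np
a = mx.arange(3).astype(mx.bfloat16)
b = np.array(a.astype(mx.float32))  # upcast in MLX first, then convert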
By default, NumPy copies data to a new array. This can be prevented by creating an array view:
.. code-block:: python
a = mx.arange(3)
a_view = np.array(a, copy=False)
print(a_view.flags.owndata) # False
a_view[0] = 1
print(a[0].item()) # 1
A NumPy array view is a normal NumPy array, except that it does not own its memory.
This means writing to the view is reflected in the original array.
While this is a powerful way to avoid copying arrays, note that external changes to an array's memory are not reflected in gradients.
Let's demonstrate this in an example:
.. code-block:: python
def f(x):
x_view = np.array(x, copy=False)
x_view[:] *= x_view # modify memory without telling mx
return x.sum()
x = mx.array([3.0])
y, df = mx.value_and_grad(f)(x)
print("f(x) = x² =", y.item()) # 9.0
print("f'(x) = 2x !=", df.item()) # 1.0
The function ``f`` indirectly modifies the array ``x`` through a memory view.
However, this modification is not reflected in the gradient, as seen in the last line outputting ``1.0``,
representing the gradient of the sum operation alone.
The squaring of ``x`` happens outside of MLX, so it does not contribute to the gradient.
It's important to note that a similar issue arises during array conversion and copying.
For instance, a function defined as ``mx.array(np.array(x)**2).sum()`` would also result in an incorrect gradient,
even though no in-place operations on MLX memory are executed.
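As a rough sketch of that copying pitfall (not part of the original example; the exact gradient value is implementation dependent, but it will not include the squaring):
.. code-block:: python
import mlx.core as mx
import numpy as np
def g(x):
    # the square is computed on a NumPy copy, outside the MLX graph
    return mx.array(np.array(x) ** 2).sum()
x = mx.array([3.0])
y, dg = mx.value_and_grad(g)(x)
print(y.item())   # 9.0
print(dg.item())  # not 6.0, since the squaring is invisible to MLX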
PyTorch
-------
PyTorch supports the buffer protocol, but it requires an explicit :obj:`memoryview`.
.. code-block:: python
import mlx.core as mx
import torch
a = mx.arange(3)
b = torch.tensor(memoryview(a))
c = mx.array(b.numpy())
Conversion from PyTorch tensors back to MLX arrays must be done via an intermediate NumPy array using ``numpy()``.
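If the tensor tracks gradients or lives on another device, it typically needs to be detached and moved to the CPU first; a hedged sketch (the tensor below is made up for illustration):
.. code-block:: python
import mlx.core as mx
import torch
t = torch.ones(3, requires_grad=True)
c = mx.array(t.detach().cpu().numpy())  # drop autograd tracking, copy to CPU, convert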
JAX
---
JAX fully supports the buffer protocol.
.. code-block:: python
import mlx.core as mx
import jax.numpy as jnp
a = mx.arange(3)
b = jnp.array(a)
c = mx.array(b)
TensorFlow
----------
TensorFlow supports the buffer protocol, but it requires an explicit :obj:`memoryview`.
.. code-block:: python
import mlx.core as mx
import tensorflow as tf
a = mx.arange(3)
b = tf.constant(memoryview(a))
c = mx.array(b)

Quick Start Guide
=================
Basics
------
.. currentmodule:: mlx.core
Import ``mlx.core`` and make an :class:`array`:
.. code-block:: python
>> import mlx.core as mx
>> a = mx.array([1, 2, 3, 4])
>> a.shape
[4]
>> a.dtype
int32
>> b = mx.array([1.0, 2.0, 3.0, 4.0])
>> b.dtype
float32
Operations in MLX are lazy. The outputs of MLX operations are not computed
until they are needed. To force an array to be evaluated use
:func:`eval`. Arrays will automatically be evaluated in a few cases. For
example, inspecting a scalar with :meth:`array.item`, printing an array,
or converting an array from :class:`array` to :class:`numpy.ndarray` all
automatically evaluate the array.
.. code-block:: python
>> c = a + b # c not yet evaluated
>> mx.eval(c) # evaluates c
>> c = a + b
>> print(c) # Also evaluates c
array([2, 4, 6, 8], dtype=float32)
>> c = a + b
>> import numpy as np
>> np.array(c) # Also evaluates c
array([2., 4., 6., 8.], dtype=float32)
Function and Graph Transformations
----------------------------------
MLX has standard function transformations like :func:`grad` and :func:`vmap`.
Transformations can be composed arbitrarily. For example
``grad(vmap(grad(fn)))`` (or any other composition) is allowed.
.. code-block:: python
>> x = mx.array(0.0)
>> mx.sin(x)
array(0, dtype=float32)
>> mx.grad(mx.sin)(x)
array(1, dtype=float32)
>> mx.grad(mx.grad(mx.sin))(x)
array(-0, dtype=float32)
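The same transformations compose with :func:`vmap`; a brief sketch (assuming the default behavior of mapping over the leading axis, and that ``grad`` composes under ``vmap`` as above):
.. code-block:: python
>> xs = mx.array([0.0, 0.5, 1.0])
>> dys = mx.vmap(mx.grad(mx.sin))(xs)  # the cosine evaluated at each element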
Other gradient transformations include :func:`vjp` for vector-Jacobian products
and :func:`jvp` for Jacobian-vector products.
Use :func:`value_and_grad` to efficiently compute both a function's output and
gradient with respect to the function's input.
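For example, a small sketch with a made-up loss function:
.. code-block:: python
>> loss_fn = lambda w: (w * w).sum()
>> w = mx.array([1.0, 2.0, 3.0])
>> loss, grads = mx.value_and_grad(loss_fn)(w)
>> loss.item()
14.0
>> grads
array([2, 4, 6], dtype=float32)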

.. _unified_memory:
Unified Memory
==============
.. currentmodule:: mlx.core
Apple silicon has a unified memory architecture. The CPU and GPU have direct
access to the same memory pool. MLX is designed to take advantage of that.
Concretely, when you make an array in MLX you don't have to specify its location:
.. code-block:: python
a = mx.random.normal((100,))
b = mx.random.normal((100,))
Both ``a`` and ``b`` live in unified memory.
In MLX, rather than moving arrays to devices, you specify the device when you
run the operation. Any device can perform any operation on ``a`` and ``b``
without needing to move them from one memory location to another. For example:
.. code-block:: python
mx.add(a, b, stream=mx.cpu)
mx.add(a, b, stream=mx.gpu)
In the above, both the CPU and the GPU will perform the same add
operation. The operations can (and likely will) be run in parallel since
there are no dependencies between them. See :ref:`using_streams` for more
information on the semantics of streams in MLX.
In the above ``add`` example, there are no dependencies between operations, so
there is no possibility for race conditions. If there are dependencies, the
MLX scheduler will automatically manage them. For example:
.. code-block:: python
c = mx.add(a, b, stream=mx.cpu)
d = mx.add(a, c, stream=mx.gpu)
In the above case, the second ``add`` runs on the GPU but it depends on the
output of the first ``add`` which is running on the CPU. MLX will
automatically insert a dependency between the two streams so that the second
``add`` only starts executing after the first is complete and ``c`` is
available.
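Because MLX is lazy, neither ``add`` actually runs until the result is used or evaluated; a minimal sketch:
.. code-block:: python
mx.eval(d)  # runs the CPU add, then the GPU add that consumes c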
A Simple Example
~~~~~~~~~~~~~~~~
Here is a more interesting (albeit slightly contrived) example of how unified
memory can be helpful. Suppose we have the following computation:
.. code-block:: python
def fun(a, b, d1, d2):
x = mx.matmul(a, b, stream=d1)
for _ in range(500):
b = mx.exp(b, stream=d2)
return x, b
which we want to run with the following arguments:
.. code-block:: python
a = mx.random.uniform(shape=(4096, 512))
b = mx.random.uniform(shape=(512, 4))
The first ``matmul`` operation is a good fit for the GPU since it is more
compute-dense. The second sequence of operations is a better fit for the CPU,
since the operations are very small and would likely be overhead bound on the GPU.
If we time the computation fully on the GPU, we get 2.8 milliseconds. But if we
run the computation with ``d1=mx.gpu`` and ``d2=mx.cpu``, then the time is only
about 1.4 milliseconds, about twice as fast. These times were measured on an M1
Max.
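A rough sketch of how such a measurement might look (``fun``, ``a``, and ``b`` as above; the helper below is not part of MLX, and a real benchmark would warm up and average over several runs):
.. code-block:: python
import time
def timeit(d1, d2):
    tic = time.perf_counter()
    x, b_out = fun(a, b, d1, d2)
    mx.eval(x, b_out)  # force the lazy computation to actually run
    return 1e3 * (time.perf_counter() - tic)
print(f"d1=gpu, d2=gpu: {timeit(mx.gpu, mx.gpu):.2f} ms")
print(f"d1=gpu, d2=cpu: {timeit(mx.gpu, mx.cpu):.2f} ms")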

.. _using_streams:
Using Streams
=============
.. currentmodule:: mlx.core
Specifying the :obj:`Stream`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All operations (including random number generation) take an optional
keyword argument ``stream``. The ``stream`` kwarg specifies which
:obj:`Stream` the operation should run on. If the stream is unspecified then
the operation is run on the default stream of the default device:
``mx.default_stream(mx.default_device())``. The ``stream`` kwarg can also
be a :obj:`Device` (e.g. ``stream=my_device``) in which case the operation is
run on the default stream of the provided device, i.e.
``mx.default_stream(my_device)``.
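A short sketch of these options (assuming ``new_stream`` for creating a stream on a given device, as in ``mlx.core``):
.. code-block:: python
import mlx.core as mx
a = mx.random.normal((8,))
s = mx.new_stream(mx.cpu)     # a new stream on the CPU device
b = mx.exp(a)                 # default stream of the default device
c = mx.exp(a, stream=s)       # an explicit Stream
d = mx.exp(a, stream=mx.cpu)  # a Device: its default stream is used
mx.eval(b, c, d)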