	Remove Hazard tracking with Fences (#1509)
* remove hazard tracking
* with fence map
* no hazard tracking with fences
* nits
* fix fence retain
* cleanup
* fix quantized rebase
@@ -33,12 +33,12 @@ Let's start with a simple example:
   # Compile the function
   compiled_fun = mx.compile(fun)
 
-  # Prints: array(2.36788, dtype=float32)  
+  # Prints: array(2.36788, dtype=float32)
   print(compiled_fun(x, y))
 
 The output of both the regular function and the compiled function is the same
 up to numerical precision.
-    
+
 The first time you call a compiled function, MLX will build the compute
 graph, optimize it, and generate and compile code. This can be relatively
 slow. However, MLX will cache compiled functions, so calling a compiled
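
For reference, the full example this hunk comes from, assembled into one runnable snippet (``fun``, ``x``, and ``y`` are defined as in the surrounding page):

.. code-block:: python

  import mlx.core as mx

  def fun(x, y):
      return mx.exp(-x) + y

  x = mx.array(1.0)
  y = mx.array(2.0)

  # Compile the function
  compiled_fun = mx.compile(fun)

  # Prints: array(2.36788, dtype=float32)
  print(compiled_fun(x, y))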
@@ -96,7 +96,7 @@ element-wise operations:
 
 .. code-block:: python
 
-  def gelu(x):   
+  def gelu(x):
       return x * (1 + mx.erf(x / math.sqrt(2))) / 2
 
 If you use this function with small arrays, it will be overhead bound. If you
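
A sketch of the benchmark the docs set up next; the input shape here is illustrative rather than the exact one used on the page:

.. code-block:: python

  import math
  import mlx.core as mx

  def gelu(x):
      return x * (1 + mx.erf(x / math.sqrt(2))) / 2

  compiled_gelu = mx.compile(gelu)

  # A large input, so the computation is not overhead bound
  x = mx.random.uniform(shape=(32, 1000, 4096))

  mx.eval(gelu(x))           # regular version
  mx.eval(compiled_gelu(x))  # compiled version, fused into fewer kernels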
@@ -136,13 +136,6 @@ Now make an array, and benchmark both functions:
 On an M1 Max the times are 15.5 and 3.1 milliseconds. The compiled ``gelu`` is
 five times faster.
 
-.. note::
-
-  As of the latest MLX, CPU functions are not fully compiled. Compiling CPU
-  functions can still be helpful, but won't typically result in as large a
-  speedup as compiling operations that run on the GPU.
-
-
 Debugging
 ---------
 
@@ -287,7 +280,7 @@ to the function. In some cases this can be pretty inconvenient. Hence,
 
   print(fun(mx.array(1.0)))
 
 
-Compiling Training Graphs 
+Compiling Training Graphs
 -------------------------
 
 This section will step through how to use :func:`compile` with a simple example
@@ -297,7 +290,7 @@ full forward, backward, and update with :func:`compile`.
 
 To start, here is the simple example without any compilation:
 
-.. code-block:: python 
+.. code-block:: python
 
   import mlx.core as mx
   import mlx.nn as nn
@@ -330,7 +323,7 @@ To start, here is the simple example without any compilation:
 To compile the update we can put it all in a function and compile it with the
 appropriate input and output captures. Here's the same example but compiled:
 
-.. code-block:: python 
+.. code-block:: python
 
   import mlx.core as mx
   import mlx.nn as nn
@@ -355,7 +348,7 @@ appropriate input and output captures. Here's the same example but compiled:
 
   # The state that will be captured as input and output
   state = [model.state, optimizer.state]
-       
+
   @partial(mx.compile, inputs=state, outputs=state)
   def step(x, y):
       loss_and_grad_fn = nn.value_and_grad(model, loss_fn)
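
Assembled in one place, the compiled training step sketched by these hunks looks roughly like the following; the model, optimizer, loss, and data shapes are illustrative stand-ins for the ones on the page:

.. code-block:: python

  import mlx.core as mx
  import mlx.nn as nn
  import mlx.optimizers as optim
  from functools import partial

  # Illustrative model and optimizer; the docs use a similarly simple setup
  model = nn.Linear(10, 1)
  optimizer = optim.SGD(learning_rate=0.1)

  def loss_fn(model, x, y):
      return nn.losses.mse_loss(model(x), y)

  # The state that will be captured as input and output
  state = [model.state, optimizer.state]

  @partial(mx.compile, inputs=state, outputs=state)
  def step(x, y):
      loss_and_grad_fn = nn.value_and_grad(model, loss_fn)
      loss, grads = loss_and_grad_fn(model, x, y)
      optimizer.update(model, grads)
      return loss

  x, y = mx.random.uniform(shape=(4, 10)), mx.random.uniform(shape=(4, 1))
  loss = step(x, y)
  mx.eval(state)  # evaluate the captured state after the compiled step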
@@ -410,7 +403,7 @@ Compiling transformed functions works just as expected:
 
    In order to compile as much as possible, a transformation of a compiled
    function will not by default be compiled. To compile the transformed
-   function simply pass it through :func:`compile`. 
+   function simply pass it through :func:`compile`.
 
 You can also compile functions which themselves call compiled functions. A
 good practice is to compile the outer most function to give :func:`compile`
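
Concretely, the note amounts to the following pattern (``fun`` is any function you want to differentiate and compile):

.. code-block:: python

  grad_fn = mx.grad(mx.compile(fun))  # the gradient computation itself is not compiled
  grad_fn = mx.compile(mx.grad(fun))  # pass the transformed function through compile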
@@ -25,7 +25,7 @@ Here is a simple example:
 
 The output of :func:`grad` on :func:`sin` is simply another function. In this
 case it is the gradient of the sine function which is exactly the cosine
-function. To get the second derivative you can do: 
+function. To get the second derivative you can do:
 
 .. code-block:: shell
 
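The shell snippet the hunk trims off is, up to formatting, the standard double application of :func:`grad`:

.. code-block:: shell

   >>> d2sin = mx.grad(mx.grad(mx.sin))
   >>> d2sin(mx.array(0.1))
   array(-0.0998334, dtype=float32)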
@@ -50,7 +50,7 @@ Automatic Differentiation
 .. _auto diff:
 
 Automatic differentiation in MLX works on functions rather than on implicit
-graphs. 
+graphs.
 
 .. note::
 
@@ -114,7 +114,7 @@ way to do that is the following:
 
    def loss_fn(params, x, y):
       w, b = params["weight"], params["bias"]
-      h = w * x + b 
+      h = w * x + b
       return mx.mean(mx.square(h - y))
 
    params = {"weight": mx.array(1.0), "bias": mx.array(0.0)}
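
Completing the example, the gradient is taken with respect to the ``params`` tree; the ``x`` and ``y`` below are illustrative data:

.. code-block:: python

   x = mx.array([1.0, 2.0])
   y = mx.array([1.5, 2.5])

   # Differentiate with respect to the first argument, the params dict
   grads = mx.grad(loss_fn)(params, x, y)

   # grads mirrors the tree structure: {"weight": ..., "bias": ...}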
@@ -132,7 +132,7 @@ way to do that is the following:
 
 Notice the tree structure of the parameters is preserved in the gradients.
 
-In some cases you may want to stop gradients from propagating through a 
+In some cases you may want to stop gradients from propagating through a
 part of the function. You can use the :func:`stop_gradient` for that.
 
 
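A minimal sketch of :func:`stop_gradient` in action:

.. code-block:: python

   def fn(x):
       # gradients flow through the first term only
       return mx.square(x) + mx.stop_gradient(mx.square(x))

   # d/dx x**2 = 2x, so this prints array(4, dtype=float32)
   print(mx.grad(fn)(mx.array(2.0)))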
@@ -166,14 +166,14 @@ A naive way to add the elements from two sets of vectors is with a loop:
 Instead you can use :func:`vmap` to automatically vectorize the addition:
 
 .. code-block:: python
-   
+
    # Vectorize over the second dimension of x and the
    # first dimension of y
    vmap_add = mx.vmap(lambda x, y: x + y, in_axes=(1, 0))
 
 The ``in_axes`` parameter can be used to specify which dimensions of the
 corresponding input to vectorize over. Similarly, use ``out_axes`` to specify
-where the vectorized axes should be in the outputs. 
+where the vectorized axes should be in the outputs.
 
 Let's time these two different versions:
 
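For comparison, a sketch of the two versions being timed; the shapes are illustrative, and ``vmap_add`` is the function defined in the hunk above:

.. code-block:: python

   xs = mx.random.uniform(shape=(4096, 100))
   ys = mx.random.uniform(shape=(100, 4096))

   # Naive version: loop over the vectorized dimensions
   naive_out = mx.stack([xs[:, i] + ys[i] for i in range(xs.shape[1])])

   # Vectorized version: one call, same result
   vmap_out = vmap_add(xs, ys)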
@@ -51,7 +51,7 @@ You can also use an :obj:`array` to index another :obj:`array`:
 .. code-block:: shell
 
   >>> arr = mx.arange(10)
-  >>> idx = mx.array([5, 7]) 
+  >>> idx = mx.array([5, 7])
   >>> arr[idx]
   array([5, 7], dtype=int32)
 
@@ -82,7 +82,7 @@ general, MLX has limited support for operations for which outputs
 operations which MLX does not yet support include :func:`numpy.nonzero` and the
 single input version of :func:`numpy.where`.
 
-In Place Updates 
+In Place Updates
 ----------------
 
 In place updates to indexed arrays are possible in MLX. For example:
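
The example that follows in these docs is, roughly, a scalar assignment:

.. code-block:: shell

  >>> a = mx.array([1, 2, 3])
  >>> a[2] = 0
  >>> a
  array([1, 2, 0], dtype=int32)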
@@ -13,7 +13,7 @@ compute graph is recorded. The actual computation only happens if an
 :func:`eval` is performed.
 
 MLX uses lazy evaluation because it has some nice features, some of which we
-describe below. 
+describe below.
 
 Transforming Compute Graphs
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -116,7 +116,7 @@ saving functions) will also evaluate the array.
 
 Calling :func:`array.item` on a scalar array will also evaluate it. In the
 example above, printing the loss (``print(loss)``) or adding the loss scalar to
-a list (``losses.append(loss.item())``) would cause a graph evaluation. If 
+a list (``losses.append(loss.item())``) would cause a graph evaluation. If
 these lines are before ``mx.eval(loss, model.parameters())`` then this
 will be a partial evaluation, computing only the forward pass.
@@ -3,10 +3,10 @@
 Conversion to NumPy and Other Frameworks
 ========================================
 
-MLX array supports conversion between other frameworks with either:  
+MLX array supports conversion between other frameworks with either:
 
-* The `Python Buffer Protocol <https://docs.python.org/3/c-api/buffer.html>`_. 
-* `DLPack <https://dmlc.github.io/dlpack/latest/>`_.  
+* The `Python Buffer Protocol <https://docs.python.org/3/c-api/buffer.html>`_.
+* `DLPack <https://dmlc.github.io/dlpack/latest/>`_.
 
 Let's convert an array to NumPy and back.
 
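The round trip the docs walk through next, in brief:

.. code-block:: python

  import mlx.core as mx
  import numpy as np

  a = mx.arange(3)
  b = np.array(a)  # convert to NumPy via the buffer protocol (copies)
  c = mx.array(b)  # and back to an MLX array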
@@ -66,7 +66,7 @@ even though no in-place operations on MLX memory are executed.
 PyTorch
 -------
 
-.. warning:: 
+.. warning::
 
    PyTorch Support for :obj:`memoryview` is experimental and can break for
    multi-dimensional arrays. Casting to NumPy first is advised for now.
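
In code, the warning amounts to the following (a sketch; the ``torch`` and ``numpy`` imports are assumed):

.. code-block:: python

   import mlx.core as mx
   import numpy as np
   import torch

   a = mx.arange(3)
   b = torch.tensor(memoryview(a))  # experimental, may break for ndim > 1
   c = torch.tensor(np.array(a))    # safer: go through NumPy first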
@@ -64,4 +64,4 @@ Other gradient transformations include :func:`vjp` for vector-Jacobian products
 and :func:`jvp` for Jacobian-vector products.
 
 Use :func:`value_and_grad` to efficiently compute both a function's output and
-gradient with respect to the function's input. 
+gradient with respect to the function's input.
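
A minimal sketch of :func:`value_and_grad`:

.. code-block:: python

   def fn(x):
       return mx.sum(mx.square(x))

   # One pass computes both the value (5.0) and the gradient (2 * x)
   value, grad = mx.value_and_grad(fn)(mx.array([1.0, 2.0]))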
@@ -8,33 +8,33 @@ Saving and Loading Arrays
 MLX supports multiple array serialization formats.
 
 .. list-table:: Serialization Formats
-   :widths: 20 8 25 25 
+   :widths: 20 8 25 25
    :header-rows: 1
 
-   * - Format 
-     - Extension 
+   * - Format
+     - Extension
      - Function
-     - Notes 
-   * - NumPy 
-     - ``.npy`` 
+     - Notes
+   * - NumPy
+     - ``.npy``
      - :func:`save`
      - Single arrays only
-   * - NumPy archive 
-     - ``.npz`` 
+   * - NumPy archive
+     - ``.npz``
      - :func:`savez` and :func:`savez_compressed`
-     - Multiple arrays 
+     - Multiple arrays
    * - Safetensors
-     - ``.safetensors`` 
+     - ``.safetensors``
      - :func:`save_safetensors`
-     - Multiple arrays 
-   * - GGUF 
-     - ``.gguf`` 
+     - Multiple arrays
+   * - GGUF
+     - ``.gguf``
      - :func:`save_gguf`
      - Multiple arrays
 
 The :func:`load` function will load any of the supported serialization
 formats. It determines the format from the extensions. The output of
-:func:`load` depends on the format. 
+:func:`load` depends on the format.
 
 Here's an example of saving a single array to a file:
 
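The single-array example referenced at the end of the hunk, roughly as it appears in these docs:

.. code-block:: shell

  >>> a = mx.array([1.0])
  >>> mx.save("array", a)   # writes array.npy
  >>> mx.load("array.npy")
  array([1], dtype=float32)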
@@ -20,7 +20,7 @@ Both ``a`` and ``b`` live in unified memory.
 
 In MLX, rather than moving arrays to devices, you specify the device when you
 run the operation. Any device can perform any operation on ``a`` and ``b``
-without needing to move them from one memory location to another. For example: 
+without needing to move them from one memory location to another. For example:
 
 .. code-block:: python
 
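The trimmed example: either device can run the operation on the same arrays, with no copies (``a`` and ``b`` are the arrays defined earlier on the page):

.. code-block:: python

  mx.add(a, b, stream=mx.cpu)  # run the addition on the CPU
  mx.add(a, b, stream=mx.gpu)  # run it on the GPU; no data movement needed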