Mirror of https://github.com/ml-explore/mlx.git, synced 2025-12-16 01:49:05 +08:00

Compare commits: v0.29.0...c1e3340b23 (7 commits)

| SHA1 |
|---|
| c1e3340b23 |
| 8f163a367d |
| 89a3df9014 |
| c5d2937aa5 |
| b61a65e313 |
| 04cbb4191c |
| c5460762e7 |
@@ -230,6 +230,9 @@ jobs:
             sudo mv ccache-4.11.3-linux-x86_64/ccache /usr/bin/ccache
             rm -rf ccache-4.11.3-linux-x86_64
             curl -LsSf https://astral.sh/uv/install.sh | sh
+      - run:
+          name: Set CCache size
+          command: ccache --max-size 1G
       - run:
           name: Install Python package
           command: |
@@ -260,7 +263,6 @@ jobs:
           command: |
             ccache --show-stats
             ccache --zero-stats
-            ccache --max-size 400MB
             ccache --cleanup
       - save_cache:
           key: cuda-<< parameters.image_date >>-{{ arch }}-{{ epoch }}
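Reviewer note: taken together, these two hunks move the cache size into its own early step, raise it from 400MB to 1G, and stop shrinking the cache in the stats step that runs just before `save_cache`, so a warm cache is saved intact.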
@@ -107,8 +107,20 @@ same array:
   >>> a
   array([1, 2, 0], dtype=int32)
 
-Note, unlike NumPy, updates to the same location are nondeterministic:
+Note that unlike NumPy, slicing an array creates a copy, not a view. So
+mutating it does not mutate the original array:
+
+.. code-block:: shell
+
+  >>> a = mx.array([1, 2, 3])
+  >>> b = a[:]
+  >>> b[2] = 0
+  >>> b
+  array([1, 2, 0], dtype=int32)
+  >>> a
+  array([1, 2, 3], dtype=int32)
+
+Also unlike NumPy, updates to the same location are nondeterministic:
 
 .. code-block:: shell
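For the nondeterminism the rewritten note points at, a minimal sketch of a duplicate-index update (hypothetical values; which write lands last is unspecified):

```python
import mlx.core as mx

a = mx.zeros((1,))
idx = mx.array([0, 0])
# Both writes target a[0]; after evaluation a[0] may hold 1.0 or 2.0.
a[idx] = mx.array([1.0, 2.0])
print(a)
```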
@@ -394,7 +394,7 @@ void ScaledDotProductAttention::eval_gpu(
 
   // Define some copy functions to ensure the layout of the inputs is as
   // expected.
-  copies.reserve(3);
+  copies.reserve(inputs.size());
   auto copy_unless = [&copies, &s](
       auto predicate, const array& arr) -> const array& {
     if (!predicate(arr)) {
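Why the reserve size matters: `copy_unless` returns `const array&` references to elements it pushes into `copies`, and those references must stay valid, so the vector can never be allowed to reallocate once the lambda starts handing them out. `inputs.size()` is the safe upper bound, presumably because the op can now carry more than three inputs.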
@@ -1,20 +1,34 @@
+mlx.core.__prefix__:
+    from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple, Union
+    import sys
+    if sys.version_info >= (3, 10):
+        from typing import TypeAlias
+    else:
+        from typing_extensions import TypeAlias
+
+mlx.core.__suffix__:
+    from typing import Union
+    scalar: TypeAlias = Union[int, float, bool]
+    list_or_scalar: TypeAlias = Union[scalar, list["list_or_scalar"]]
+    bool_: Dtype = ...
+
 mlx.core.distributed.__prefix__:
-    from mlx.core import array, Dtype, Device, Stream
+    from mlx.core import array, Dtype, Device, Stream, scalar
     from mlx.core.distributed import Group
     from typing import Sequence, Optional, Union
 
 mlx.core.fast.__prefix__:
-    from mlx.core import array, Dtype, Device, Stream
+    from mlx.core import array, Dtype, Device, Stream, scalar
     from typing import Sequence, Optional, Union
 
 mlx.core.linalg.__prefix__:
-    from mlx.core import array, Dtype, Device, Stream
+    from mlx.core import array, Dtype, Device, Stream, scalar
     from typing import Sequence, Optional, Tuple, Union
 
 mlx.core.metal.__prefix__:
-    from mlx.core import array, Dtype, Device, Stream
+    from mlx.core import array, Dtype, Device, Stream, scalar
     from typing import Sequence, Optional, Union
 
 mlx.core.random.__prefix__:
-    from mlx.core import array, Dtype, Device, Stream
+    from mlx.core import array, Dtype, Device, Stream, scalar, float32, int32
     from typing import Sequence, Optional, Union
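Assuming nanobind's stubgen splices these `__prefix__`/`__suffix__` blocks verbatim into the generated stubs, the new aliases behave like this runnable sketch (the `as_nested_list` helper is hypothetical, only there to show the recursive alias used in an annotation):

```python
import sys
from typing import Union

if sys.version_info >= (3, 10):
    from typing import TypeAlias
else:
    from typing_extensions import TypeAlias

# The aliases the mlx.core.__suffix__ pattern introduces:
scalar: TypeAlias = Union[int, float, bool]
list_or_scalar: TypeAlias = Union[scalar, list["list_or_scalar"]]

def as_nested_list(x: list_or_scalar) -> list_or_scalar:
    # Accepts 7, 7.0, True, [1, 2], [[1], [2.5, False]], ...
    return x
```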
@@ -556,7 +556,7 @@ class AdamW(Adam):
         eps (float, optional): The term :math:`\epsilon` added to the
             denominator to improve numerical stability. Default: ``1e-8``
         weight_decay (float, optional): The weight decay :math:`\lambda`.
-            Default: ``0``.
+            Default: ``0.01``.
         bias_correction (bool, optional): If set to ``True``, bias correction
             is applied. Default: ``False``
     """
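The corrected docstring matches the constructor's actual default, which follows the usual decoupled-weight-decay convention of 0.01. A usage sketch:

```python
import mlx.optimizers as optim

opt = optim.AdamW(learning_rate=1e-3)  # weight_decay defaults to 0.01
opt_nodecay = optim.AdamW(learning_rate=1e-3, weight_decay=0.0)  # opt out
```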
@@ -320,6 +320,7 @@ void init_array(nb::module_& m) {
       .def_prop_ro(
           "shape",
           [](const mx::array& a) { return nb::cast(a.shape()); },
+          nb::sig("def shape(self) -> tuple[int, ...]"),
           R"pbdoc(
             The shape of the array as a Python tuple.
 
@@ -347,6 +348,7 @@ void init_array(nb::module_& m) {
       .def(
           "item",
           &to_scalar,
+          nb::sig("def item(self) -> scalar"),
           R"pbdoc(
             Access the value of a scalar array.
 
@@ -356,6 +358,7 @@ void init_array(nb::module_& m) {
       .def(
           "tolist",
           &tolist,
+          nb::sig("def tolist(self) -> list_or_scalar"),
           R"pbdoc(
             Convert the array to a Python :class:`list`.
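With these `nb::sig` overrides, the stubs use the aliases defined by the pattern file above instead of nanobind's inferred types. What each annotation corresponds to at runtime:

```python
import mlx.core as mx

a = mx.array([[1, 2], [3, 4]])
a.shape              # -> tuple[int, ...]   e.g. (2, 2)
a.tolist()           # -> list_or_scalar    e.g. [[1, 2], [3, 4]]
mx.array(7).item()   # -> scalar            e.g. 7
```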
@@ -447,6 +447,8 @@ void init_linalg(nb::module_& parent_module) {
       "a"_a,
       nb::kw_only(),
       "stream"_a = nb::none(),
+      nb::sig(
+          "def eig(a: array, *, stream: Union[None, Stream, Device] = None) -> Tuple[array, array]"),
       R"pbdoc(
         Compute the eigenvalues and eigenvectors of a square matrix.
 
@@ -523,6 +525,8 @@ void init_linalg(nb::module_& parent_module) {
       "UPLO"_a = "L",
       nb::kw_only(),
       "stream"_a = nb::none(),
+      nb::sig(
+          "def eigh(a: array, UPLO: str = 'L', *, stream: Union[None, Stream, Device] = None) -> Tuple[array, array]"),
       R"pbdoc(
         Compute the eigenvalues and eigenvectors of a complex Hermitian or
         real symmetric matrix.
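Per the added signatures, both functions return an eigenvalue/eigenvector pair. A minimal sketch, assuming (as with several other `mlx.linalg` decompositions) they run on the CPU stream:

```python
import mlx.core as mx

A = mx.array([[2.0, 1.0], [1.0, 2.0]])
w, v = mx.linalg.eigh(A, stream=mx.cpu)        # symmetric/Hermitian case
vals, vecs = mx.linalg.eig(A, stream=mx.cpu)   # general case; may be complex
```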
@@ -4271,7 +4271,7 @@ void init_ops(nb::module_& m) {
       nb::kw_only(),
       "stream"_a = nb::none(),
       nb::sig(
-          "def dequantize(w: array, /, scales: array, biases: Optional[array] = = None, group_size: int = 64, bits: int = 4, mode: str = 'affine', *, stream: Union[None, Stream, Device] = None) -> array"),
+          "def dequantize(w: array, /, scales: array, biases: Optional[array] = None, group_size: int = 64, bits: int = 4, mode: str = 'affine', *, stream: Union[None, Stream, Device] = None) -> array"),
       R"pbdoc(
         Dequantize the matrix ``w`` using quantization parameters.
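The fix removes the stray `= =` from the signature string. A round-trip sketch using the matching defaults (`group_size=64`, `bits=4`, affine mode):

```python
import mlx.core as mx

w = mx.random.normal(shape=(64, 64))
w_q, scales, biases = mx.quantize(w, group_size=64, bits=4)
w_hat = mx.dequantize(w_q, scales, biases, group_size=64, bits=4)
```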
@@ -171,7 +171,7 @@ void init_random(nb::module_& parent_module) {
       "key"_a = nb::none(),
       "stream"_a = nb::none(),
       nb::sig(
-          "def normal(shape: Sequence[int] = [], dtype: Optional[Dtype] = float32, loc: Optional[scalar, array] = None, scale: Optional[scalar, array] = None, key: Optional[array] = None, stream: Union[None, Stream, Device] = None) -> array"),
+          "def normal(shape: Sequence[int] = [], dtype: Optional[Dtype] = float32, loc: Union[scalar, array, None] = None, scale: Union[scalar, array, None] = None, key: Optional[array] = None, stream: Union[None, Stream, Device] = None) -> array"),
       R"pbdoc(
         Generate normally distributed random numbers.
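`Optional[...]` takes a single argument, so `Optional[scalar, array]` was never a valid annotation; `Union[scalar, array, None]` is what was meant. Both forms of `loc`/`scale` the signature now advertises:

```python
import mlx.core as mx

x = mx.random.normal(shape=(2, 3), loc=1.0, scale=0.5)  # scalars
y = mx.random.normal(
    shape=(2, 3), loc=mx.zeros((3,)), scale=mx.ones((3,))  # arrays, broadcast
)
```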
@@ -619,6 +619,17 @@ class TestSDPA(mlx_tests.MLXTestCase):
         out = mx.fast.scaled_dot_product_attention(q, k, v, scale=scale, mask=mask)
         self.assertTrue(mx.allclose(ref, out, atol=1e-4, rtol=1e-4))
 
+    def test_sdpa_noncontiguous_inputs(self):
+        mask = mx.ones(shape=(4, 1, 7, 7), dtype=mx.bool_)
+        mx.random.seed(0)
+        q = mx.random.normal(shape=(4, 7, 32, 64)).swapaxes(1, 2)
+
+        k = mx.random.normal(shape=(4, 7, 8, 64)).swapaxes(1, 2)
+        v = mx.random.normal(shape=(4, 7, 8, 64)).swapaxes(1, 2)
+        out = mx.fast.scaled_dot_product_attention(q, k, v, scale=1.0, mask=mask)
+        ref = mlx_ref_attn(q, k, v, scale=1.0, mask=mask)
+        self.assertTrue(mx.allclose(ref, out, atol=1e-4, rtol=1e-4))
+
     def test_sdpa_promote_mask(self):
         mask = mx.array(2.0, mx.bfloat16)
         D = 64
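The new test materializes `(B, L, H, D)` tensors and swaps axes 1 and 2, so the `(B, H, L, D)` inputs reaching the fused kernel are non-contiguous; it also uses 8 key/value heads against 32 query heads, i.e. grouped-query attention. A standalone sketch of the same pattern (shapes are illustrative):

```python
import mlx.core as mx

B, H, L, D = 1, 8, 16, 64
q = mx.random.normal(shape=(B, L, H, D)).swapaxes(1, 2)  # (B, H, L, D), non-contiguous
k = mx.random.normal(shape=(B, L, H, D)).swapaxes(1, 2)
v = mx.random.normal(shape=(B, L, H, D)).swapaxes(1, 2)
out = mx.fast.scaled_dot_product_attention(q, k, v, scale=D**-0.5)
```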
setup.py (4 changes)

@@ -176,10 +176,6 @@ class GenerateStubs(Command):
         # Run again without recursive to specify output file name
         subprocess.run(["rm", f"{out_path}/mlx.pyi"])
         subprocess.run(stub_cmd + ["-o", f"{out_path}/__init__.pyi"])
-        # mx.bool_ gets filtered by nanobind because of the trailing
-        # underscore, add it manually:
-        with open(f"{out_path}/__init__.pyi", "a") as fid:
-            fid.write("\nbool_: Dtype = ...")
 
 
 class MLXBdistWheel(bdist_wheel):
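This removal pairs with the stub-pattern hunk above: `bool_: Dtype = ...` is now emitted via the `mlx.core.__suffix__` pattern instead of being appended to `__init__.pyi` by hand, since nanobind filters the trailing-underscore name from the generated stub.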