Commit Graph

153 Commits

Author SHA1 Message Date
Cheng
79071bfba4
Fix out-of-bounds default value in logsumexp/softmax (#2213) 2025-05-21 07:25:16 -07:00
Angelos Katharopoulos
cf6c939e86
Fix some complex vjps (#2178) 2025-05-14 23:37:12 -07:00
Cheng
0cae0bdac8
CUDA backend: backbone (#2075) 2025-05-06 21:26:46 -07:00
Awni Hannun
9c5e7da507
fix compile merging (#2150) 2025-05-02 15:08:50 -07:00
Cheng
ea890d8710
Remove metal-only tests (#2139) 2025-04-30 09:08:39 -07:00
Aashiq Dheeraj
bb6565ef14
add fftshift and ifftshift fft helpers (#2135)
* add fftshift and ifftshift fft helpers

* address comments

* axes have to be iterable

* fix fp error in roll + add test

---------

Co-authored-by: Aashiq Dheeraj <aashiq@aashiq-mbp-m4.local>
2025-04-29 22:13:45 -07:00
Param Thakkar
600e87e03c
Added output_padding parameters in conv_transpose (#2092) 2025-04-23 09:26:33 -07:00
Awni Hannun
dc4eada7f0
Use unordered map for kwargs in export/import (#2087)
* use unordered map for kwargs in export/import

* comment
2025-04-21 07:17:22 -07:00
Param Thakkar
5f04c0f818
Fixed shift operations issue (#2080)
* Fixed shift operations issue

* Added tests and fixes

* Fixed loop syntax error

* Added tests for bool

* Fixed typo
2025-04-18 14:28:33 -07:00
Cheng
ba09f01ce8
Remove test of converting negative float to uint (#2048) 2025-04-06 06:21:46 -07:00
Jesper Stemann Andersen
5f5770e3a2
Fix CPU sign for unsigned ints (#2024)
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
2025-03-30 17:56:59 -07:00
Awni Hannun
5580b47291
iinfo and scalar overflow detection (#2009) 2025-03-27 19:54:56 -07:00
Awni Hannun
a6b5d6e759
revise cmake minimum for doctest (#2014) 2025-03-27 19:30:58 -07:00
Awni Hannun
4e1994e9d7
move memory APIs into top level mlx.core (#1982) 2025-03-21 07:25:12 -07:00
Awni Hannun
c4230747a1
redesign for faster cpu/gpu synch (#1869)
* redesign for faster cpu/gpu synch

* load + more async CPU

* use command encoder API and move more ops to use it

* make fence back-end generic + CPU only fence

* faster build

* fix async eval

* fixes + handle temporaries

* fix / improve cpu conv

* remove unused status, fix siblings

* fix extensions

* fix

* fix no cpu build

* format

* comments

* fix perf regression, remove unecessary abort

* fix events, task limit cpu

* fix waiting

* fix donation / temporaries in normalization
2025-03-06 19:23:38 -08:00
Abe Leininger
3835a428c5
Adds nuclear norm support (#1894)
* adjust norm unit test tolerance
2025-03-04 13:26:02 -08:00
Abe Leininger
a5ededf1c3
CPU LU factorization and linear solvers (#1451)
* linalg solve backend

* nits

* more nits + fix

* luf primitive and lu, solve, and solve_triangular backends

* changes / nits

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2025-02-10 12:32:24 -08:00
Jesper Stemann Andersen
f6c0499b8d
Resolved ambiguity in mlx::core::take_along_axis (#1822)
* Resolved ambiguity in mlx::core::take_along_axis

Detected by GCC 10 on riscv64-linux-gnu.

* Formatted

* Removed superfluous parentheses in random_tests.cpp
2025-02-04 06:06:17 -08:00
Jesper Stemann Andersen
2d8e667400
MinGW support (#1806)
* Changed /bin/bash to bash for generating compiling preamble

* Fix wrt jit_compiler mingw like msvc wrt. WEXITSTATUS

* Solved ambiguity wrt. bernoulli test shape

* Disabled distributed/ring on Windows

* Fixed jit_compiler command wrt. MinGW

* Extended jit_compiler patch wrt. WEXITSTATUS to FreeBSD
2025-02-01 12:40:06 -08:00
Awni Hannun
2235dee906
catch stream errors earlier to avoid aborts (#1801) 2025-01-27 14:05:43 -08:00
Awni Hannun
da8c885784
Simplify removes no-ops from the tape (#1759)
* simplify removes no-ops from the tape

* comment
2025-01-09 11:23:19 -08:00
Awni Hannun
516ded618b
Dynamic slicing (#1741)
* dynamic slice and slice update

* python bindings + tests + fix set item

* fix compile issue

* comment

* fix jit
2025-01-07 14:02:16 -08:00
Awni Hannun
ae69cb15e9
shapeless compile in docs and partially shapeless reshape (#1742) 2025-01-02 16:24:42 -08:00
Cheng
8ecdfb718b
Fix export.cpp compilation with MSVC (#1737) 2024-12-29 06:56:30 -08:00
Awni Hannun
4ba0c24a8f
Export / import functions to / from a file (#1642)
* export and import functions

* refactor + works for few primitives

* nit

* allow primitives with state

* nit

* nit

* simplify serialize / deserialize

* fix for constants

* python bindings

* maybe fix serialize failure case

* add example

* more primitives, training kind of works

* same result for python and c++

* some fixes

* fix export

* template it up

* some simplificatoin

* rebase

* allow kwargs and multiple functions

* exporter

* more primitives for exporting

* deal with endianness

* handle invalid stream

* add docstring
2024-12-24 11:19:13 -08:00
Awni Hannun
c3628eea49
Add mx.finfo and use it when making causal mask (#1726)
* finfo

* fixes

* docs
2024-12-19 14:52:41 -08:00
Awni Hannun
e03f0372b1
More shape type (#1705)
* more shape type

* fix
2024-12-19 08:08:20 -08:00
Awni Hannun
4e1e9520e1
Flatten and unflatten (#1692)
* flatten and unflatten

* fix grad

* fix shape infer

* use squeeze + unsqueeze in get_item
2024-12-11 21:51:37 -08:00
Awni Hannun
f3dfa36a3a
Fix x86 tests (#1691)
* fix x86 tests

* comment
2024-12-11 07:47:18 -08:00
Awni Hannun
f76a49e555
ExpandDims primitive (#1687)
* add squeeze primitive

* simplify squeeze, use in gather

* fix

* fix

* fix

* fix

* fix no cpu

* use squeeze in matmul and friends

* expand dims primitive

* comment
2024-12-10 16:39:07 -08:00
Awni Hannun
40c62c1321
Use int64 stride everywhere (#1671)
* use int64 stride everywhere

* fix ext

* fix ext

* more shape + cleanup

* one more

* few more
2024-12-09 11:09:02 -08:00
Cheng
d0f471cff7
Using math defines requires switch in MSVC (#1665)
* Using math defines requires switch in MSVC

* Fix more math macros

* Fix type

* Remove _MSC_VER guard for math defines
2024-12-08 08:16:28 -08:00
Cheng
6f316b8bf5
Use int64_t instead of ssize_t (#1673) 2024-12-07 20:10:44 -08:00
Cheng
7c10c93a1f
Convert filesystem path to std::string explicitly (#1672) 2024-12-07 20:10:06 -08:00
Awni Hannun
69a2991614
allow compiling lambdas in C++ (#1650)
* allow compiling lambdas in C++

* fix test

* more tests

* auto detect capture-less lambda
2024-12-06 13:13:21 -08:00
Nripesh Niketan
3bb5b4a302
Chore: Add default language in pre-commit and bump hooks (#1652) 2024-12-06 07:54:29 -08:00
Awni Hannun
e047fd977d
compile changes if stream changes (#1644) 2024-12-03 14:37:44 -08:00
Awni Hannun
dcca0d7477
contiguous op / prim (#1612) 2024-11-21 19:51:49 -08:00
Cocoa
0d5e7716ad
fix typo: accross -> across (#1609)
Signed-off-by: Cocoa <i@uwucocoa.moe>
2024-11-20 15:30:51 -08:00
Alex Barron
048fabdabd
Fix vmap constant output size (#1524)
* use inputs to determine output size

* remove noop vmap tests
2024-10-30 16:16:53 -07:00
Kashif Rasul
3ddc07e936
Eigenvalues and eigenvectors (#1334)
* initial eigvalsh

* add compute_vectors

* add compute_vectors_

* return a pair

* add eigh to return only eigenvectors

* fixed typo

* merge merge Eighvalsh and Eigh into a single primitive

* use the same primate with the flag

* fix primatives

* use MULTI

* fix eval_gpu

* fix decleration

* rename EighPrimitive to Eigh

* tests

* tests

* fix rebase and format

* cleanup lapack

* format

* add cblas.h

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-10-22 12:18:48 -07:00
Angelos Katharopoulos
9b12093739
Add the roll op (#1455) 2024-10-07 17:21:42 -07:00
Awni Hannun
95d04805b3
Fix complex power on Metal (#1460) 2024-10-06 19:58:30 -07:00
Awni Hannun
195b429d99
Put along axis + fixe for partition grad (#1430)
* put along axis, fixes for partition grad

* zeros for arg reduce
2024-09-23 10:03:38 -07:00
Nripesh Niketan
6af5ca35b2
feat: add cross_product (#1252)
* feat: add cross_product

* lint

* python binding

* refactor: Improve error message for cross_product function

* refactor: more close to numpy cross product

* refactor: improve error message for cross_product function

* finish

* fix acks

* allow old numpy

* doc

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-09-17 13:12:43 -07:00
Nripesh Niketan
669c27140d
Chore: add pre-commit hook for cmake (#1362)
* reset and lint

* format

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-09-16 12:53:01 -07:00
Awni Hannun
e7e59c6f05
Fix copying scalars by adding fill_gpu (#1402)
* fix copying scalars by adding fill_gpu

* Another copy scalar changed to fill

---------

Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
2024-09-09 15:54:08 -07:00
Awni Hannun
7cca1727af
Fix slice data size (#1394)
* fix slice data size and add tests

* fix contiguous flag

* simplify stride and perform copy for non-contiguous arrays

* fix cpu

* comment
2024-09-04 19:10:43 -07:00
Jeethu Rao
bd47e1f066
Fix neon_fast_exp and add more softmax tests (#1367) 2024-08-27 23:42:42 -07:00
Aditya Dhulipala
e6b223df5f
Pinv (#875) 2024-08-27 23:06:12 -07:00