CCYeh
b3825ac149
Add Masked Scatter (#2663)
Co-authored-by: Awni Hannun <awni@apple.com>
Co-authored-by: Angelos Katharopoulos <katharas@gmail.com>
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
2025-11-19 14:53:32 -08:00
Awni Hannun
66519fb348
fix slice (#2758)
2025-11-13 11:30:02 -08:00
CCYeh
be9e2aebd6
Shapeless support for zeros/ones_like (#2726)
* shapeless support for zeros/ones_like
* Improvements
* fix access after moved
2025-11-06 19:12:20 -08:00
AN Long
1ff2b713b6
Check isnan in maximum / minimum with CPU backend (#2652)
* Check isnan in maximum / minimum with CPU backend
* Add tests
* fix
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2025-11-03 08:51:14 -08:00
Awni Hannun
969924cc69
Fp8 conversion (#2686)
* add fp8 e4m3 converters
* add cuda
* default saturate to min/max
* fix for older OS
* fix no gpu/cpu
* fix saturate
* fix compile
2025-10-27 16:35:50 -07:00
Ronan Collobert
8f8af61a37
fix warnings showing up with -Wall (#2692)
2025-10-24 11:43:35 -07:00
Awni Hannun
70560b6bd5
Add mode parameter for quantization (#2499)
* add mode parameter for quantization
* mxfp4 quantize/dequantize + start of optional biases
* mxfp4 works
* speedup
* cpu mxfp4
* fix
* fix test tol
* fix
* refactor
* add quant mode enum
2025-08-28 06:45:26 -07:00
Abe Leininger
fce53b61d6
Fix reduce sum/prod overflow (#2477)
2025-08-12 00:05:33 -07:00
Cheng
8347575ba1
[CUDA] Implement Scan kernel (#2347)
* Contiguous scan
* Strided scan
* Enable tests
* Fix failing logaddexp test
* Use cexpf in Metal
2025-07-10 16:54:12 -07:00
jhavukainen
8b9a3f3cea
Align mlx::core::max op nan propagation with NumPy (#2339)
* Make max op NaN propagation rules align with numpy
* Adding benchmarks and testing for max op nanpropagation
* Pre-commit formatting
* Fix max complex64 nan propagation and add test
* Improve the cpp unittest
* Only check nans on non-integral types in simd_reduce_impl.
* Cleanup using namespace alias
* Add cpu Max nanpropagation. Fix a small fib in cpu max dispatch data types for int8/int16.
* Make the max nanpropagation test more meaningful for integer types
* Remove tuple unpacking syntax to comply with earlier python versions. Add cuda skip to nanpropagation tests, fix cuda implementation in a separate PR.
2025-07-09 11:26:27 -07:00
Cheng
79071bfba4
Fix out-of-bounds default value in logsumexp/softmax (#2213)
2025-05-21 07:25:16 -07:00
Aashiq Dheeraj
bb6565ef14
add fftshift and ifftshift fft helpers (#2135)
* add fftshift and ifftshift fft helpers
* address comments
* axes have to be iterable
* fix fp error in roll + add test
---------
Co-authored-by: Aashiq Dheeraj <aashiq@aashiq-mbp-m4.local>
2025-04-29 22:13:45 -07:00
Param Thakkar
600e87e03c
Added output_padding parameters in conv_transpose (#2092)
2025-04-23 09:26:33 -07:00
Param Thakkar
5f04c0f818
Fixed shift operations issue (#2080)
* Fixed shift operations issue
* Added tests and fixes
* Fixed loop syntax error
* Added tests for bool
* Fixed typo
2025-04-18 14:28:33 -07:00
Jesper Stemann Andersen
5f5770e3a2
Fix CPU sign for unsigned ints (#2024)
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
2025-03-30 17:56:59 -07:00
Awni Hannun
516ded618b
Dynamic slicing (#1741)
* dynamic slice and slice update
* python bindings + tests + fix set item
* fix compile issue
* comment
* fix jit
2025-01-07 14:02:16 -08:00
Awni Hannun
e03f0372b1
More shape type (#1705)
* more shape type
* fix
2024-12-19 08:08:20 -08:00
Awni Hannun
4e1e9520e1
Flatten and unflatten (#1692)
* flatten and unflatten
* fix grad
* fix shape infer
* use squeeze + unsqueeze in get_item
2024-12-11 21:51:37 -08:00
Awni Hannun
f3dfa36a3a
Fix x86 tests (#1691)
* fix x86 tests
* comment
2024-12-11 07:47:18 -08:00
Awni Hannun
f76a49e555
ExpandDims primitive (#1687)
* add squeeze primitive
* simplify squeeze, use in gather
* fix
* fix
* fix
* fix
* fix no cpu
* use squeeze in matmul and friends
* expand dims primitive
* comment
2024-12-10 16:39:07 -08:00
Awni Hannun
40c62c1321
Use int64 stride everywhere (#1671)
* use int64 stride everywhere
* fix ext
* fix ext
* more shape + cleanup
* one more
* few more
2024-12-09 11:09:02 -08:00
Cheng
d0f471cff7
Using math defines requires switch in MSVC (#1665)
* Using math defines requires switch in MSVC
* Fix more math macros
* Fix type
* Remove _MSC_VER guard for math defines
2024-12-08 08:16:28 -08:00
Nripesh Niketan
3bb5b4a302
Chore: Add default language in pre-commit and bump hooks (#1652)
2024-12-06 07:54:29 -08:00
Awni Hannun
dcca0d7477
contiguous op / prim (#1612)
2024-11-21 19:51:49 -08:00
Angelos Katharopoulos
9b12093739
Add the roll op (#1455)
2024-10-07 17:21:42 -07:00
Awni Hannun
95d04805b3
Fix complex power on Metal (#1460)
2024-10-06 19:58:30 -07:00
Awni Hannun
195b429d99
Put along axis + fixes for partition grad (#1430)
* put along axis, fixes for partition grad
* zeros for arg reduce
2024-09-23 10:03:38 -07:00
Awni Hannun
e7e59c6f05
Fix copying scalars by adding fill_gpu (#1402)
* fix copying scalars by adding fill_gpu
* Another copy scalar changed to fill
---------
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
2024-09-09 15:54:08 -07:00
Awni Hannun
7cca1727af
Fix slice data size (#1394)
* fix slice data size and add tests
* fix contiguous flag
* simplify stride and perform copy for non-contiguous arrays
* fix cpu
* comment
2024-09-04 19:10:43 -07:00
Jeethu Rao
bd47e1f066
Fix neon_fast_exp and add more softmax tests (#1367)
2024-08-27 23:42:42 -07:00
Angelos Katharopoulos
9d26441224
Fix contiguity check (#1336)
Co-authored-by: Alex Barron <abarron22@apple.com>
2024-08-19 16:05:06 -07:00
Awni Hannun
df964132fb
fix scatter + test (#1202)
* fix scatter + test
* fix test warnings
* fix metal validation
2024-06-11 14:35:12 -07:00
Awni Hannun
ea9090bbc4
Add view op (#1179)
* add view primitive
* nit
* fix view
2024-06-04 08:05:27 -07:00
Rifur13
9401507336
Add groups to 2-D convolutions (#1129)
* Added groups to 2-D convolutions. Only implemented for **some** specializations.
Also fixed 1D grouped convs with different kernel strides and added more tests.
* fix channels condition
2024-05-22 20:01:44 -07:00
Abe Leininger
79ef49b2c2
add mx.trace (#1143) (#1147)
* working c++ trace implementation
* updated throw + added overloads
* added python binding for trace function
* pre-commit reformatting
* add trace to docs
* resolve comments
* remove to_stream call
2024-05-22 15:50:27 -07:00
Rifur13
c4a471c99d
Add groups to Conv1d (#948)
* Add conv1d grouped convs on CPU
* Add GPU support
* Parallelize inside metal kernel
* cleanup
* Update mlx/ops.cpp
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* New unfold kernel + remove unused code
* Remove copy and refactor
* Update vjp and reuse steel gemm
* Fixed groups on cpu
* Fix metal validation
---------
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
2024-04-27 06:24:57 -07:00
Awni Hannun
86f495985b
Add bitwise ops (#1037)
* bitwise ops
* fix tests
2024-04-26 22:03:42 -07:00
Aneesh Shetty
d0dbfe0b97
Adds radians and degrees (#1011)
2024-04-22 11:17:49 -07:00
Abe Leininger
a1a31eed27
Add mx.meshgrid (#961)
2024-04-09 11:43:08 -07:00
Awni Hannun
42afe27e12
std and expm1 (#973)
* std and expm1
* actually add expm1
* fix linux
* fix vjp
* relax tol for linux test
* Add it to the compilable primitives
---------
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
2024-04-08 14:26:01 -07:00
Daniel Strobusch
479051ce1c
add numeric type hierarchy and issubdtype as well as a set_dtype meth… (#427)
* add numeric type hierarchy and issubdtype as well as a set_dtype method to nn.Module with predicate
The numeric type hierarchy and issubdtype are compatible with the [numpy hierarchy](220f0ab2c5/numpy/_core/numerictypes.py (L42)).
Closes #285.
* nits in docs
* unify type category checking
* nits in docs
* nits in docs
* more docs nits
* fix callable type
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-03-25 12:32:59 -07:00
Jagrit Digani
cec8661113
Add a SliceUpdate op and primitive (#850)
* Enable copy to work with int64 strides
* Fix uniform buffer indices or copy kernel arguments
* Update utils.h
* Remove manual unrolling of elem to loc loop
* GPU copy updated to handle negative strides
* Add slice update primitive
2024-03-20 10:39:25 -07:00
Angelos Katharopoulos
29d0c10ee5
Reshape improvement (#818)
2024-03-12 17:54:31 -07:00
Awni Hannun
8b7532b9ab
fix scatter (#821)
2024-03-12 11:42:07 -07:00
Awni Hannun
5121f028d9
nice tensordot for mlx c (#782)
2024-03-04 09:51:02 -08:00
Angelos Katharopoulos
8e281c76c3
Fix the top-k op (#768)
2024-03-01 22:08:43 -08:00
Hinrik Snær Guðmundsson
08226ab491
added atleast *args input support (#710)
* added atleast list(array) input support
* function overloading implemented
* Refactoring
* fixed formatting
* removed pos_only
2024-02-26 11:17:59 -08:00
Awni Hannun
e6418781ab
Fix logsumexp edge case (#740)
* fix logsumexp
* fix inf constant
* also fix power grad
* fix ternary dispatch
2024-02-25 08:39:55 -08:00
Rifur13
126c9869c8
Implement the 'where' primitive for conditional selection (#664)
2024-02-22 15:10:48 -08:00
Vijay Krish
972d9a3aea
Up to 10x faster scatter. (#709)
* Faster scatter.
Add specialization for 1-d index tensors.
* Address review comments.
- Check for row contiguity of index, update tensors
instead of checking strides.
- Add support for 1d specialization with col contiguous update
tensor, along with a test.
* Nit1
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* Nit2
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
---------
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
2024-02-21 11:09:30 -08:00