* add implicit conversion of list to array for equality constraint
* add tests for array equality
* add test for tuple and array equality
* return False if __eq__ arg is list or tuple
* write tests for equality
* update the rule of comparison for __ge__/__gt__/__lt__/__le__
* add a helper function for detecting mlx.core.array
* return true in case fo inequality
* debug minor issue regarding detecting mlx array
* add tests for inequality comparisons
* add name for contribution
* reformat files using pre-commit
* update tests for float
* update tests for inequality
* raise exception in case of invalid comparisons
* use isinstance instead of string comparison
* replace "is_convirtable_to_array" with previous logic
* remove throwing exceptions for other operations
* just a comment
* minor changes for efficiency
* optimize a utils function
* change the function name
* Update ACKNOWLEDGMENTS.md
---------
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
When compositing transforms lots of temporary of arrays will be created
and passed to next primitive, and by making ops accepting args by value
we can avoid lots of copies of temporary arrays.
The arm64 macbook pros are heavy and I usually care my intel one for
mobile, it would be nice if I can play with MLX on it.
To build with x64, user must pass `MLX_ENABLE_X64_MAC` to cmake:
CMAKE_ARGS='-DMLX_ENABLE_X64_MAC=ON' python setup.py
Without the && args would be copied and perfect forwarding won't work.
Also add template utils to make sure the function only forwards array
and not vector<array>.
* add numeric type hierarchy and issubdtype as well as a set_dtype method to nn.Module with predicate
numeric type hierarchy and issubtype is compatible to the [numpy hierarchy](220f0ab2c5/numpy/_core/numerictypes.py (L42)).
Closes#285.
* nits in docs
* unify type category checking
* nits in docs
* nits in docs
* more docs nits
* fix callable type
---------
Co-authored-by: Awni Hannun <awni@apple.com>
* some small overhead improvements
* use result_type in rms_norm
* remove release force
* fix + use non-vector version
* revert compile change
* fix ops
* a little more overhead
* a little more cleanup and overhead
1. Move shapes into outputs instead of copying them.
2. Pass primitive by const ref as it is always copied into outputs, which
removes a copy when calling make_array.
* fast rmsnorm
* no rms gpu
* kernel
* fix shared mem
* looped rms and donation in softmax
* Make the squaring in float32 to avoid underflow
* Fix the default StreamOrDevice for rope and rms_norm in fast
* nits
---------
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
Without the && args would be copied and perfect forwarding won't work.
To avoid eval calling itself recursively, the vector version of eval is
changed to take by value instead, which will save a copy of array when a
rvalue is passed.
* Update mlx_set_item to handle regular slices without expanding
* Refactor ellipsis handling
* Route mlx_set_item to slice_update where possible
* Update mlx_scatter_args_slice
* Don't route to gather if no array indices