Without the && args would be copied and perfect forwarding won't work.
Also add template utils to make sure the function only forwards array
and not vector<array>.
* add numeric type hierarchy and issubdtype as well as a set_dtype method to nn.Module with predicate
numeric type hierarchy and issubtype is compatible to the [numpy hierarchy](220f0ab2c5/numpy/_core/numerictypes.py (L42)).
Closes#285.
* nits in docs
* unify type category checking
* nits in docs
* nits in docs
* more docs nits
* fix callable type
---------
Co-authored-by: Awni Hannun <awni@apple.com>
* some small overhead improvements
* use result_type in rms_norm
* remove release force
* fix + use non-vector version
* revert compile change
* fix ops
* a little more overhead
* a little more cleanup and overhead
1. Move shapes into outputs instead of copying them.
2. Pass primitive by const ref as it is always copied into outputs, which
removes a copy when calling make_array.
* fast rmsnorm
* no rms gpu
* kernel
* fix shared mem
* looped rms and donation in softmax
* Make the squaring in float32 to avoid underflow
* Fix the default StreamOrDevice for rope and rms_norm in fast
* nits
---------
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
Without the && args would be copied and perfect forwarding won't work.
To avoid eval calling itself recursively, the vector version of eval is
changed to take by value instead, which will save a copy of array when a
rvalue is passed.
* Update mlx_set_item to handle regular slices without expanding
* Refactor ellipsis handling
* Route mlx_set_item to slice_update where possible
* Update mlx_scatter_args_slice
* Don't route to gather if no array indices
* Enable copy to work with int64 strides
* Fix uniform buffer indices or copy kernel arguments
* Update utils.h
* Remove manual unrolling of elem to loc loop
* GPU copy updated to handle negative strides
* Add slice update primitive
Since the shape and inputs are always saved as copy in ArrayDesc, we can
unify array's constructors to just take the arguments by value.
There are 2 cases:
1. When shape is a lvalue, it will be copied into array's constructor and
then moved into ArrayDesc's member. So only 1 copy happens.
2. When shape is a rvalue, it will be moved into array's constructor and
then moved into ArrayDesc's member. So no copy happens.
So having 1 constructor that takes by value is equivalent to having 2
constructors that const reference and rvalue separately.
* mostly builds
* most tests pass
* fix circle build
* add back buffer protocol
* includes
* fix for py38
* limit to cpu device
* include
* fix stubs
* move signatures for docs
* stubgen + docs fix
* doc for compiled function, comments
1. Anonymous namespace means internal linkage, static keyword is not needed.
2. The default constructor of std::shared_ptr initializes the pointer to
nullptr, you don't need to explicitly set it.
* Enable collapsing batch dims in gemm
* Update gemm to only make copies when neither of the last 2 axes are contiguous
* Update addmm to support gemv shapes
* Update addmm to support irregular batch strides
* Update tests