* try cpp 20 for compile
* unary, binary, ternary in jit
* nits
* fix gather/scatter
* fix rebase
* reorg compile
* add ternary to compile
* jit copy
* jit compile flag
* fix build
* use linked function for ternary
* some nits
* docs + circle min size build
* docs + circle min size build
* fix extension
* fix no cpu build
* improve includes
* mostly builds
* most tests pass
* fix circle build
* add back buffer protocol
* includes
* fix for py38
* limit to cpu device
* include
* fix stubs
* move signatures for docs
* stubgen + docs fix
* doc for compiled function, comments
* Replace argument encoder usage for gather and scatter
* Use constant address space for shapes and strides
* Split gather and scatter to improve compile times
* Enable the GPU tests
* Update the CI config
* Fix scatter dispatch for scalar indices
* Remove arg encoder utils
---------
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
* CI update
* Skip large binary test for now
* Upgrade pip
* Add proper env variable skipping
* Update the CI
* Fix workflow name
* Set the low memory flag for the tests
* Change build process
* Add pip upgrade
* Use a venv
* Add a missing env activate
* Add setuptools
* Add twine upload back
* Re-enable automatic release builds
* propagate nans in binary ops
* handle empty matmul
* cpu minimum/maximum propagate nan
* benchmark maximum
* add min as well
* throw on negative indices with full
* verbose on linux
* fix matmul for zero K
* resolved extension build issues and added test to ci
* missing gguflib
* rebased
* force mlx install from fix branch
* linux build issue
* point to git install and comment out ci tests